What are the types of clusters in data mining?

Cluster analysis forms groups, or clusters, of similar records based on several measurements made on those records. The clusters can be defined in whatever way is most useful for the aim of the analysis. The technique has been applied in many fields, such as astronomy, archaeology, medicine, chemistry, education, psychology, linguistics, and sociology.

The main types of clusters are as follows −

Well-Separated − A cluster is a set of objects in which each object is closer (or more similar) to every other object in the cluster than to any object outside it. Sometimes a threshold is used to require that all objects in a cluster be sufficiently close (or similar) to one another. This idealized definition is satisfied only when the data contains natural clusters that are far apart from each other.
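The well-separated condition can be checked directly on a small labeled data set. The following is a minimal sketch, assuming Euclidean distance and points given as coordinate tuples; the function name `is_well_separated` and the sample data are hypothetical and chosen only for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def is_well_separated(points, labels):
    """Return True if every point is closer to all members of its own
    cluster than to any point outside that cluster."""
    for i, p in enumerate(points):
        # Distances to the other members of the same cluster.
        same = [dist(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        # Distances to every point in a different cluster.
        other = [dist(p, q) for j, q in enumerate(points)
                 if labels[j] != labels[i]]
        # Violated if the farthest cluster mate is not strictly closer
        # than the nearest outsider.
        if same and other and max(same) >= min(other):
            return False
    return True

# Hypothetical data: two tight groups that lie far apart.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]
print(is_well_separated(points, labels))
```

The check compares all pairs of points, so it is practical only for small data sets.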

Prototype-Based − A cluster is a set of objects in which each object is closer to the prototype that represents its cluster than to the prototype of any other cluster. For data with continuous attributes, the prototype is often a centroid, i.e., the average (mean) of all the points in the cluster. When a centroid is not meaningful, such as when the data has categorical attributes, the prototype is often a medoid, i.e., the most representative point of the cluster.
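The difference between the two kinds of prototype can be seen in a short sketch. It assumes that the medoid is taken to be the cluster member with the smallest total distance to all other members, which is one common way to define "most representative". The helper names `centroid`, `medoid`, and `hamming`, and the sample data, are hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def centroid(points):
    """Mean of the points, coordinate by coordinate (continuous data only)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def medoid(points, distance=dist):
    """Cluster member with the smallest total distance to all members."""
    return min(points, key=lambda p: sum(distance(p, q) for q in points))

def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

# Continuous attributes: the centroid need not be an actual data point.
cluster = [(1.0, 1.0), (2.0, 2.0), (3.0, 1.0), (9.0, 9.0)]
print(centroid(cluster))  # (3.75, 3.25)
print(medoid(cluster))    # (2.0, 2.0)

# Categorical attributes: a mean is undefined, but a medoid still exists.
records = [("red", "small"), ("red", "large"), ("blue", "small")]
print(medoid(records, distance=hamming))  # ('red', 'small')
```

Because the medoid is always one of the original records, it can be used with any distance measure, including measures defined on non-numeric attributes.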

Graph-Based − If the data is represented as a graph, where the nodes are objects and the links represent connections between objects, then a cluster can be defined as a connected component; i.e., a group of objects that are connected to one another but have no connection to objects outside the group.

An important example of graph-based clusters is the contiguity-based cluster, where two objects are linked only if they lie within a specified distance of each other. This implies that each object in a contiguity-based cluster is closer to some other object in the cluster than to any point in a different cluster.
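Contiguity-based clusters can be found by linking every pair of objects that lie within the chosen distance and then collecting the connected components of the resulting graph. The sketch below assumes Euclidean distance and a simple depth-first traversal; the function name `contiguity_clusters`, the `radius` parameter, and the sample points are hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def contiguity_clusters(points, radius):
    """Label each point with the connected component it belongs to,
    where two points are linked if they are within `radius` of each other."""
    n = len(points)
    labels = [None] * n
    cluster_id = 0
    for start in range(n):
        if labels[start] is not None:
            continue  # already reached from an earlier point
        labels[start] = cluster_id
        stack = [start]
        while stack:
            i = stack.pop()
            # Follow every link from point i to a point not yet labeled.
            for j in range(n):
                if labels[j] is None and dist(points[i], points[j]) <= radius:
                    labels[j] = cluster_id
                    stack.append(j)
        cluster_id += 1
    return labels

# Hypothetical data: a chain of four points and a separate pair.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0)]
print(contiguity_clusters(points, radius=1.5))  # [0, 0, 0, 0, 1, 1]
```

Note that (0, 0) and (3, 0) end up in the same cluster even though they are farther apart than the radius, because a chain of linked neighbors joins them.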

Density-based Methods − Most partitioning methods cluster objects based on the distance between objects. Such methods can find only spherical-shaped clusters and have difficulty discovering clusters of arbitrary shape. Other clustering methods have therefore been developed based on the notion of density.

DBSCAN is a typical density-based method that grows clusters according to a density threshold. OPTICS is a density-based method that computes an augmented cluster ordering for automatic and interactive cluster analysis.
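The idea of growing a cluster according to a density threshold can be illustrated with a simplified DBSCAN-style procedure. This is a minimal sketch, not an optimized or reference implementation: it assumes Euclidean distance and a brute-force neighbor search, and the parameter names `eps` (neighborhood radius) and `min_pts` (minimum number of neighbors, counting the point itself) are naming choices made here.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label points with cluster ids, or NOISE for low-density points."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # Point i is a core point: start a new cluster and grow it.
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: join, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is also a core point: keep growing
        cluster_id += 1
    return labels

# Hypothetical data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (4, 20)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Because clusters grow from neighbor to neighbor rather than around a center, this approach can follow clusters of arbitrary shape and leaves isolated points labeled as noise.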

Grid-based Methods − Grid-based methods quantize the object space into a finite number of cells that form a grid structure. The clustering operations are then performed on the grid structure (i.e., on the quantized space).

The main advantage of this approach is its fast processing time, which is typically independent of the number of data objects and depends only on the number of cells in each dimension of the quantized space.
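A generic sketch of the grid idea follows. It is not a specific published algorithm: it assumes equal-sized cells, treats a cell as dense when it holds at least `min_count` points, and merges dense cells that touch. The function name `grid_clusters` and all parameters are hypothetical. After a single pass that assigns points to cells, the merging step works on cells only.

```python
from collections import Counter

def grid_clusters(points, cell_size, min_count):
    """Cluster points by merging adjacent dense grid cells.
    Points that fall in sparse cells are labeled -1."""
    # Quantize: map each point to the integer coordinates of its cell.
    cell_of = [tuple(int(c // cell_size) for c in p) for p in points]
    counts = Counter(cell_of)
    dense = {cell for cell, k in counts.items() if k >= min_count}

    # Merge dense cells that touch (share a face or a corner).
    cell_label = {}
    next_id = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_id
        stack = [start]
        while stack:
            cell = stack.pop()
            for other in dense:
                touching = all(abs(a - b) <= 1 for a, b in zip(cell, other))
                if other not in cell_label and touching:
                    cell_label[other] = next_id
                    stack.append(other)
        next_id += 1

    # Each point inherits the label of its cell.
    return [cell_label.get(cell, -1) for cell in cell_of]

# Hypothetical data: two dense regions and one stray point.
points = [(0.2, 0.3), (0.8, 0.1), (1.1, 0.4), (1.7, 0.9),
          (7.2, 7.1), (7.9, 7.5), (4.0, 4.0)]
print(grid_clusters(points, cell_size=1.0, min_count=2))
# [0, 0, 0, 0, 1, 1, -1]
```

The result depends on the choice of `cell_size` and `min_count`; a coarser grid runs faster but blurs cluster boundaries.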

Updated on: 14-Feb-2022

