What are the methods of clustering?


The main categories of clustering methods are as follows −

Partitioning Methods − Given a database of n objects or data tuples, a partitioning method constructs k partitions of the data, where each partition represents a cluster and k ≤ n. That is, it divides the data into k groups, which together must satisfy the following requirements −

  • Each group must contain at least one object.

  • Each object must belong to exactly one group.

Given k, the number of partitions to construct, a partitioning method creates an initial partitioning. It then uses an iterative relocation technique that attempts to improve the partitioning by moving objects from one group to another.

The general criterion of a good partitioning is that objects in the same cluster are “close” or related to each other, whereas objects in different clusters are “far apart” or very different. Various other criteria exist for judging the quality of partitions.
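The initial partitioning and iterative relocation described above can be sketched with a k-means-style procedure. This is a minimal illustration, not a definitive implementation: the function names, the round-robin initial partitioning, and the use of squared Euclidean distance to a centroid as the notion of "close" are all assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Partition `points` into k groups by iterative relocation (hypothetical helper)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must satisfy 1 <= k <= n")
    # Initial partitioning (an arbitrary choice for this sketch):
    # deal the objects round-robin so that every group starts non-empty.
    labels = [i % k for i in range(len(points))]
    centroids = [None] * k
    for _ in range(max_iter):
        # Recompute each group's centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # a group that lost all members keeps its old centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        # Iterative relocation: move each object to the group with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no object changed group, so stop
            break
        labels = new_labels
    return labels, centroids


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centers = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each object ends up with exactly one label, which mirrors the second requirement above. The first requirement (no empty group) is only guaranteed by this sketch at initialization.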

Hierarchical Methods − A hierarchical method creates a hierarchical decomposition of the given set of data objects. It can be classified as either agglomerative or divisive, depending on how the hierarchical decomposition is formed.

The agglomerative approach, also known as the “bottom-up” approach, starts with each object forming a separate group. It successively merges the objects or groups closest to one another until all of the groups are merged into one (the topmost level of the hierarchy) or until a termination condition holds. The divisive approach, also referred to as the “top-down” approach, starts with all the objects in the same cluster. In each successive iteration, a cluster is split into smaller clusters until eventually each object is in its own cluster or until a termination condition holds.
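As a rough sketch of the agglomerative (bottom-up) direction only, the following code merges the two closest groups until a chosen number of groups remains. The names `single_link`, `agglomerative`, and `stop_at` are hypothetical, and measuring group distance by the closest pair of objects (single linkage) is one assumed choice among several.

```python
import math


def single_link(a, b):
    """Distance between two groups, taken as the distance of their closest pair."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points, stop_at=1):
    """Merge groups bottom-up until `stop_at` groups remain (the termination condition)."""
    clusters = [[p] for p in points]  # every object starts as its own group
    merges = []                       # record of merges, i.e. the hierarchy
    while len(clusters) > stop_at:
        # Find the pair of groups that are closest to one another.
        _, i, j = min(
            (single_link(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return clusters, merges


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (20.0, 0.0)]
clusters, merges = agglomerative(points, stop_at=2)
print(clusters)
```

With `stop_at=1` the loop runs until a single group holds every object, which corresponds to the topmost level of the hierarchy. The pairwise search is quadratic per merge, so this form is only suitable for small inputs.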

Density-based Methods − Most partitioning methods cluster objects based on the distance between objects. Such methods can find only spherical clusters and have difficulty discovering clusters of arbitrary shape. Other clustering methods have therefore been developed based on the notion of density.

DBSCAN is a typical density-based method that grows clusters according to a density threshold. OPTICS is a density-based method that computes an augmented cluster ordering for automatic and interactive cluster analysis.
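To make the idea of growing clusters from a density threshold concrete, here is a compact sketch in the style of DBSCAN. It is a simplified illustration under stated assumptions: the parameter names `eps` (neighbourhood radius) and `min_pts` (minimum neighbourhood size, counting the point itself) follow common usage, and the brute-force neighbour search is chosen for brevity rather than speed.

```python
import math
from collections import deque

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster_id += 1        # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also dense: keep growing
                queue.extend(j_neighbors)
    return labels


points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0),   # elongated chain
          (10, 10), (10, 11), (11, 10),             # compact blob
          (50, 50)]                                 # isolated point
print(dbscan(points, eps=1.1, min_pts=3))  # [0, 0, 0, 0, 0, 1, 1, 1, -1]
```

The chain of five points is recovered as one cluster even though it is not spherical, and the isolated point is reported as noise rather than being forced into a group.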

Grid-based Methods − Grid-based methods quantize the object space into a finite number of cells that form a grid structure. All of the clustering operations are then performed on the grid structure (i.e., on the quantized space).

The main advantage of this approach is its fast processing time, which is typically independent of the number of data objects and depends only on the number of cells in each dimension of the quantized space. STING is a typical example of a grid-based method. CLIQUE and WaveCluster are two clustering algorithms that are both grid-based and density-based.
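The quantization step can be illustrated with a toy example. The sketch below is not STING, CLIQUE, or WaveCluster; it is a simplified, assumed scheme that bins 2-D points into square cells, keeps cells holding at least `min_count` points, and joins neighbouring dense cells into clusters. The names `grid_cluster`, `cell_size`, and `min_count` are hypothetical.

```python
import math
from collections import defaultdict, deque


def grid_cluster(points, cell_size, min_count):
    # Quantize the object space: map every point to the grid cell containing it.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(math.floor(x / cell_size), math.floor(y / cell_size))].append(idx)

    # From here on, the work is done on cells rather than on individual objects.
    dense = {cell for cell, members in cells.items() if len(members) >= min_count}
    labels = [-1] * len(points)  # -1 marks points in sparse cells
    seen = set()
    cluster_id = 0
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:  # flood-fill over adjacent dense cells
            cx, cy = queue.popleft()
            for idx in cells[(cx, cy)]:
                labels[idx] = cluster_id
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        cluster_id += 1
    return labels


points = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.3, 0.4),
          (5.1, 5.2), (5.3, 5.4), (9.9, 0.1)]
print(grid_cluster(points, cell_size=1.0, min_count=2))  # [0, 0, 0, 0, 1, 1, -1]
```

After the single pass that bins the points, the flood-fill visits each dense cell once, which is why the cost of the clustering step is governed by the number of cells rather than the number of objects.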

Model-based Methods − Model-based methods hypothesize a model for each of the clusters and find the best fit of the data to the given model. A model-based algorithm can locate clusters by constructing a density function that reflects the spatial distribution of the data points. It also leads to a way of automatically determining the number of clusters based on standard statistics, taking “noise” or outliers into account and thus yielding robust clustering methods.
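One way to picture this is a mixture of Gaussians fitted with expectation-maximization (EM), with the number of clusters chosen by the Bayesian information criterion (BIC). The sketch below assumes one-dimensional data, and the choice of a Gaussian mixture, of BIC as the "standard statistic", and of the constants `VAR_FLOOR` and `TINY` are all assumptions made for illustration. The function names are hypothetical.

```python
import math

VAR_FLOOR = 1e-2  # assumed lower bound so a component cannot collapse onto one point
TINY = 1e-300     # guards against division by zero or log(0) after underflow


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_mixture(data, k, n_iter=100):
    """Fit a 1-D mixture of k Gaussians with EM (requires k <= len(data))."""
    n = len(data)
    ordered = sorted(data)
    # Initial guess: split the sorted values into k contiguous chunks.
    chunks = [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]
    weights = [len(c) / n for c in chunks]
    means = [sum(c) / len(c) for c in chunks]
    variances = [max(sum((x - m) ** 2 for x in c) / len(c), VAR_FLOOR)
                 for c, m in zip(chunks, means)]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = max(sum(dens), TINY)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted points.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), TINY)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                VAR_FLOOR,
            )
    log_lik = sum(
        math.log(max(sum(w * normal_pdf(x, m, v)
                         for w, m, v in zip(weights, means, variances)), TINY))
        for x in data
    )
    return weights, means, variances, log_lik


def bic(log_lik, k, n):
    """BIC for a 1-D mixture: k means, k variances, k - 1 free weights."""
    return (3 * k - 1) * math.log(n) - 2 * log_lik


data = [-1.0, -0.5, -0.2, 0.0, 0.1, 0.4, 0.7, 1.1,
        9.0, 9.5, 9.8, 10.0, 10.1, 10.4, 10.7, 11.1]
scores = {k: bic(fit_mixture(data, k)[3], k, len(data)) for k in (1, 2, 3)}
best_k = min(scores, key=scores.get)  # lower BIC means a better trade-off
print(best_k, scores)
```

The fitted mixture is itself the density function for the data, and comparing BIC across candidate values of k is what lets the number of clusters be picked from the data instead of being fixed in advance.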

Published on 24-Nov-2021 06:34:58