What is an Agglomerative Clustering Algorithm?

Agglomerative clustering is a bottom-up clustering method in which clusters contain subclusters, which in turn contain their own subclusters, and so on. It starts by placing each object in its own cluster and then merges these atomic clusters into larger and larger clusters until all the objects are in a single cluster or until a specified termination condition is met. Most hierarchical clustering methods are of this type. They differ only in their definition of between-cluster similarity.

For example, a method called AGNES (Agglomerative Nesting) uses the single-link technique and operates as follows. Consider a group of objects placed in a rectangle. Initially, every object is placed in a cluster of its own. The clusters are then merged step by step according to some criterion, such as combining the two clusters with the minimum Euclidean distance between their nearest objects.
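The single-link criterion described above can be sketched in a few lines of Python. This is only an illustration, not code taken from AGNES itself: the function name single_link_distance and the sample points are assumptions, and each cluster is assumed to be a list of coordinate tuples.

```python
import math

def single_link_distance(cluster_a, cluster_b):
    """Smallest Euclidean distance between any object in cluster_a
    and any object in cluster_b (the single-link criterion)."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)

# Hypothetical example data: two small clusters on a line.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (9.0, 0.0)]

# The nearest pair is (1, 0) and (4, 0), so the clusters are 3 units apart.
print(single_link_distance(cluster_a, cluster_b))
```

At each step, the pair of clusters with the smallest such distance would be the one to merge.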

The K-means approach to clustering starts out with a fixed number of clusters and allocates all the data into exactly that many clusters. Another class of approaches works by agglomeration. These approaches start out with every data point forming its own cluster and gradually merge them into larger and larger clusters until all points have been gathered into one large cluster.

The first step is to produce a similarity matrix. The similarity matrix is a table of all the pair-wise distances or degrees of similarity between clusters. Initially, the similarity matrix contains the pair-wise distances between individual pairs of records.

There are several measures of similarity between records, such as the Euclidean distance, the angle between vectors, and the ratio of matching to non-matching categorical fields.
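As a rough illustration, the three measures just mentioned could be written as follows. The function names are hypothetical, records are assumed to be equal-length tuples, and the angle function assumes neither vector is all zeros.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two numeric records."""
    return math.dist(x, y)

def angle_between(x, y):
    """Angle in radians between two non-zero numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.hypot(*x) * math.hypot(*y)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / norms))
    return math.acos(cosine)

def match_ratio(x, y):
    """Ratio of matching to non-matching categorical fields.
    Returns infinity when every field matches."""
    matches = sum(a == b for a, b in zip(x, y))
    mismatches = len(x) - matches
    return matches / mismatches if mismatches else math.inf

print(euclidean_distance((0, 0), (3, 4)))
print(angle_between((1, 0), (0, 1)))
print(match_ratio(("red", "small", "round"), ("red", "large", "round")))
```

Note that the first two are distances (smaller means more similar), while the match ratio is a similarity (larger means more similar), so they cannot be swapped into the same procedure without flipping the comparison.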

It can be seen that with N initial clusters for N data points, N² measurement computations are needed to build the distance table. If the similarity measure is a true distance metric, only about half of those are required, because all true distance metrics follow the rule that Distance(X, Y) = Distance(Y, X).
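The saving from symmetry can be made concrete with a small sketch that stores only the entries below the diagonal. It assumes Euclidean distance and records given as coordinate tuples; the names lower_triangular_distances and lookup are illustrative only.

```python
import math

def lower_triangular_distances(points):
    """Build a table where table[i][j] holds the distance between
    points i and j for j < i only. Row i therefore has i entries."""
    table = []
    for i, p in enumerate(points):
        table.append([math.dist(p, points[j]) for j in range(i)])
    return table

def lookup(table, i, j):
    """Read Distance(i, j) from the triangular table, using symmetry."""
    if i == j:
        return 0.0
    return table[i][j] if i > j else table[j][i]

# Hypothetical example data.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
table = lower_triangular_distances(points)

# Number of distances actually computed: N * (N - 1) / 2.
print(sum(len(row) for row in table))
print(lookup(table, 0, 2), lookup(table, 2, 0))
```

For N points this computes N(N – 1)/2 distances rather than N², since the diagonal is always zero and each off-diagonal pair is measured once.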

In mathematical terms, the similarity matrix is lower triangular. The next step is to find the smallest value in the similarity matrix. This identifies the two clusters that are most similar to one another. These two clusters are merged into a new one, and the similarity matrix is updated by replacing the two rows that described the parent clusters with a new row that describes the distance between the merged cluster and the remaining clusters.

There are now N – 1 clusters and N – 1 rows in the similarity matrix. The merge step is repeated N – 1 times, so that all the records belong to the same large cluster. Each iteration records which clusters were merged and the distance between them. This information can be used to decide which level of clustering to make use of.
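The whole procedure can be sketched as a short loop. This is a minimal sketch under stated assumptions, not a reference implementation: it assumes Euclidean distance and the single-link rule for the merged row, stores distances in a dictionary keyed by pairs of cluster ids rather than in a literal matrix, and the name agglomerate is hypothetical.

```python
import math

def agglomerate(points):
    """Single-link agglomerative clustering.
    Returns the merge history as (members_a, members_b, distance) tuples."""
    # Each point starts in its own cluster: cluster id -> member indices.
    clusters = {i: [i] for i in range(len(points))}
    # Pair-wise distances, stored once per pair with key (smaller id, larger id).
    dist = {}
    for a in range(len(points)):
        for b in range(a):
            dist[(b, a)] = math.dist(points[a], points[b])

    history = []
    next_id = len(points)
    while len(clusters) > 1:
        # Find the smallest value: the two most similar clusters.
        (a, b), d = min(dist.items(), key=lambda item: item[1])
        history.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters.pop(a) + clusters.pop(b)

        # New row: single-link distance from the merged cluster to each
        # remaining cluster is the smaller of the two parent distances.
        new_row = {}
        for c in clusters:
            d_ac = dist[tuple(sorted((a, c)))]
            d_bc = dist[tuple(sorted((b, c)))]
            new_row[(c, next_id)] = min(d_ac, d_bc)

        # Drop the two parent rows and add the merged row.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new_row)
        clusters[next_id] = merged
        next_id += 1
    return history

# Hypothetical example data: four points on a line.
points = [(0.0,), (1.0,), (5.0,), (7.0,)]
for members_a, members_b, d in agglomerate(points):
    print(members_a, members_b, d)
```

With four points the loop runs three times, matching the N – 1 merges described above. The recorded merge distances grow from step to step here, so one simple way to pick a level would be to stop at the last merge whose distance is below a chosen threshold.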