A hierarchical clustering technique works by grouping data objects into a tree of clusters. Hierarchical clustering algorithms are either top-down or bottom-up. The quality of a pure hierarchical clustering method suffers from its inability to revise a merge or split decision once it has been carried out: a poor decision at one level cannot be undone later.
The merging of clusters is based on the distance between clusters. The widely used measures for the distance between clusters are as follows, where mi is the mean of cluster Ci, ni is the number of points in Ci, and |p − p′| is the distance between two points p and p′.

Minimum distance − dmin(Ci, Cj) = min {|p − p′| : p ∈ Ci, p′ ∈ Cj}

Maximum distance − dmax(Ci, Cj) = max {|p − p′| : p ∈ Ci, p′ ∈ Cj}

Mean distance − dmean(Ci, Cj) = |mi − mj|

Average distance − davg(Ci, Cj) = (1 / (ni nj)) Σp∈Ci Σp′∈Cj |p − p′|
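These four distance measures can be computed directly from the point sets. The sketch below is a minimal NumPy illustration; the function name cluster_distances is ours, not from any library.

```python
import numpy as np

def cluster_distances(Ci, Cj):
    """Compute the four common inter-cluster distance measures
    between clusters Ci and Cj (each an (n, d) array of points)."""
    # Pairwise Euclidean distances |p - p'| for every p in Ci, p' in Cj
    diffs = Ci[:, None, :] - Cj[None, :, :]
    d = np.sqrt((diffs ** 2).sum(axis=-1))
    return {
        "minimum": d.min(),                                   # min |p - p'|
        "maximum": d.max(),                                   # max |p - p'|
        "mean":    np.linalg.norm(Ci.mean(0) - Cj.mean(0)),   # |mi - mj|
        "average": d.mean(),                                  # (1/(ni*nj)) * sum |p - p'|
    }

Ci = np.array([[0.0, 0.0], [1.0, 0.0]])
Cj = np.array([[3.0, 0.0], [4.0, 0.0]])
print(cluster_distances(Ci, Cj))
```

For these two clusters on a line, the minimum distance is 2, the maximum is 4, and both the mean and average distances are 3.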
There are two types of hierarchical clustering methods, which are as follows −
Agglomerative Hierarchical Clustering (AHC) − AHC is a bottom-up clustering method in which clusters have sub-clusters, which in turn have sub-clusters, and so on. It begins by placing each object in its own cluster and then merges these atomic clusters into larger and larger clusters until all the objects are in a single cluster or a specific termination condition is satisfied. Most hierarchical clustering methods belong to this category; they differ only in their definition of between-cluster similarity.
For example, a method known as AGNES (Agglomerative Nesting) uses the single-link technique and works as follows. Suppose there is a set of objects located in a rectangle. Initially, each object is placed into a cluster of its own. The clusters are then merged step by step according to some principle, such as merging the two clusters with the minimum Euclidean distance between their closest objects.
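The bottom-up, single-link process described above can be sketched in a few lines of Python. This is an illustrative brute-force version, not the actual AGNES implementation; the function name and termination condition (a target number of clusters) are assumptions for the example.

```python
import numpy as np

def single_link_agglomerative(points, n_clusters=1):
    """AGNES-style bottom-up clustering: start with singleton clusters and
    repeatedly merge the pair whose closest members are nearest (single link)."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single-link distance: minimum Euclidean distance between
                # any point of one cluster and any point of the other.
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters

pts = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
print(single_link_agglomerative(pts, n_clusters=2))  # → [[0, 1], [2, 3]]
```

Starting from four singleton clusters, the two nearby pairs are merged first, leaving the two well-separated groups.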
Divisive Hierarchical Clustering (DHC) − DHC is a top-down approach and is less commonly used. It works similarly to agglomerative clustering, but in the opposite direction. The method begins with a single cluster containing all objects and then successively splits the resulting clusters until only clusters of individual objects remain or a specific termination condition is satisfied, such as a desired number of clusters being obtained or the distance between the two closest clusters rising above a threshold.
Divisive methods are less widely available and have rarely been used because of the difficulty of making the right splitting decision at a high level. DIANA (Divisive Analysis) is one example of a divisive hierarchical clustering method. It works in the opposite order: initially, all the objects are placed in one cluster, and the cluster is then split according to some principle, such as the maximum Euclidean distance between the closest neighboring objects in the cluster.