What is a Chameleon?

Data MiningDatabaseData Structure

Chameleon is a hierarchical clustering algorithm that uses dynamic modeling to decide the similarity among pairs of clusters. It was changed based on the observed weaknesses of two hierarchical clustering algorithms such as ROCK and CURE.

ROCK and related designs emphasize cluster interconnectivity while neglecting data regarding cluster proximity. CURE and related design consider cluster proximity yet neglect cluster interconnectivity. In Chameleon, cluster similarity is assessed depending on how well-connected objects are inside a cluster and on the proximity of clusters. Especially, two clusters are combined if their interconnectivity is high and they are close together.

It does not base on a static, user-supplied model and can automatically adapt to the internal features of the clusters being combined. The merge process supports the discovery of natural and homogeneous clusters and is used for all types of data considering a similarity function can be defined.

Chameleon needs the k-nearest-neighbor graph technique to make a sparse graph, where each vertex of the graph defines a data object, and there exists an edge among two vertices (objects) if one object is between the k-most-similar objects of the other. The edges are weighted to reflect the similarity among objects.

Chameleon uses a graph partitioning algorithm to partition the k-nearest-neighbor graph into a large number of relatively small subclusters. It can use an agglomerative hierarchical clustering algorithm that repeatedly merges subclusters based on their similarity. It can determine the pairs of most similar subclusters, it takes into account both the interconnectivity as well as the closeness of the clusters.

The k-nearest-neighbor graph captures the approach of neighborhood dynamically: the neighborhood radius of an object is decided by the density of the region in which the object resides. In a dense area, the neighborhood is represented narrowly. In a sparse region, it is represented more widely.

This influence results in more natural clusters, in comparison with density-based methods like DBSCAN that instead use a worldwide neighborhood. Furthermore, the density of the region is recorded as the weight of the edges. Especially, the edges of a dense region tend to weigh more than that of a sparse region.

The graph-partitioning algorithm partitions the k-nearest-neighbor graph such that it makes smaller the edge cut. That is, cluster C is subdivided into sub-clusters Ciand Cj to minimize the weight of the edges that can be cut should C be bisected into Cand Cj . Edge cut is indicated EC (Ci, Cj )and determines the absolute interconnectivity between cluster Cand Cj .

Updated on 24-Nov-2021 07:01:08