What are the approaches to graph-based clustering?

Clustering is the process of grouping a set of physical or abstract objects into classes of similar objects. A cluster is a set of data objects that are similar to one another and dissimilar to the objects in other clusters. In many applications, a cluster of data objects can be treated collectively as one group. Cluster analysis is an essential human activity.

Clustering also helps in identifying outliers. Similar values are organized into clusters, and values that fall outside every cluster are treated as outliers. Clustering techniques consider data tuples as objects. They partition the objects into groups or clusters so that objects within a cluster are “similar” to one another and “dissimilar” to objects in other clusters. Similarity is commonly defined in terms of how “close” the objects are in space, based on a distance function.

There are various approaches to graph-based clustering, which are as follows −

Sparsify the proximity graph to keep only the links between an object and its nearest neighbors. This sparsification is useful for handling noise and outliers. It also enables the use of highly efficient graph partitioning algorithms that have been developed for sparse graphs.
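As a minimal sketch of this sparsification, the following Python keeps, for each object, only the links to its k nearest neighbors. The function name knn_sparsify, the use of Euclidean distance, and the sample points are assumptions made for illustration.

```python
import math

def knn_sparsify(points, k):
    """Return {i: set of the k nearest neighbors of point i}."""
    graph = {}
    for i, p in enumerate(points):
        # Distances from point i to every other point (Euclidean, by assumption).
        others = sorted((math.dist(p, q), j)
                        for j, q in enumerate(points) if j != i)
        # Keep only the k closest links; all other edges are dropped.
        graph[i] = {j for _, j in others[:k]}
    return graph

# Hypothetical data: a tight group of three points and a distant pair.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
graph = knn_sparsify(points, k=2)
```

With k=2, point 0 keeps links only to points 1 and 2, so the long edges to the distant pair disappear from its neighbor list.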

Define a similarity measure between two objects based on the number of nearest neighbors that they share. This approach, which relies on the observation that an object and its nearest neighbors usually belong to the same class, is useful for overcoming problems with high dimensionality and clusters of varying density.
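One common way to make this concrete is to count shared neighbors only when the two objects appear in each other's nearest-neighbor lists. The sketch below assumes the neighbor lists are already available as a dictionary; the name snn_similarity and the sample lists are hypothetical.

```python
def snn_similarity(knn, i, j):
    """Shared-neighbor count of i and j; 0 unless each is in the other's list."""
    if j not in knn[i] or i not in knn[j]:
        return 0
    return len(knn[i] & knn[j])

# Hypothetical 3-nearest-neighbor lists for seven objects.
knn = {
    0: {1, 2, 3},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {0, 1, 2},
    4: {3, 5, 6},
    5: {3, 4, 6},
    6: {3, 4, 5},
}

sim_01 = snn_similarity(knn, 0, 1)  # mutual neighbors sharing objects 2 and 3
sim_34 = snn_similarity(knn, 3, 4)  # 4 is not in the list of 3, so no similarity
```

Because the measure depends on neighbor rankings and not on raw distances, it is less sensitive to the scale of distances in dense versus sparse regions.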

Define core objects and build clusters around them. For graph-based clustering, this requires introducing a notion of density based on a proximity graph or a sparsified proximity graph. As with DBSCAN, building clusters around core objects leads to a clustering approach that can discover clusters of differing shapes and sizes.
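A rough sketch of this idea, under the assumption that density is measured as the number of strong links an object has in the sparsified graph, might look like the following. The function name core_based_clusters, the parameters eps and min_pts, and the sample similarities are illustrative choices, not a definitive algorithm.

```python
def core_based_clusters(sim, eps, min_pts):
    """sim: {node: {neighbor: similarity}} for a symmetric sparsified graph."""
    # Strong links: edges whose similarity is at least eps.
    strong = {u: {v for v, s in nbrs.items() if s >= eps}
              for u, nbrs in sim.items()}
    # Graph-based density: an object is core if it has enough strong links.
    core = {u for u, nbrs in strong.items() if len(nbrs) >= min_pts}
    labels, next_label = {}, 0
    for seed in sorted(core):
        if seed in labels:
            continue
        # Grow a cluster from this core object through strong links.
        labels[seed] = next_label
        stack = [seed]
        while stack:
            u = stack.pop()
            for v in strong[u]:
                if v not in labels:
                    labels[v] = next_label
                    if v in core:  # only core objects keep expanding
                        stack.append(v)
        next_label += 1
    # Objects never reached from a core object are labeled -1 (noise).
    return core, {u: labels.get(u, -1) for u in sim}

# Hypothetical similarities on a small sparsified graph.
sim = {
    0: {1: 0.9, 2: 0.8},
    1: {0: 0.9, 2: 0.7},
    2: {0: 0.8, 1: 0.7, 3: 0.6},
    3: {2: 0.6, 4: 0.1},
    4: {3: 0.1},
}
core, labels = core_based_clusters(sim, eps=0.5, min_pts=2)
```

Here objects 0, 1, and 2 are core, object 3 joins their cluster as a border object, and object 4 is left as noise.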

Use the information in the proximity graph to make a more sophisticated evaluation of whether two clusters should be merged. Specifically, two clusters are merged only if the resulting cluster will have characteristics similar to those of the original two clusters.
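To illustrate the flavor of such a test, the sketch below merges two clusters only when the average weight of the edges connecting them is comparable to the average weight of the edges inside them. This is a deliberately simplified stand-in for the merge criteria used by real algorithms; the names avg_weight and should_merge, the threshold value, and the edge weights are all assumptions.

```python
def avg_weight(edges, a, b):
    """Average weight of edges with one endpoint in a and the other in b."""
    ws = [w for (u, v), w in edges.items()
          if (u in a and v in b) or (u in b and v in a)]
    return sum(ws) / len(ws) if ws else 0.0

def should_merge(edges, c1, c2, threshold=0.7):
    """Merge only if the connection between c1 and c2 resembles their interiors."""
    inter = avg_weight(edges, c1, c2)
    internal = (avg_weight(edges, c1, c1) + avg_weight(edges, c2, c2)) / 2
    return internal > 0 and inter / internal >= threshold

# Hypothetical similarity-weighted edges of a sparsified proximity graph.
edges = {(0, 1): 0.9, (2, 3): 0.8, (1, 2): 0.85, (4, 5): 0.9, (3, 4): 0.2}
c1, c2, c3 = {0, 1}, {2, 3}, {4, 5}

merge_12 = should_merge(edges, c1, c2)  # strong connection relative to interiors
merge_23 = should_merge(edges, c2, c3)  # weak connection relative to interiors
```

The point of a relative test like this is that a link of a given strength can justify merging two loose clusters while being too weak to merge two tight ones.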

The sparsification of proximity graphs is a natural starting point. Two examples of techniques whose approach to clustering is based on it are MST clustering, which is equivalent to the single link clustering algorithm, and Opossum.
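MST clustering can be sketched as follows: build a minimum spanning tree of the proximity graph, then cut its longest edges so that the desired number of connected components remains. The code assumes Euclidean distance and uses Kruskal's algorithm with a simple union-find; the function name mst_clusters and the sample points are illustrative.

```python
import math
from itertools import combinations

def mst_clusters(points, k):
    """Cluster by building an MST and cutting its k-1 longest edges."""
    n = len(points)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Kruskal: scan edges by increasing length, keep those joining two trees.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    mst = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            mst.append((w, i, j))

    # mst is in increasing order, so dropping the tail cuts the longest edges.
    kept = mst[:n - k]
    parent = list(range(n))
    for _, i, j in kept:
        parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    ids = {r: c for c, r in enumerate(dict.fromkeys(roots))}
    return [ids[r] for r in roots]

# Hypothetical data: a tight group of three points and a distant pair.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
labels = mst_clusters(points, k=2)
```

Cutting the single longest tree edge separates the three nearby points from the distant pair, which is the same partition single link clustering would produce for two clusters.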

Other techniques build on the remaining approaches. One is a hierarchical clustering algorithm that uses a notion of self-similarity to determine whether clusters should be merged (Chameleon is a well-known example). Another is the Jarvis-Patrick clustering algorithm, which uses Shared Nearest Neighbor (SNN) similarity, a similarity measure based on the nearest neighbors that two objects share.
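A minimal sketch of Jarvis-Patrick clustering, assuming the k-nearest-neighbor lists have already been computed, is shown below. Two objects are linked when they appear in each other's lists and share at least min_shared neighbors, and the clusters are the connected components of those links. The function name, the parameter name min_shared, and the sample lists are hypothetical.

```python
def jarvis_patrick(knn, min_shared):
    """knn: {object: set of its k nearest neighbors}. Returns {object: cluster id}."""
    parent = {p: p for p in knn}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in knn:
        for j in knn[i]:
            # Link i and j if they are mutual neighbors with enough shared neighbors.
            if i in knn[j] and len(knn[i] & knn[j]) >= min_shared:
                parent[find(i)] = find(j)

    # Number the connected components in order of first appearance.
    ids = {}
    return {p: ids.setdefault(find(p), len(ids)) for p in knn}

# Hypothetical 3-nearest-neighbor lists for seven objects.
knn = {
    0: {1, 2, 3},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {0, 1, 2},
    4: {3, 5, 6},
    5: {3, 4, 6},
    6: {3, 4, 5},
}
clusters = jarvis_patrick(knn, min_shared=2)
```

Objects 4, 5, and 6 all list object 3 as a neighbor, but object 3 does not list them back, so the two groups stay separate.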