What are the elements of the cluster?

Clustering is the process of grouping a set of physical or abstract objects into classes of similar objects. A cluster is a set of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. In many applications, a cluster of data objects can be treated collectively as one group. Cluster analysis is an essential human activity.

Cluster analysis forms groups, or clusters, of similar records based on various measurements made on those records. The key idea is to define the clusters in ways that are useful for the objective of the analysis. Cluster analysis has been used in several areas, such as astronomy, archaeology, medicine, chemistry, education, psychology, linguistics, and sociology.

The elements of a cluster that affect how well it can be found are as follows −

Data Distribution − Some clustering techniques assume a specific type of distribution for the data. More specifically, they often assume that the data can be modeled as arising from a mixture of distributions, where each cluster corresponds to a distribution.
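As an illustration of the mixture idea, the sketch below assumes a one-dimensional mixture of two normal distributions and computes, for a given value, the probability that it came from each component. The weights, means, and spreads are hypothetical values chosen for the example, and in practice they would have to be estimated from the data.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return exp(-0.5 * z * z) / (std * sqrt(2 * pi))

def responsibilities(x, weights, means, stds):
    """Posterior probability that x came from each mixture component."""
    weighted = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(weighted)
    return [p / total for p in weighted]

# Hypothetical two-cluster mixture: equal weights, unit spread, means 0 and 4.
weights, means, stds = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]

for x in (0.0, 2.0, 4.0):
    print(x, [round(r, 4) for r in responsibilities(x, weights, means, stds)])
```

A value halfway between the two means is assigned to each component with equal probability, while a value at one of the means belongs almost entirely to that component.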

Shape − Some clusters are regularly shaped, such as rectangular or globular, but in general clusters can be of arbitrary shape. Techniques such as DBSCAN and single link can handle clusters of arbitrary shape, but prototype-based schemes and some hierarchical techniques, such as complete link and group average, cannot.
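To see why single link can follow a non-globular shape, here is a minimal sketch that cuts a single-link hierarchy at a fixed distance, which is equivalent to joining every pair of points closer than that distance and taking the connected groups. The data (two long parallel rows of points) and the threshold of 1.5 are assumptions made for the example.

```python
from math import dist

def single_link_clusters(points, threshold):
    """Label points by cutting single-link clustering at `threshold`."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Join every pair of points that lies within the threshold.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)

    # Relabel groups as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(points))]

# Two elongated clusters: rows of points 1 unit apart, with the rows 3 units apart.
points = [(float(x), 0.0) for x in range(11)] + [(float(x), 3.0) for x in range(11)]
labels = single_link_clusters(points, threshold=1.5)
print(labels)
```

Each row ends up as its own cluster because neighbouring points chain together, even though the two ends of a row are much farther from each other than from the other row.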

Differing Sizes − Several clustering methods, such as K-means, do not work well when clusters have different sizes.
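The effect can be reproduced with a small one-dimensional sketch of the standard K-means iteration (assign each value to the nearest centroid, then recompute the centroids). The data and the starting centroids are assumptions for the example: a wide cluster of 11 values and a narrow cluster of 3 values, with the iteration started at the true cluster means.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Plain K-means on 1-D data; returns (labels, final centroids)."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared distance.
        new_labels = [
            min(range(len(centroids)), key=lambda k: (v - centroids[k]) ** 2)
            for v in values
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return labels, centroids

large = [float(x) for x in range(11)]   # wide cluster: 0, 1, ..., 10
small = [13.0, 13.5, 14.0]              # narrow cluster
labels, centroids = kmeans_1d(large + small, centroids=[5.0, 13.5])
print(labels)
print(centroids)
```

Even with this favourable start, the values 8, 9, and 10 from the wide cluster end up assigned to the narrow one, because splitting the wide cluster lowers the total squared error.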

Differing Densities − Clusters that have widely varying densities can cause problems for methods such as DBSCAN and K-means.
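For DBSCAN, the difficulty comes from using one neighbourhood radius for all clusters. The sketch below is a minimal one-dimensional DBSCAN written for this example (the function name `dbscan_1d`, the data, and the parameter values are assumptions). It is run on two dense clusters that lie close together plus one sparse cluster.

```python
NOISE = -1

def dbscan_1d(values, eps, min_pts):
    """Minimal DBSCAN for 1-D data; returns one label per value (-1 = noise)."""
    n = len(values)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Indices within eps of values[i], including i itself.
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
        cluster += 1
    return labels

dense_a = [i / 10 for i in range(10)]        # spacing 0.1
dense_b = [1.8 + i / 10 for i in range(10)]  # spacing 0.1, gap 0.9 from dense_a
sparse = [10.0 + i for i in range(10)]       # spacing 1.0
values = dense_a + dense_b + sparse

tight = dbscan_1d(values, eps=0.25, min_pts=3)
loose = dbscan_1d(values, eps=1.2, min_pts=3)
print(tight)
print(loose)
```

With the small radius the two dense clusters are found but every sparse point is marked as noise. With the large radius the sparse cluster is found but the two dense clusters merge into one. For this data, no single radius with `min_pts=3` yields all three clusters.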

Poorly Separated Clusters − When clusters touch or overlap, some clustering approaches merge clusters that should be kept separate. Even techniques that find distinct clusters assign points to one cluster or another arbitrarily.

Relationships among Clusters − In most clustering techniques, there is no explicit consideration of the relationships among clusters, such as their relative position. Self-organizing maps are a clustering method that directly considers the relationships between clusters during the clustering process. Specifically, the assignment of a point to one cluster influences the definitions of nearby clusters.

Subspace Clusters − Clusters may exist only in a subset of dimensions (attributes), and the clusters found using one set of dimensions can be quite different from the clusters found using another set.

While this issue can arise with as few as two dimensions, it becomes more acute as dimensionality increases, because the number of possible subsets of dimensions is exponential in the total number of dimensions. For that reason, it is not feasible to simply look for clusters in all possible subsets of dimensions unless the number of dimensions is relatively low.
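The growth is easy to quantify: d attributes have 2^d − 1 non-empty subsets. The short sketch below counts them and, for a small hypothetical set of attribute names, lists them explicitly.

```python
from itertools import combinations

def count_subspaces(num_dims):
    """Number of non-empty subsets of num_dims attributes."""
    return 2 ** num_dims - 1

def enumerate_subspaces(dims):
    """Yield every non-empty subset of the given attribute names."""
    for size in range(1, len(dims) + 1):
        yield from combinations(dims, size)

# Hypothetical attribute names, small enough to list every subspace.
print(list(enumerate_subspaces(["age", "income", "height"])))

for d in (2, 5, 10, 20, 50):
    print(d, count_subspaces(d))
```

Two attributes give only 3 subspaces to search, but 20 attributes already give more than a million.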

Ginni

Updated on: 14-Feb-2022
