What are the characteristics of clustering algorithms?


Clustering algorithms have several general characteristics, which are as follows −

Order Dependence − For some algorithms, the quality and number of clusters produced can vary, perhaps dramatically, depending on the order in which the data is processed. While it would seem desirable to avoid such algorithms, sometimes the order dependence is relatively minor, or the algorithm has other desirable characteristics.
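
Order dependence can be illustrated with a small sketch. The single-pass "leader" scheme below is not named in this article; it is used here only as an assumed example of an algorithm that processes points one at a time. The function name `leader_clustering`, the threshold, and the data values are hypothetical.

```python
def leader_clustering(points, threshold):
    """Single-pass clustering: each point joins the first leader that lies
    within `threshold`; otherwise it becomes the leader of a new cluster."""
    leaders = []
    clusters = []
    for p in points:
        for i, leader in enumerate(leaders):
            if abs(p - leader) <= threshold:
                clusters[i].append(p)
                break
        else:
            # No existing leader is close enough, so start a new cluster.
            leaders.append(p)
            clusters.append([p])
    return clusters


# The same three values, presented in two different orders.
clusters_a = leader_clustering([0.0, 9.0, 18.0], threshold=10.0)
clusters_b = leader_clustering([9.0, 0.0, 18.0], threshold=10.0)
print(clusters_a)  # [[0.0, 9.0], [18.0]]
print(clusters_b)  # [[9.0, 0.0, 18.0]]
```

With the first ordering the scheme produces two clusters, and with the second it produces one, even though the data and the threshold are identical.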

Non-determinism − Some clustering algorithms, such as K-means, are not order-dependent, but they produce different results on each run because they rely on an initialization step that requires a random choice. Because the quality of the clusters can vary from one run to another, multiple runs may be necessary.
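
The following is a minimal sketch of the "multiple runs" idea, assuming a basic one-dimensional K-means written from scratch rather than any particular library. The helper `kmeans_1d`, the seeds, and the data are hypothetical; the sum of squared distances (SSE) is used as the quality measure.

```python
import random


def kmeans_1d(points, k, seed, iterations=20):
    """Basic K-means on 1-D data; returns (sorted centroids, SSE)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # the random initialization step
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            groups[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    sse = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return sorted(centroids), sse


points = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2, 8.8, 9.0, 9.2]

# Run the algorithm with several seeds and keep the run with the lowest SSE.
runs = {seed: kmeans_1d(points, k=3, seed=seed) for seed in range(10)}
for seed, (centroids, sse) in runs.items():
    print(seed, [round(c, 2) for c in centroids], round(sse, 3))

best_seed = min(runs, key=lambda s: runs[s][1])
print("best seed:", best_seed, "SSE:", round(runs[best_seed][1], 3))
```

Fixing the seed makes a single run repeatable, while comparing several seeds guards against an unlucky initialization. Whether the runs actually differ depends on the data and the seeds chosen.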

Scalability − It is not unusual for a data set to contain millions of objects, and the clustering algorithms used for such data sets should have linear or near-linear time and space complexity.

Even algorithms with a complexity of $\mathrm{O(m^2)}$, where m is the number of objects, are not practical for large data sets. Moreover, clustering techniques for large data sets cannot always assume that all the data will fit in main memory or that data elements can be accessed randomly. Algorithms that depend on these assumptions are infeasible for large data sets.
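
To give a rough sense of why quadratic complexity is a problem, the sketch below counts the distinct object pairs in a data set of m objects and the memory needed to hold one distance per pair. The function name `pairwise_cost` and the assumption of 8 bytes per stored distance are illustrative only.

```python
def pairwise_cost(m, bytes_per_value=8):
    """Number of distinct object pairs, and the bytes needed to store one
    distance per pair, for a data set of m objects."""
    pairs = m * (m - 1) // 2
    return pairs, pairs * bytes_per_value


for m in (1_000, 100_000, 1_000_000):
    pairs, size = pairwise_cost(m)
    print(f"m={m:>9,}  pairs={pairs:,}  storage={size / 1e9:,.1f} GB")
```

Under these assumptions, a full pairwise distance table for one million objects needs about 4,000 GB, which is far more than typical main memory.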

Parameter Selection − Some clustering algorithms have one or more parameters that need to be set by the user. It can be difficult to choose the proper values; thus, the attitude is generally, "the fewer parameters, the better." Choosing parameter values becomes even more difficult if a small change in the parameters substantially changes the clustering results.

Finally, unless a procedure (which may involve user input) is provided for determining parameter values, a user of the algorithm is reduced to trial and error to find suitable parameter values.
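
Parameter sensitivity can be sketched with a toy example. The gap-based scheme below is an assumed illustration, not an algorithm discussed in this article: it splits sorted one-dimensional values wherever the gap between neighbours exceeds a user-chosen parameter `eps`. The function name, the data, and the `eps` values are hypothetical.

```python
def gap_clusters(values, eps):
    """Split sorted 1-D values into clusters wherever the gap between
    neighbours exceeds eps (eps is the user-chosen parameter)."""
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > eps:
            clusters.append([cur])   # gap too large: start a new cluster
        else:
            clusters[-1].append(cur)
    return clusters


values = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]

# Trial and error: try several parameter values and compare the outcomes.
counts = {eps: len(gap_clusters(values, eps)) for eps in (0.9, 1.0, 2.0)}
print(counts)  # {0.9: 6, 1.0: 2, 2.0: 1}
```

Moving `eps` from 0.9 to 1.0 changes the result from six clusters to two, which is the kind of sensitivity that makes parameter selection hard.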

Transforming the Clustering Problem to Another Domain − One approach taken by some clustering techniques is to map the clustering problem to a problem in a different domain. Graph-based clustering, for instance, maps the task of finding clusters to the task of partitioning a proximity graph into connected components.
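
A minimal sketch of the graph-based view follows, assuming the proximity graph links every pair of points whose Euclidean distance is at most a chosen threshold. The function name `proximity_components`, the threshold, and the points are hypothetical, and the components are found with a simple union-find structure.

```python
from itertools import combinations
from math import dist


def proximity_components(points, threshold):
    """Cluster points as the connected components of the graph that links
    every pair of points no farther apart than `threshold`."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Checking all pairs is quadratic; acceptable only for a small sketch.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)  # merge the two components

    components = {}
    for i, p in enumerate(points):
        components.setdefault(find(i), []).append(p)
    return sorted(components.values())


points = [(0, 0), (0, 1), (1, 1), (5, 5), (5, 6), (9, 0)]
clusters = proximity_components(points, threshold=1.5)
print(clusters)
# [[(0, 0), (0, 1), (1, 1)], [(5, 5), (5, 6)], [(9, 0)]]
```

Each connected component of the graph is reported as one cluster, so the clustering question has become a graph question.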

Treating Clustering as an Optimization Problem − Clustering is often viewed as an optimization problem: divide the points into clusters in a way that maximizes the goodness of the resulting set of clusters, as measured by a user-specified objective function.

For instance, the K-means clustering algorithm tries to find the set of clusters that minimizes the sum of the squared distances of each point from its nearest cluster centroid. In theory, such problems can be solved by enumerating all possible sets of clusters and choosing the one with the best value of the objective function, but this exhaustive approach is computationally infeasible.
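
The exhaustive approach can be sketched for a tiny data set, assuming one-dimensional points and the sum of squared distances (SSE) as the objective function. The function names `sse` and `exhaustive_best` and the data are hypothetical. The sketch tries every assignment of points to k labels, which is only workable because there are six points.

```python
from itertools import product


def sse(clusters):
    """Sum of squared distances of each point to its cluster centroid."""
    total = 0.0
    for cluster in clusters:
        centroid = sum(cluster) / len(cluster)
        total += sum((p - centroid) ** 2 for p in cluster)
    return total


def exhaustive_best(points, k):
    """Try every assignment of points to k labels and keep the lowest SSE."""
    best = None
    for labels in product(range(k), repeat=len(points)):
        clusters = [[p for p, lab in zip(points, labels) if lab == c]
                    for c in range(k)]
        if any(not c for c in clusters):
            continue  # skip assignments that leave a cluster empty
        score = sse(clusters)
        if best is None or score < best[0]:
            best = (score, sorted(clusters))
    return best


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
score, clusters = exhaustive_best(points, k=2)
print(score, clusters)  # 4.0 [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]

# The number of labelled assignments is k ** m, so it grows exponentially.
print(2 ** len(points), "assignments for 6 points;", 2 ** 100, "for 100 points")
```

Six points with k = 2 need only 64 assignments, but 100 points would need 2^100 (about 1.27 × 10^30), which is why practical algorithms such as K-means search heuristically instead.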

raja
Updated on 14-Feb-2022 12:16:41
