ML - Analysis of Silhouette Score



The range of Silhouette score is [-1, 1]. Its analysis is as follows −

  • +1 Score − A Silhouette score near +1 indicates that the sample is far away from its neighboring cluster and fits well within its own cluster.

  • 0 Score − 0 Silhouette score indicates that the sample is on or very close to the decision boundary separating two neighboring clusters.

  • -1 Score − A Silhouette score near -1 indicates that the sample has probably been assigned to the wrong cluster.

The Silhouette score of a sample can be calculated using the following formula −

$$\text{silhouette score} = \frac{p - q}{\max(p, q)}$$

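Here, p = mean distance from the sample to the points in the nearest cluster that it does not belong to

And, q = mean distance from the sample to all the other points in its own cluster (the mean intra-cluster distance).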
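
The formula can be illustrated with a minimal pure-Python sketch. The function names `silhouette_sample` and `silhouette_score` and the toy data are assumptions made for this illustration, as is the use of Euclidean distance. The overall score is taken here as the mean of the per-sample scores, and a sample that is alone in its cluster is given a score of 0 by convention.

```python
from math import dist          # Euclidean distance (Python 3.8+)
from statistics import mean


def silhouette_sample(index, points, labels):
    """Silhouette score of one sample: (p - q) / max(p, q)."""
    own = labels[index]
    # q: mean distance to the other points in the sample's own cluster
    same = [dist(points[index], pt)
            for k, pt in enumerate(points)
            if labels[k] == own and k != index]
    if not same:
        return 0.0  # assumed convention: a singleton cluster scores 0
    q = mean(same)
    # p: smallest mean distance to the points of any other cluster
    # (requires at least two clusters)
    p = min(
        mean(dist(points[index], pt)
             for pt, lab in zip(points, labels) if lab == other)
        for other in set(labels) if other != own
    )
    return (p - q) / max(p, q)


def silhouette_score(points, labels):
    """Mean silhouette score over all samples."""
    return mean(silhouette_sample(i, points, labels)
                for i in range(len(points)))


# Hypothetical toy data: two tight groups that lie far apart
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(silhouette_score(points, [0, 0, 1, 1]))  # roughly 0.90: good clustering
print(silhouette_score(points, [0, 1, 0, 1]))  # negative: samples misassigned
```

In practice, scikit-learn provides `sklearn.metrics.silhouette_score`, which would normally be preferred over a hand-written version like this one.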

Davies-Bouldin Index

The DB index is another good metric for analyzing clustering algorithms. It helps answer the following questions about a clustering model −

  • Are the clusters well separated from each other?
  • How dense are the clusters?

We can calculate the DB index with the help of the following formula −

$$DB = \frac{1}{n}\sum_{i=1}^{n} \max_{j \neq i}\left(\frac{\sigma_{i} + \sigma_{j}}{d(c_{i}, c_{j})}\right)$$

Here, n = number of clusters

$\sigma_{i}$ = average distance of all points in cluster $i$ from the cluster centroid $c_{i}$

$d(c_{i}, c_{j})$ = distance between the centroids $c_{i}$ and $c_{j}$.

The lower the DB index, the better the clustering model.
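
As a rough illustration of the formula, the following pure-Python sketch computes the DB index for clusters given as lists of points. The names `centroid` and `davies_bouldin` and the toy clusters are assumptions made for this example, and Euclidean distance is assumed throughout.

```python
from math import dist          # Euclidean distance (Python 3.8+)
from statistics import mean


def centroid(cluster):
    """Coordinate-wise mean of the points in one cluster."""
    return tuple(mean(coords) for coords in zip(*cluster))


def davies_bouldin(clusters):
    """DB index for a list of clusters (each a list of points)."""
    cents = [centroid(c) for c in clusters]
    # sigma_i: average distance of the points in cluster i to its centroid
    sigmas = [mean(dist(pt, cen) for pt in c)
              for c, cen in zip(clusters, cents)]
    n = len(clusters)
    # For each cluster i, keep the worst (largest) ratio over all j != i
    worst = [
        max((sigmas[i] + sigmas[j]) / dist(cents[i], cents[j])
            for j in range(n) if j != i)
        for i in range(n)
    ]
    return mean(worst)


# Hypothetical toy clusters
tight = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
loose = [[(0.0, 0.0), (0.0, 4.0)], [(3.0, 0.0), (3.0, 4.0)]]
print(davies_bouldin(tight))  # small value: dense, well-separated clusters
print(davies_bouldin(loose))  # larger value: spread-out, closer clusters
```

scikit-learn offers `sklearn.metrics.davies_bouldin_score`, which takes the data and the labels rather than pre-grouped clusters.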

Dunn Index

It works in the same way as the DB index, but the two differ in the following points −

  • The Dunn index considers only the worst case, i.e. the clusters that are closest together and the cluster that is most spread out, while the DB index considers the dispersion and separation of all the clusters in the clustering model.

  • The Dunn index increases as clustering performance improves, while the DB index decreases as the clusters become better spaced and denser.

We can calculate the Dunn index with the help of the following formula −

$$D = \frac{\min_{1 \leq i < j \leq n} p(i,j)}{\max_{1 \leq k \leq n} q(k)}$$

Here, i, j, k = indices of the clusters

p(i, j) = inter-cluster distance between clusters i and j

q(k) = intra-cluster distance of cluster k
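
The text does not fix how the inter-cluster and intra-cluster distances are measured, and several definitions are in use. The sketch below assumes one common choice: p(i, j) is the smallest Euclidean distance between a point of cluster i and a point of cluster j, and q(k) is the diameter of cluster k (its largest pairwise distance). The function name `dunn_index` and the toy clusters are likewise assumptions for this illustration.

```python
from itertools import combinations
from math import dist          # Euclidean distance (Python 3.8+)


def dunn_index(clusters):
    """Dunn index for a list of clusters (each a list of points)."""
    # Numerator: smallest inter-cluster distance over all cluster pairs.
    # Assumed definition of p(i, j): closest pair of points across clusters.
    min_separation = min(
        dist(a, b)
        for ci, cj in combinations(clusters, 2)
        for a in ci
        for b in cj
    )
    # Denominator: largest intra-cluster distance over all clusters.
    # Assumed definition of q(k): cluster diameter. At least one cluster
    # must contain two or more distinct points.
    max_diameter = max(
        dist(a, b)
        for c in clusters
        for a, b in combinations(c, 2)
    )
    return min_separation / max_diameter


# Hypothetical toy clusters
tight = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
loose = [[(0.0, 0.0), (0.0, 4.0)], [(3.0, 0.0), (3.0, 4.0)]]
print(dunn_index(tight))  # 10.0: separation 10, largest diameter 1
print(dunn_index(loose))  # 0.75: separation 3, largest diameter 4
```

A higher value means the closest pair of clusters is still far apart relative to the widest cluster, which matches the "higher is better" reading of the Dunn index.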
