
ML - Analysis of Silhouette Score
The range of Silhouette score is [-1, 1]. Its analysis is as follows −
- +1 Score − A Silhouette score near +1 indicates that the sample is far away from its neighboring cluster.
- 0 Score − A Silhouette score of 0 indicates that the sample is on or very close to the decision boundary separating two neighboring clusters.
- -1 Score − A Silhouette score near -1 indicates that the sample has been assigned to the wrong cluster.
The Silhouette score can be calculated by using the following formula −
$$Silhouette\:score\:=\:\frac{p-q}{max(p,q)}$$
Here, p = mean distance to the points in the nearest cluster
And, q = mean intra-cluster distance to all the points.
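As an illustration, the following minimal sketch computes the Silhouette score for a K-means clustering with scikit-learn. The synthetic 3-cluster dataset and the K-means settings are assumptions made for this example only.

```python
# A minimal sketch: Silhouette score of a K-means clustering on toy data.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Generate a synthetic dataset with 3 well-separated clusters (assumed setup)
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Fit K-means and obtain cluster labels for every sample
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# silhouette_score averages (p - q) / max(p, q) over all samples
print("Silhouette score:", silhouette_score(X, labels))
```

A score close to +1 here would confirm that the samples sit well inside their own clusters, far from the neighboring ones.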
Davies-Bouldin Index
The Davies-Bouldin (DB) index is another good metric for analyzing clustering algorithms. With the help of the DB index, we can understand the following points about a clustering model −
- Whether the clusters are well-spaced from each other or not.
- How dense the clusters are.
We can calculate the DB index with the help of the following formula −
$$DB\:=\:\frac{1}{n}\displaystyle\sum\limits_{i=1}^n max_{j\neq\:i}(\frac{\sigma_{i}+\sigma_{j}}{d(c_{i},c_{j})})$$
Here, n = number of clusters
$\sigma_{i}$ = average distance of all points in cluster $i$ from the cluster centroid $c_{i}$.
And, $d(c_{i},c_{j})$ = distance between the centroids $c_{i}$ and $c_{j}$.
The lower the DB index, the better the clustering model is.
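scikit-learn provides davies_bouldin_score, so the DB index can be computed in the same way as the Silhouette score above; the data and K-means settings below are again assumptions made for the example.

```python
# A minimal sketch: DB index of a K-means clustering on toy data.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Lower values indicate denser, better-separated clusters
print("Davies-Bouldin index:", davies_bouldin_score(X, labels))
```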
Dunn Index
It works in a similar way to the DB index, but the two differ in the following points −
- The Dunn index considers only the worst case, i.e. the clusters that are close together, while the DB index considers the dispersion and separation of all the clusters in the clustering model.
- The Dunn index increases as the performance increases, while the DB index gets better (lower) when clusters are well-spaced and dense.
We can calculate the Dunn index with the help of the following formula −
$$D\:=\:\frac{min_{1\leq\:i<j\leq\:n}\:p(i,j)}{max_{1\leq\:k\leq\:n}\:q(k)}$$
Here, i, j, k = indices of the clusters
p = inter-cluster distance
q = intra-cluster distance
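scikit-learn does not provide a Dunn index function, so the following is a rough sketch of how it could be computed with NumPy and SciPy. The dunn_index helper is hypothetical and written only for this example; it uses the minimum pairwise distance between points of two different clusters as p(i, j) and the cluster diameter as q(k), which is one common choice among several.

```python
# A minimal sketch of the Dunn index (hypothetical helper, not part of scikit-learn).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from scipy.spatial.distance import pdist, cdist

def dunn_index(X, labels):
    clusters = [X[labels == c] for c in np.unique(labels)]
    # q(k): intra-cluster distance, taken here as each cluster's diameter
    max_intra = max(pdist(c).max() for c in clusters if len(c) > 1)
    # p(i, j): inter-cluster distance, taken here as the minimum pairwise
    # distance between points of two different clusters
    min_inter = min(cdist(ci, cj).min()
                    for i, ci in enumerate(clusters)
                    for cj in clusters[i + 1:])
    return min_inter / max_intra

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
print("Dunn index:", dunn_index(X, labels))
```

A higher Dunn index here would indicate compact clusters that are far apart in the worst case.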