
The range of **Silhouette score** is [-1, 1]. Its analysis is as follows −

- **+1 Score** − A **Silhouette score** near +1 indicates that the sample is far away from its neighboring cluster.
- **0 Score** − A **Silhouette score** of 0 indicates that the sample is on or very close to the decision boundary separating two neighboring clusters.
- **-1 Score** − A **Silhouette score** near -1 indicates that the sample has been assigned to the wrong cluster.

The Silhouette score of a sample can be calculated with the following formula −

$$\text{Silhouette score}\:=\:\frac{p-q}{\max(p,q)}$$

Here, p = mean distance from the sample to the points in the nearest cluster that the sample does not belong to

And, q = mean intra-cluster distance from the sample to all the other points in its own cluster.
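The formula above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function name `silhouette_scores`, the toy `points` and the `labels` are assumptions made for this example, Euclidean distance is assumed, and every cluster is assumed to contain at least two samples. In practice, scikit-learn offers `sklearn.metrics.silhouette_score` for the same purpose.

```python
import math

def silhouette_scores(points, labels):
    """Return (p - q) / max(p, q) for every sample (hypothetical helper)."""
    scores = []
    for i, (point, label) in enumerate(zip(points, labels)):
        # Collect distances from this sample to every other sample,
        # grouped by the cluster label of the other sample.
        dists = {}
        for j, (other, other_label) in enumerate(zip(points, labels)):
            if j != i:
                dists.setdefault(other_label, []).append(math.dist(point, other))
        # q: mean distance to the other members of the sample's own cluster
        # (assumes the cluster has at least two members).
        q = sum(dists[label]) / len(dists[label])
        # p: smallest mean distance to the members of any other cluster.
        p = min(sum(d) / len(d) for lab, d in dists.items() if lab != label)
        scores.append((p - q) / max(p, q))
    return scores

# Toy data (assumed): two tight, well-separated clusters in 2-D.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
labels = [0, 0, 1, 1]

scores = silhouette_scores(points, labels)
print([round(s, 3) for s in scores])            # per-sample scores
print(round(sum(scores) / len(scores), 3))      # mean score for the clustering
```

Averaging the per-sample scores gives a single number for the whole clustering. Swapping two labels so that a sample sits in the far cluster drives that sample's score below zero.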

The Davies-Bouldin (DB) index is another good metric for analyzing clustering algorithms. With the help of the DB index, we can understand the following points about a clustering model −

- Whether the clusters are well-spaced from each other or not?
- How dense are the clusters?

We can calculate the DB index with the help of the following formula −

$$DB\:=\:\frac{1}{n}\displaystyle\sum\limits_{i=1}^n \max_{j\neq\:i}\left(\frac{\sigma_{i}+\sigma_{j}}{d(c_{i},c_{j})}\right)$$

Here, n = number of clusters

$\sigma_{i}$ = average distance of all points in cluster $i$ from the cluster centroid $c_{i}$

$d(c_{i},c_{j})$ = distance between the centroids $c_{i}$ and $c_{j}$.

The lower the DB index, the better the clustering model is.
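As a rough illustration of the DB formula, the sketch below computes it with the Python standard library only. The helper names `centroid` and `davies_bouldin` and the two toy clusterings are assumptions made for this example, and Euclidean distance is assumed. scikit-learn provides `sklearn.metrics.davies_bouldin_score` for real use.

```python
import math

def centroid(cluster):
    """Coordinate-wise mean of the points in one cluster (hypothetical helper)."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def davies_bouldin(clusters):
    """DB index for a list of clusters, each a list of points (hypothetical helper)."""
    cents = [centroid(c) for c in clusters]
    # sigma_i: average distance of the points in cluster i from its centroid c_i.
    sigmas = [
        sum(math.dist(x, cent) for x in pts) / len(pts)
        for pts, cent in zip(clusters, cents)
    ]
    n = len(clusters)
    total = 0.0
    for i in range(n):
        # Worst (largest) ratio of combined spread to centroid separation for cluster i.
        total += max(
            (sigmas[i] + sigmas[j]) / math.dist(cents[i], cents[j])
            for j in range(n) if j != i
        )
    return total / n

# Toy data (assumed): the same two centroids, with tight and then loose clusters.
tight = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
loose = [[(0, 0), (0, 4)], [(5, 0), (5, 4)]]

db_tight = davies_bouldin(tight)
db_loose = davies_bouldin(loose)
print(db_tight, db_loose)
```

The tight clustering yields the smaller value, which matches the rule that a lower DB index indicates a better clustering model.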

The Dunn index works in the same way as the DB index, but the two differ on the following points −

- The Dunn index considers only the worst case, i.e. the clusters that are closest together, while the DB index considers the dispersion and separation of all the clusters in the clustering model.
- The Dunn index increases as the clustering performance improves, while the DB index decreases as the clusters become better spaced and denser.

We can calculate the Dunn index with the help of the following formula −

$$D\:=\:\frac{\min_{1\leq\:i<j\leq\:n}p(i,j)}{\max_{1\leq\:k\leq\:n}q(k)}$$

Here, i, j, k = indices of the clusters

p(i, j) = inter-cluster distance between clusters i and j

q(k) = intra-cluster distance of cluster k
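The Dunn formula can be sketched in the same way. The text does not say exactly how the inter-cluster and intra-cluster distances are measured, so this example makes two assumptions: p(i, j) is the smallest Euclidean distance between a point of cluster i and a point of cluster j, and q(k) is the diameter of cluster k, meaning the largest distance between two of its points. The function name `dunn_index` and the toy data are also hypothetical.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Dunn index for a list of clusters, each a list of points (hypothetical helper)."""
    # Numerator: smallest inter-cluster distance p(i, j) over all cluster pairs i < j
    # (assumed here to be the closest pair of points across the two clusters).
    min_inter = min(
        math.dist(a, b)
        for ci, cj in combinations(clusters, 2)
        for a in ci
        for b in cj
    )
    # Denominator: largest intra-cluster distance q(k) over all clusters
    # (assumed here to be the cluster diameter).
    max_intra = max(
        math.dist(a, b)
        for c in clusters
        for a, b in combinations(c, 2)
    )
    return min_inter / max_intra

# Toy data (assumed): equally separated clusters, tight and then loose.
tight = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
loose = [[(0, 0), (0, 4)], [(5, 0), (5, 4)]]

dunn_tight = dunn_index(tight)
dunn_loose = dunn_index(loose)
print(dunn_tight, dunn_loose)
```

Here the tight clustering gets the larger value, in line with the Dunn index increasing as clustering performance improves. Other choices for p and q, such as centroid-to-centroid distance, would give different numbers.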
