What is scipy.cluster.hierarchy? How to cut a hierarchical clustering into flat clusters?
The scipy.cluster.hierarchy module provides functions for hierarchical clustering, including agglomerative clustering. Its routines can be used to −
Cut hierarchical clusterings into flat clusterings.
Perform agglomerative clustering.
Compute statistics on hierarchies.
Visualize flat clusters.
Check whether two flat cluster assignments are isomorphic.
Plot the clusters.
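As a rough illustration of several of the routines listed above, the sketch below builds a hierarchy with linkage, cuts it with fcluster, computes one statistic with cophenet, and compares two labelings with is_isomorphic. The toy points, the 'average' linkage method, and the threshold t=5 are assumptions chosen for illustration only; they do not come from the example later in this article.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, cophenet, is_isomorphic
from scipy.spatial.distance import pdist

# Toy data (an assumption for this sketch): two well-separated groups of three points.
points = np.array(
    [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float
)
distances = pdist(points)  # condensed pairwise distance vector

# Agglomerative clustering: build the linkage matrix.
Z = linkage(distances, method='average')

# Cut the hierarchy into flat clusters at a distance threshold.
labels = fcluster(Z, t=5, criterion='distance')

# A statistic on the hierarchy: the cophenetic correlation coefficient.
coph_corr, _ = cophenet(Z, distances)

# Isomorphism check: swapping the label names leaves the partition unchanged.
relabelled = np.where(labels == 1, 2, 1)
same_partition = is_isomorphic(labels, relabelled)

print("flat labels:", labels)
print("cophenetic correlation:", coph_corr)
print("same partition after relabelling:", same_partition)
```

Plotting is left out of this sketch; scipy.cluster.hierarchy.dendrogram draws the tree when matplotlib is installed.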
The routine scipy.cluster.hierarchy.fcluster cuts a hierarchical clustering into flat clusters. It returns an array that assigns each original data point to a single cluster. The example below illustrates the idea −
Example
# Import the packages
from scipy.cluster.hierarchy import ward, fcluster
from scipy.spatial.distance import pdist

# The linkage method scipy.cluster.hierarchy.ward takes the condensed
# distance matrix from pdist and returns a linkage matrix.
A = [
    [0, 0], [0, 1], [1, 0],
    [0, 3], [0, 2], [1, 4],
    [3, 0], [2, 0], [4, 1],
    [3, 3], [2, 3], [4, 3]
]
X = ward(pdist(A))
print(X)
Output
[[ 0.          1.          1.          2.        ]
 [ 2.          7.          1.          2.        ]
 [ 3.          4.          1.          2.        ]
 [ 9.         10.          1.          2.        ]
 [ 6.          8.          1.41421356  2.        ]
 [11.         15.          1.73205081  3.        ]
 [ 5.         14.          2.081666    3.        ]
 [12.         13.          2.23606798  4.        ]
 [16.         17.          3.94968353  5.        ]
 [18.         19.          5.15012714  7.        ]
 [20.         21.          6.4968857  12.        ]]
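The matrix X in the above output is a linkage matrix, which encodes a dendrogram. Each row describes one merge step. The first and second elements of a row are the indices of the two clusters merged at that step: an index below 12 (the number of data points) refers to an original data point, while an index of 12 or more refers to the cluster created by an earlier row (index 12 is the cluster formed by the first row, 13 by the second, and so on). The third element is the distance between the two merged clusters, and the fourth element is the number of original data points in the newly formed cluster.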
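To make the row layout concrete, the following sketch walks through the linkage matrix and lists which original points end up in each newly formed cluster. The helper name describe_merges is hypothetical, introduced only for this illustration; it is not part of SciPy.

```python
from scipy.cluster.hierarchy import ward
from scipy.spatial.distance import pdist

A = [
    [0, 0], [0, 1], [1, 0],
    [0, 3], [0, 2], [1, 4],
    [3, 0], [2, 0], [4, 1],
    [3, 3], [2, 3], [4, 3]
]
X = ward(pdist(A))

def describe_merges(Z, n_points):
    """Hypothetical helper: expand each linkage row into its member points."""
    # Indices 0..n_points-1 are the original observations (singleton clusters).
    members = {i: [i] for i in range(n_points)}
    steps = []
    for step, (left, right, dist, size) in enumerate(Z):
        new_id = n_points + step  # index assigned to the cluster created by this row
        members[new_id] = sorted(members[int(left)] + members[int(right)])
        steps.append((new_id, members[new_id], float(dist)))
    return steps

merges = describe_merges(X, len(A))
for new_id, points_in_cluster, dist in merges:
    print(f"cluster {new_id}: points {points_in_cluster} merged at distance {dist:.3f}")
```

The length of each printed member list should agree with the fourth column of the corresponding row, and the last row always contains every data point.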
# Flatten the dendrogram with fcluster(). The assignment of the original
# data points to flat clusters depends on the distance threshold t.
fcluster(X, t=1.5, criterion='distance')  # t = 1.5
Output
array([6, 6, 7, 4, 4, 5, 1, 7, 1, 2, 2, 3], dtype=int32)
Example
fcluster(X, t=0.9, criterion='distance') #when t= 0.9
Output
array([ 9, 10, 11, 6, 7, 8, 1, 12, 2, 3, 4, 5], dtype=int32)
Example
fcluster(X, t=9, criterion='distance') #when t= 9
Output
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
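The three calls above suggest a general pattern: as the threshold t grows, more merges fall below it and the number of flat clusters shrinks. The sketch below, a minimal illustration rather than a recommended workflow, counts the flat clusters for several thresholds on the same data. The helper name count_flat_clusters is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import ward, fcluster
from scipy.spatial.distance import pdist

A = [
    [0, 0], [0, 1], [1, 0],
    [0, 3], [0, 2], [1, 4],
    [3, 0], [2, 0], [4, 1],
    [3, 3], [2, 3], [4, 3]
]
X = ward(pdist(A))

def count_flat_clusters(Z, t):
    """Hypothetical helper: number of distinct labels fcluster returns for threshold t."""
    labels = fcluster(Z, t=t, criterion='distance')
    return len(np.unique(labels))

thresholds = [0.9, 1.5, 3.0, 9.0]
counts = [count_flat_clusters(X, t) for t in thresholds]
for t, k in zip(thresholds, counts):
    print(f"t = {t}: {k} flat cluster(s)")
```

With a threshold below the smallest merge distance every point stays in its own cluster, and with a threshold above the largest merge distance all points fall into one cluster.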
