The scipy.cluster.hierarchy module provides functions for hierarchical clustering, including agglomerative clustering. Its routines can be used to:

- Cut a hierarchical clustering into a flat clustering.
- Perform agglomerative clustering.
- Compute statistics on hierarchies.
- Visualize flat clusters.
- Check whether two flat cluster assignments are isomorphic.
- Plot the clusters.
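As a rough illustration of these routine families, the sketch below calls one function from each group on a small made-up point set. The data and variable names are assumptions chosen for illustration. `dendrogram` is called with `no_plot=True` so that it only computes the layout and does not need a plotting backend.

```python
import numpy as np
from scipy.cluster.hierarchy import (
    linkage, fcluster, cophenet, is_isomorphic, dendrogram,
)
from scipy.spatial.distance import pdist

# Hypothetical data: two well-separated groups of three points each
points = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
distances = pdist(points)  # condensed pairwise distance matrix

# Agglomerative clustering: build the linkage matrix with Ward's method
Z = linkage(distances, method="ward")

# Cut the hierarchy into a flat clustering with at most 2 clusters
labels = fcluster(Z, t=2, criterion="maxclust")

# Statistic on the hierarchy: cophenetic correlation coefficient
coph_corr, coph_dists = cophenet(Z, distances)

# Isomorphism check: same grouping, possibly with different label numbers
expected = np.array([2, 2, 2, 1, 1, 1])
same_grouping = is_isomorphic(labels, expected)

# Dendrogram layout without drawing anything
layout = dendrogram(Z, no_plot=True)

print(labels, coph_corr, same_grouping, layout["leaves"])
```

Because `is_isomorphic` ignores the actual label values, the check does not depend on which group happens to be numbered 1.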

The routine **scipy.cluster.hierarchy.fcluster** cuts a hierarchical clustering into a flat clustering, that is, it assigns each original data point to exactly one cluster. The following example illustrates this:

    # Import the required functions
    from scipy.cluster.hierarchy import ward, fcluster
    from scipy.spatial.distance import pdist

    # The linkage method scipy.cluster.hierarchy.ward returns a linkage matrix
    A = [
        [0, 0], [0, 1], [1, 0],
        [0, 3], [0, 2], [1, 4],
        [3, 0], [2, 0], [4, 1],
        [3, 3], [2, 3], [4, 3],
    ]
    X = ward(pdist(A))
    print(X)

    [[ 0.          1.          1.          2.        ]
     [ 2.          7.          1.          2.        ]
     [ 3.          4.          1.          2.        ]
     [ 9.         10.          1.          2.        ]
     [ 6.          8.          1.41421356  2.        ]
     [11.         15.          1.73205081  3.        ]
     [ 5.         14.          2.081666    3.        ]
     [12.         13.          2.23606798  4.        ]
     [16.         17.          3.94968353  5.        ]
     [18.         19.          5.15012714  7.        ]
     [20.         21.          6.4968857  12.        ]]

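The matrix X in the output above encodes a dendrogram, with one row per merge step. In each row, the first and second elements are the indices of the two clusters merged at that step, the third element is the distance between them, and the fourth element is the number of original data points in the newly formed cluster. Indices 0 to 11 refer to the original data points; an index of 12 or higher refers to a cluster formed in an earlier row, so index 15 denotes the cluster created in the fourth row by merging points 9 and 10.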
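To make the row layout concrete, the sketch below prints each merge step in words. The helper name `describe_merges` is hypothetical and introduced only for this illustration; it relies on the SciPy convention that the cluster formed in row i (counting from 0) receives index n + i, where n is the number of original points.

```python
from scipy.cluster.hierarchy import ward
from scipy.spatial.distance import pdist

# Same data as in the example above
A = [[0, 0], [0, 1], [1, 0], [0, 3], [0, 2], [1, 4],
     [3, 0], [2, 0], [4, 1], [3, 3], [2, 3], [4, 3]]
X = ward(pdist(A))
n_points = len(A)


def describe_merges(Z, n_points):
    """Hypothetical helper: one readable line per row of a linkage matrix."""
    lines = []
    for step, (left, right, dist, size) in enumerate(Z):
        new_id = n_points + step  # index assigned to the newly formed cluster
        lines.append(
            f"step {step}: {int(left)} + {int(right)} -> cluster {new_id} "
            f"(distance {dist:.3f}, {int(size)} points)"
        )
    return lines


merge_lines = describe_merges(X, n_points)
for line in merge_lines:
    print(line)
```

The last line printed describes the final merge, whose cluster contains all 12 points.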

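The dendrogram can be flattened with fcluster(). With criterion='distance', the assignment of the original data points to clusters depends on the distance threshold t: the points in each flat cluster have a cophenetic distance no greater than t.

    # Flatten the dendrogram with t = 1.5
    fcluster(X, t=1.5, criterion='distance')

    array([6, 6, 7, 4, 4, 5, 1, 7, 1, 2, 2, 3], dtype=int32)

Only the five merges at distances up to 1.41421356 fall below this threshold, which leaves seven clusters. Every merge distance in X is at least 1, so a threshold of 0.9 leaves each point in its own cluster:

    # Flatten the dendrogram with t = 0.9
    fcluster(X, t=0.9, criterion='distance')

    array([ 9, 10, 11, 6, 7, 8, 1, 12, 2, 3, 4, 5], dtype=int32)

A threshold of 9 exceeds the largest merge distance (6.4968857), so all points fall into a single cluster:

    # Flatten the dendrogram with t = 9
    fcluster(X, t=9, criterion='distance')

    array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)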
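The effect of the threshold can also be summarised by counting the distinct labels that fcluster() returns. The sketch below does this for the three thresholds used above, and additionally shows the `criterion='maxclust'` option, which is not part of the original example: with it, t is read as an upper bound on the number of clusters rather than as a distance. The variable names are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import ward, fcluster
from scipy.spatial.distance import pdist

# Same data as in the example above
A = [[0, 0], [0, 1], [1, 0], [0, 3], [0, 2], [1, 4],
     [3, 0], [2, 0], [4, 1], [3, 3], [2, 3], [4, 3]]
X = ward(pdist(A))

# Number of flat clusters obtained for each distance threshold
cluster_counts = {}
for t in (0.9, 1.5, 9):
    labels = fcluster(X, t=t, criterion='distance')
    cluster_counts[t] = len(np.unique(labels))
print(cluster_counts)

# Alternative criterion: request at most 3 flat clusters
labels_k3 = fcluster(X, t=3, criterion='maxclust')
print(labels_k3)
```

Asking for a cluster count is often more convenient than picking a distance, since suitable distance thresholds depend on the scale of the data.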
