What is the difference between scipy.cluster.vq.kmeans() and scipy.cluster.vq.kmeans2() methods?
The scipy.cluster.vq module provides two functions for k-means clustering, namely kmeans() and kmeans2(). They differ significantly both in how they work and in what they return. Let us understand the differences −
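scipy.cluster.vq.kmeans(obs, k_or_guess, iter=20, thresh=1e-05, check_finite=True) − The kmeans() method forms k clusters by performing the k-means algorithm on a set of observation vectors. To decide whether the centroids have become stable, it compares the change in the average Euclidean distance between the observations and their corresponding centroids against the threshold thresh, and stops once that change is no larger than thresh. Note that iter is not the number of k-means iterations: it is the number of times the whole algorithm is run, and the run with the lowest distortion is kept (iter is ignored when k_or_guess is an array of initial centroids). The method returns two values: a code book, a k by N array in which row i is the centroid assigned the code i, and the distortion, the mean (non-squared) Euclidean distance between the observations and the generated centroids. It does not return a cluster label for each observation; the labels can be obtained by passing the code book to scipy.cluster.vq.vq().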
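The threshold test described above can be illustrated with a simplified re-implementation of a single kmeans() run. This is a minimal sketch, not SciPy's actual source: kmeans_threshold() is a hypothetical helper built on NumPy and vq(), and the six data points and the starting centroids are made up for illustration. The sketch is then compared with scipy.cluster.vq.kmeans() started from the same initial guess.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq


def kmeans_threshold(obs, guess, thresh=1e-5):
    """Hypothetical helper: one simplified k-means run that stops once the
    change in the mean Euclidean distance to the nearest centroid is no
    larger than `thresh`. Illustrative sketch only, not SciPy's code."""
    code_book = np.asarray(guess, dtype=float).copy()
    prev_avg = np.inf
    while True:
        # Assign every observation to its nearest centroid.
        codes, dists = vq(obs, code_book)
        avg = dists.mean()
        # Stop when the average distance has (almost) stopped changing.
        if abs(prev_avg - avg) <= thresh:
            return code_book, avg
        prev_avg = avg
        # Move each centroid to the mean of its observations; centroids that
        # attracted no observations are dropped.
        code_book = np.array([obs[codes == i].mean(axis=0)
                              for i in range(len(code_book))
                              if np.any(codes == i)])


# Two well-separated groups of made-up 2-D points.
obs = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
guess = obs[[0, 3]]  # start from the first point of each group

centroids, distortion = kmeans_threshold(obs, guess)
scipy_centroids, scipy_distortion = kmeans(obs, guess)

print("Sketch centroids :\n", centroids, "\nDistortion :", round(float(distortion), 4))
print("SciPy centroids :\n", scipy_centroids, "\nDistortion :", round(float(scipy_distortion), 4))
```

On this small input both runs should end with the centroids (1/3, 1/3) and (31/3, 31/3) and a distortion of about 0.654; on real data the two may differ in details such as which code book and distance pair is reported at the final step.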
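scipy.cluster.vq.kmeans2(data, k, iter=10, thresh=1e-05, minit='random', missing='warn', check_finite=True) − The kmeans2() method classifies a set of observation vectors into k clusters by performing the k-means algorithm. Unlike kmeans(), it does not use a threshold to check for convergence: the thresh parameter is accepted but not currently used, and the algorithm simply runs for iter iterations. kmeans2() also has more parameters than kmeans(): minit decides how the initial centroids are chosen ('random', 'points', '++' or 'matrix'; with 'matrix', k is interpreted as an array of initial centroids), and missing decides how empty clusters are handled ('warn' or 'raise'). Both methods share the check_finite parameter, which validates that the input matrices contain only finite numbers. kmeans2() returns the centroids together with a label array that gives, for every observation, the index of its nearest centroid.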
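The extra kmeans2() parameters can be seen in a small deterministic run. In this minimal sketch the nine data points and the starting centroids are made-up values; passing minit='matrix' makes kmeans2() use the given array as its initial centroids, so the result does not depend on random initialization. The second call adds a starting centroid far from every point, which leaves one cluster empty, and missing='raise' turns that situation into an exception instead of a warning.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Made-up data: three tight groups of 2-D points.
data = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],        # group near (0, 0)
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],  # group near (10, 10)
    [0.0, 10.0], [0.0, 11.0], [1.0, 10.0],     # group near (0, 10)
])

# minit='matrix': the second argument is an array of starting centroids,
# not a cluster count, so no random initialization is involved.
start = np.array([[1.0, 1.0], [9.0, 9.0], [1.0, 9.0]])
centroids, labels = kmeans2(data, start, iter=10, minit='matrix', missing='raise')
print("Centroids :\n", centroids)
print("Labels :", labels)  # index of the nearest centroid for each observation

# A fourth starting centroid far from every point attracts no observations,
# so one cluster is empty; missing='raise' turns that into an exception.
far_start = np.vstack([start, [[100.0, 100.0]]])
try:
    kmeans2(data, far_start, minit='matrix', missing='raise')
    empty_cluster_raised = False
except Exception as err:  # SciPy raises its ClusterError here
    empty_cluster_raised = True
    print("kmeans2() raised:", err)
```

With these inputs the labels come out as [0 0 0 1 1 1 2 2 2], and each centroid is the mean of its three points, for example (1/3, 1/3) for the first group.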
Example
Computing k-means with the kmeans() method −
```python
# Importing the required Python libraries
from numpy import vstack, array
from numpy.random import rand
from scipy.cluster.vq import whiten, kmeans

# Random data generation: 200 points shifted by (0.5, 0.5) plus 150 unshifted points
data = vstack((rand(200, 2) + array([.5, .5]), rand(150, 2)))

# Normalizing the data: scale each feature to unit variance
data = whiten(data)

# Computing k-means with the kmeans() method
centroids, mean_value = kmeans(data, 3)
print("Code book :\n", centroids, "\n")
print("Mean of Euclidean distances :", mean_value.round(4))
```
Output
```
Code book :
 [[2.45420231 3.19421081]
 [2.77295342 1.74582367]
 [0.99156276 1.35546602]]

Mean of Euclidean distances : 0.791
```
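Because kmeans() returns only the code book and the distortion, the cluster label of each observation has to be computed separately. A minimal sketch, assuming data generated the same way as above but with a seeded np.random.default_rng (the seed value 0 is an arbitrary choice that only makes the input data repeatable), assigns every observation to its nearest centroid with vq(). The centroids themselves can still vary between runs, because kmeans() chooses its initial centroids at random from the observations.

```python
import numpy as np
from scipy.cluster.vq import whiten, kmeans, vq

# Data built like the example above; the seeded generator is an assumption
# added here so that the input data is repeatable.
rng = np.random.default_rng(0)
data = np.vstack((rng.random((200, 2)) + np.array([0.5, 0.5]),
                  rng.random((150, 2))))
data = whiten(data)

# kmeans() returns the code book and the distortion, but no labels.
codebook, distortion = kmeans(data, 3)

# vq() assigns each observation to its nearest code-book entry and also
# returns the Euclidean distance to that entry.
labels, dists = vq(data, codebook)

print("Code book :\n", codebook)
print("Distortion reported by kmeans() :", round(float(distortion), 4))
print("Points per cluster :", np.bincount(labels, minlength=len(codebook)))
print("Mean distance to the assigned centroid :", round(float(dists.mean()), 4))
```

The last value is computed from the final code book, so it may differ very slightly from the distortion that kmeans() reports.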
Computing k-means with the kmeans2() method on data generated the same way −
```python
# Importing the required Python libraries
from numpy import vstack, array
from numpy.random import rand
from scipy.cluster.vq import whiten, kmeans2

# Random data generation: new random values, built the same way as before
data = vstack((rand(200, 2) + array([.5, .5]), rand(150, 2)))

# Normalizing the data: scale each feature to unit variance
data = whiten(data)

# Computing k-means with the kmeans2() method
centroids, clusters = kmeans2(data, 3, minit='random')
print("Code book :\n", centroids, "\n")
print("Clusters :", clusters)
```
Output
```
Code book :
 [[3.07353603 2.71692674]
 [1.07148876 0.74285308]
 [1.64579292 2.29821454]]

Clusters : [2 0 0 2 0 2 0 2 0 0 0 0 0 0 2 0 0 2 2 0 0 0 2 2 1 2 2 2 0 0 2 0 2 0 2 0 2
 0 2 0 0 0 2 2 0 0 0 0 0 0 2 0 0 2 2 0 0 0 2 0 0 0 2 2 2 0 0 0 2 2 0 2 0 0
 0 2 2 0 2 0 0 0 0 0 0 2 0 2 2 2 2 0 0 0 0 0 0 2 2 0 0 0 0 2 0 0 0 2 2 0 2
 0 0 0 0 2 0 0 0 0 0 0 0 2 2 0 0 2 0 0 2 0 2 0 2 0 0 2 2 0 2 2 0 2 0 0 0 0
 0 0 0 2 2 0 2 0 0 0 0 0 2 0 2 0 0 2 2 2 2 2 2 0 2 2 2 0 2 0 0 2 0 2 2 0 0
 0 0 0 2 0 0 2 2 0 0 0 2 0 2 0 1 1 1 1 2 1 1 1 2 1 2 1 1 1 1 1 2 2 1 1 2 2
 1 1 1 2 2 2 1 2 2 2 2 1 1 1 1 1 1 2 2 1 1 2 1 1 2 2 2 2 2 2 1 1 2 1 1 1 1
 2 2 1 2 2 1 2 1 0 2 2 2 1 2 2 1 2 2 2 2 2 1 1 2 2 1 1 2 1 1 2 1 1 2 1 2 1
 1 2 2 2 1 1 1 1 1 2 2 2 1 1 1 1 2 2 1 1 2 1 1 1 1 1 1 0 1 1 1 1 1 2 2 2 1
 1 1 1 1 2 1 2 1 1 2 2 2 1 1 2 1 1]
```
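The two examples above draw fresh random data, so their outputs cannot be compared directly. A minimal sketch that runs both methods on one array, starting from the same initial centroids, makes the comparison fairer. The seed value and the choice of observations 0, 100 and 300 as the shared starting centroids are arbitrary assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.vq import whiten, kmeans, kmeans2

# One seeded data set, generated the same way as in the examples above.
rng = np.random.default_rng(42)
data = whiten(np.vstack((rng.random((200, 2)) + np.array([0.5, 0.5]),
                         rng.random((150, 2)))))

# Assumed starting centroids: three of the observations.
start = data[[0, 100, 300]]

# kmeans(): a single run from `start` (iter is ignored when an array of
# initial centroids is given), stopping on the distortion threshold.
km_centroids, km_distortion = kmeans(data, start)

# kmeans2(): a fixed number of update steps from the same starting centroids.
km2_centroids, km2_labels = kmeans2(data, start, iter=50, minit='matrix')

print("kmeans() centroids :\n", km_centroids)
print("kmeans2() centroids :\n", km2_centroids)
print("kmeans2() cluster sizes :", np.bincount(km2_labels, minlength=len(km2_centroids)))
```

Because kmeans() stops as soon as the change in mean distance falls below thresh while kmeans2() performs a fixed number of iterations, the two sets of centroids are often identical or very close here, but they are not guaranteed to match exactly.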
