 
What is the difference between scipy.cluster.vq.kmeans() and scipy.cluster.vq.kmeans2() methods?
The scipy.cluster.vq module provides two methods for k-means clustering, namely kmeans() and kmeans2(). The two methods work in significantly different ways. Let us understand how −
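- scipy.cluster.vq.kmeans(obs, k_or_guess, iter=20, thresh=1e-05, check_finite=True) − The kmeans() method forms k clusters by running the k-means algorithm on a set of observation vectors. To decide when the centroids have stabilized, it uses a threshold. A run stops once the change in distortion (the average Euclidean distance between the observations and their nearest centroids) between successive iterations is less than or equal to thresh. The iter parameter is the number of times the whole k-means algorithm is run from different initial centroids, and the codebook with the lowest distortion is returned (iter is ignored when k_or_guess is an array of initial centroids). The output is a tuple (codebook, distortion): codebook is a k-by-N array of centroids in which centroid codebook[i] is represented by the code i, and distortion is the mean Euclidean distance obtained with that codebook. kmeans() does not return a cluster label for each observation; vq() can be used to assign codes to the observations.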
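- scipy.cluster.vq.kmeans2(data, k, iter=10, thresh=1e-05, minit='random', missing='warn', check_finite=True) − The kmeans2() method classifies a set of observation vectors into k clusters by running the k-means algorithm. Unlike kmeans(), kmeans2() does not use a threshold value to check for convergence (its thresh parameter is currently unused); instead, iter is the number of iterations of a single k-means run. kmeans2() also has more parameters than kmeans(): minit selects how the initial centroids are chosen ('random', 'points', '++' or 'matrix'), and missing decides how empty clusters are handled ('warn' or 'raise'). Like kmeans(), it also accepts check_finite, which validates that the input matrices contain only finite numbers. The output is a tuple (centroid, label), where label[i] is the index of the centroid closest to the i-th observation.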
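The difference in return values can be seen on a small, made-up data set. The sketch below assumes two obvious groups of 2-D points and passes the same explicit starting centroids to both functions (with minit='matrix' for kmeans2()), so the result does not depend on random initialization. Because kmeans() only gives back a codebook and a distortion, vq() is used to label the points, whereas kmeans2() returns the labels directly.

```python
import numpy as np
from scipy.cluster.vq import kmeans, kmeans2, vq

# Made-up data: two well-separated groups of 2-D points
obs = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
                [5.0, 5.0], [5.2, 4.9], [4.9, 5.1]])

# Explicit starting centroids make both calls deterministic
init = np.array([[0.0, 0.0], [6.0, 6.0]])

# kmeans(): returns (codebook, distortion), no per-point labels
codebook, distortion = kmeans(obs, init)

# Labels for the kmeans() result have to be computed separately with vq()
labels_from_vq, _ = vq(obs, codebook)

# kmeans2(): returns (centroids, labels) directly
centroids2, labels2 = kmeans2(obs, init, minit='matrix')

print("kmeans codebook :\n", codebook)
print("kmeans distortion :", distortion)
print("labels via vq() :", labels_from_vq)
print("kmeans2 labels  :", labels2)
```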
Example
Computing k-means with the kmeans() method (the input data is random, so your output will differ) −
#importing the required Python libraries:
from numpy import vstack, array
from numpy.random import rand
from scipy.cluster.vq import whiten, kmeans
#Random data generation: 200 points shifted by (0.5, 0.5) plus 150 unshifted points
data = vstack((rand(200,2) + array([.5,.5]), rand(150,2)))
#Normalizing the data: whiten() divides each feature by its standard deviation
data = whiten(data)
# computing K-Means with kmeans() method
# kmeans() returns the code book (centroids) and the distortion (mean Euclidean distance)
centroids, mean_value = kmeans(data, 3)
print("Code book :\n", centroids, "\n")
print("Mean of Euclidean distances :", mean_value.round(4))
Output
Code book :
[[2.45420231 3.19421081]
[2.77295342 1.74582367]
[0.99156276 1.35546602]]

Mean of Euclidean distances : 0.791
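The threshold rule that kmeans() uses to end a run can be sketched in plain NumPy. kmeans_with_threshold() below is a hypothetical helper written for illustration, not part of SciPy. It assumes no cluster ever becomes empty, and on the made-up data shown it is expected to reach the same codebook and distortion as kmeans() when both start from the same centroids.

```python
import numpy as np
from scipy.cluster.vq import kmeans

def kmeans_with_threshold(obs, guess, thresh=1e-05):
    """Hypothetical helper: a single k-means run that stops when the drop in
    distortion (mean Euclidean distance to the nearest centroid) is <= thresh.
    Assumes no cluster ever becomes empty."""
    codebook = np.asarray(guess, dtype=float).copy()
    prev = np.inf
    while True:
        # Distance from every observation to every centroid, shape (n, k)
        dists = np.linalg.norm(obs[:, None, :] - codebook[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        distortion = dists.min(axis=1).mean()
        # Move each centroid to the mean of the observations assigned to it
        codebook = np.array([obs[labels == j].mean(axis=0)
                             for j in range(len(codebook))])
        # Stop once the distortion no longer improves by more than thresh
        if prev - distortion <= thresh:
            return codebook, distortion
        prev = distortion

# Made-up data and starting centroids, used for both implementations
obs = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
                [5.0, 5.0], [5.2, 4.9], [4.9, 5.1]])
init = np.array([[0.0, 0.0], [6.0, 6.0]])

mine_cb, mine_d = kmeans_with_threshold(obs, init)
ref_cb, ref_d = kmeans(obs, init)
print(mine_cb, mine_d)
print(ref_cb, ref_d)
```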
Computing k-means with the kmeans2() method on data generated in the same way −
#importing the required Python libraries:
from numpy import vstack, array
from numpy.random import rand
from scipy.cluster.vq import whiten, kmeans2
#Random data generation: 200 points shifted by (0.5, 0.5) plus 150 unshifted points
data = vstack((rand(200,2) + array([.5,.5]), rand(150,2)))
#Normalizing the data: whiten() divides each feature by its standard deviation
data = whiten(data)
# computing K-Means with kmeans2() method
# kmeans2() returns the centroids and, for each observation, the index of its cluster
centroids, clusters = kmeans2(data, 3, minit='random')
print("Code book :\n", centroids, "\n")
print(("Clusters :", clusters))
Output
Code book :
[[3.07353603 2.71692674]
[1.07148876 0.74285308]
[1.64579292 2.29821454]]
('Clusters :', array([2, 0, 0, 2, 0, 2, 0, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, 2, 2, 0, 0, 0,
2, 2, 1, 2, 2, 2, 0, 0, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0, 0, 0, 2, 2,
0, 0, 0, 0, 0, 0, 2, 0, 0, 2, 2, 0, 0, 0, 2, 0, 0, 0, 2, 2, 2, 0,
0, 0, 2, 2, 0, 2, 0, 0, 0, 2, 2, 0, 2, 0, 0, 0, 0, 0, 0, 2, 0, 2,
2, 2, 2, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 2, 0, 0, 0, 2, 2, 0,
2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 2, 0, 0, 2, 0,
2, 0, 2, 0, 0, 2, 2, 0, 2, 2, 0, 2, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0,
2, 0, 0, 0, 0, 0, 2, 0, 2, 0, 0, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 0,
2, 0, 0, 2, 0, 2, 2, 0, 0, 0, 0, 0, 2, 0, 0, 2, 2, 0, 0, 0, 2, 0,
2, 0, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1,
2, 2, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1,
1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 2, 1, 1, 1, 1, 2, 2, 1, 2, 2,
1, 2, 1, 0, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1,
2, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2,
1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 2,
2, 2, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 2, 2, 2, 1, 1, 2, 1, 1],
dtype=int32))
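How kmeans2() treats an empty cluster depends on its missing parameter. In this sketch, which uses made-up data, the third starting centroid passed with minit='matrix' is far from every point, so no observation is ever assigned to it. With missing='raise' the call raises ClusterError; with the default missing='warn' it issues a warning and leaves that centroid at its starting position.

```python
import warnings
import numpy as np
from scipy.cluster.vq import kmeans2, ClusterError

# Made-up data: two well-separated groups of 2-D points
obs = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
                [5.0, 5.0], [5.2, 4.9], [4.9, 5.1]])
# The third starting centroid is far from every point, so its cluster stays empty
init = np.array([[0.0, 0.0], [6.0, 6.0], [100.0, 100.0]])

# missing='raise': an empty cluster becomes an exception
try:
    kmeans2(obs, init, minit='matrix', missing='raise')
    raised = False
except ClusterError:
    raised = True

# missing='warn' (the default): warn and keep the empty cluster's centroid in place
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    centroids, labels = kmeans2(obs, init, minit='matrix', missing='warn')

print("Raised ClusterError :", raised)
print("Warnings issued :", len(caught))
print("Empty cluster centroid :", centroids[2])
print("Labels :", labels)
```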