What is the scipy.cluster.vq.kmeans2() method?


scipy.cluster.vq.kmeans2(data, k, iter=10, thresh=1e-05, minit='random', missing='warn', check_finite=True) − The kmeans2() method classifies a set of observation vectors into k clusters by running the k-means algorithm. Unlike kmeans(), the kmeans2() method does not use a threshold value to check for convergence; it runs for a fixed number of iterations. It has additional parameters to choose how the centroids are initialized, to handle empty clusters, and to validate that the input matrices contain only finite numbers.
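As a minimal sketch of a basic call, the example below clusters a small synthetic data set into two groups. The data, the variable names, and the choice of minit='++' are assumptions made for illustration only.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Synthetic data (assumed for illustration): two 2-D blobs of 50 points each.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
data = np.vstack([blob_a, blob_b])

# Ask for k=2 clusters and seed the centroids with the kmeans++ method.
centroid, label = kmeans2(data, 2, minit='++')

print(centroid)   # one row per cluster centroid
print(label[:5])  # cluster index assigned to the first five observations
```

Because the initial centroids are drawn at random, the order of the two centroids can differ from run to run.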

Its parameters are explained in detail below −

Parameters

data− ndarray
It is an ‘M’ by ‘N’ array of M observations in N dimensions, or a length M array of M one-dimensional observations.
k− int or ndarray
This parameter represents the number of clusters to form, which is also the number of centroids to generate. It is instead interpreted as the array of initial centroids in either of the two cases given below −
- When the minit initialization string is ‘matrix’.
- When an ndarray is given instead of an int.
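The two ways of passing k can be sketched as follows. The six observations and the array named initial are hypothetical values chosen so that the grouping is obvious.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

data = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                 [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])

# k as an int: kmeans2 picks the 2 initial centroids itself (here via minit='points').
centroid_a, label_a = kmeans2(data, 2, minit='points')

# k as an ndarray: its rows are used as the initial centroids,
# so the number of clusters equals the number of rows (2 here).
initial = np.array([[0.0, 0.0], [5.0, 5.0]])
centroid_b, label_b = kmeans2(data, initial, minit='matrix')

print(label_b)     # first three rows share one label, last three the other
print(centroid_b)  # mean of each group of three observations
```

Passing explicit initial centroids makes the result deterministic, which is convenient when a run has to be reproducible.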
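iter− int, optional
This parameter represents the number of iterations of the k-means algorithm to run. Its meaning differs from that of the iter parameter of the kmeans() method.
thresh− float, optional
This parameter is a threshold value that kmeans2() does not currently use. Unlike kmeans(), kmeans2() does not stop when the change in distortion falls below a threshold; it runs for the number of iterations given by iter.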
minit− str, optional
This parameter represents the method for initialization. The available methods are −
- random− It generates k centroids from a Gaussian whose mean and variance are estimated from the data. This is the default.
- points− It chooses k observations (rows) at random from data as the initial centroids.
- ++− It chooses k observations (rows) according to the kmeans++ method, also called careful seeding.
- matrix− It interprets the k parameter as a ‘k’ by ‘N’ array of initial centroids.
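The string-based initialization methods can be compared with a short loop such as the one below. The synthetic data and the results dictionary are assumptions for illustration; the ‘matrix’ method is left out because it needs an array for k rather than an int.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Synthetic data (assumed for illustration): two 2-D blobs of 40 points each.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.5, size=(40, 2)),
                  rng.normal(4.0, 0.5, size=(40, 2))])

# Run kmeans2 once per initialization method and keep the final centroids.
results = {}
for method in ('random', 'points', '++'):
    centroid, label = kmeans2(data, 2, minit=method)
    results[method] = centroid

for method, centroid in results.items():
    print(method)
    print(centroid)
```

Since every method here starts from random centroids, the printed values can vary between runs, and ‘random’ may occasionally produce an empty cluster and a warning.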
missing− str, optional
This parameter represents the method to deal with empty clusters. The available methods are −
- warn− As the name implies, it gives a warning and continues. This is the default.
- raise− It raises an error (ClusterError) and terminates the algorithm.
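The difference between the two options can be sketched by forcing an empty cluster. In the hypothetical setup below, the second initial centroid is placed far from every observation, so no observation is ever assigned to it.

```python
import warnings

import numpy as np
from scipy.cluster.vq import ClusterError, kmeans2

data = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3]])
# The second centroid is far from all observations, so its cluster stays empty.
initial = np.array([[0.0, 0.0], [100.0, 100.0]])

# missing='warn': a warning is issued and the run continues.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    centroid, label = kmeans2(data, initial, minit='matrix', missing='warn')
print('warnings issued:', len(caught) > 0)
print('labels:', label)

# missing='raise': a ClusterError stops the run.
try:
    kmeans2(data, initial, minit='matrix', missing='raise')
    raised = False
except ClusterError as exc:
    raised = True
    print('ClusterError:', exc)
```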
check_finite− bool, optional
This parameter specifies whether to check that the input matrices contain only finite numbers. Disabling the check may give a performance gain, but it may also result in problems such as crashes or non-termination if the inputs do contain infinities or NaNs. The default value of this parameter is True.
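As a small sketch of the default behaviour, the example below passes data containing a NaN and then retries with the offending row removed. The data values and the cleaning step are assumptions for illustration, not part of kmeans2() itself.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

data = np.array([[0.0, 0.0], [0.1, 0.2], [np.nan, 1.0], [5.0, 5.0]])

# With check_finite=True (the default), non-finite input is rejected up front.
try:
    kmeans2(data, 2, minit='points')
    rejected = False
except ValueError as exc:
    rejected = True
    print('Rejected:', exc)

# One possible remedy: drop the rows that contain NaN or infinity, then cluster.
clean = data[np.isfinite(data).all(axis=1)]
centroid, label = kmeans2(clean, 2, minit='points')
print(clean.shape, label.shape)
```

Setting check_finite=False is only reasonable when the data is already known to be finite, for example after a cleaning step like the one above.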

Returns

centroid− ndarray
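It is a ‘k’ by ‘N’ array of the centroids found at the last iteration of k-means.
label− ndarray
It is an array of length M in which label[i] is the index of the centroid that the i-th observation is closest to.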
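The two return values are typically used together. The sketch below, built on hypothetical data with fixed initial centroids, shows their shapes and how label indexes into centroid.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# M = 5 observations in N = 2 dimensions (values assumed for illustration).
data = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                 [8.0, 8.0], [8.1, 7.9]])
initial = np.array([[1.0, 1.0], [8.0, 8.0]])
centroid, label = kmeans2(data, initial, minit='matrix')

print(centroid.shape)  # (k, N)
print(label.shape)     # (M,)

# label indexes into centroid: row i is the centroid assigned to observation i.
assigned = centroid[label]

# Number of observations in each cluster.
sizes = np.bincount(label, minlength=len(centroid))
print(sizes)
```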

Gaurav Kumar

Updated on: 24-Nov-2021
