What is the scipy.cluster.vq.kmeans2() method?

scipy.cluster.vq.kmeans2(data, k, iter=10, thresh=1e-05, minit='random', missing='warn', check_finite=True) − The kmeans2() method classifies a set of observation vectors into k clusters by running the k-means algorithm. It does not use a threshold value to check for convergence; instead, it runs for the number of iterations given by iter. It has additional parameters to choose how the centroids are initialized, to handle empty clusters, and to validate whether the input matrices contain only finite numbers.
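A minimal sketch of a call may help make the signature concrete. The synthetic two-blob data and the variable names (blob_a, blob_b) are assumptions made for illustration only; the exact centroid values depend on the random initialization, so none are shown.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Illustrative data (an assumption): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
data = np.vstack([blob_a, blob_b])  # M = 100 observations, N = 2 dimensions

# Form k = 2 clusters, running 10 iterations and starting
# from randomly chosen observations.
centroid, label = kmeans2(data, 2, iter=10, minit='points')

print(centroid.shape)  # one row per cluster, one column per dimension
print(label.shape)     # one entry per observation
```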

Below is given the detailed explanation of its parameters −


  • data− ndarray

    It is an ‘M’ by ‘N’ array of M observations in N dimensions.

  • k− int or ndarray

    This parameter represents the number of clusters to form and, therefore, the number of centroids to generate. It is instead interpreted as the initial centroids to use in either of the two cases given below −

    • When the minit initialization string is ‘matrix’.

    • When an ndarray is given instead of an int.

  • thresh− float, optional

    This parameter represents a threshold value, but kmeans2() does not currently use it. The algorithm always runs for the number of iterations given by iter.

  • minit− str, optional

    This parameter represents the method used to initialize the centroids. The available methods are given below −

    • random− It generates k centroids from a Gaussian distribution whose mean and variance are estimated from the data.

    • points− It chooses k observations (rows) at random from data as the initial centroids.

    • ++− It chooses k observations (rows) according to the kmeans++ method, also called careful seeding.

    • matrix− It interprets the k parameter as a ‘k’ by ‘N’ array of initial centroids.

  • missing− str, optional

    This parameter represents the method used to deal with empty clusters. The available methods are given below −

    • warn− This method, as the name implies, gives a warning and continues.

    • raise− This method raises an error (ClusterError) and terminates the algorithm.

  • check_finite− bool, optional

    This parameter controls whether the input matrices are checked to contain only finite numbers. Disabling the check may give a performance gain, but it may also cause problems such as crashes or non-termination if the observations contain infinities or NaNs. The default value of this parameter is True.
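The minit and missing options described above can be sketched as follows. The data, the starting centroids, and names such as far_init and empty_cluster_hit are assumptions made for illustration; the second starting centroid is deliberately placed far from every observation so that its cluster comes out empty.

```python
import numpy as np
from scipy.cluster.vq import kmeans2, ClusterError

# Illustrative data (an assumption): 200 observations in 3 dimensions.
rng = np.random.default_rng(1)
data = rng.normal(size=(200, 3))

# minit='++': seed the centroids with the kmeans++ method.
c_pp, l_pp = kmeans2(data, 3, minit='++')

# minit='matrix': k is an array of initial centroids,
# here simply the first three observations.
init = data[:3].copy()
c_m, l_m = kmeans2(data, init, minit='matrix')

# missing='raise': a starting centroid that no observation is close to
# leaves a cluster empty, so ClusterError is raised.
far_init = np.array([[0.0, 0.0, 0.0],
                     [1000.0, 1000.0, 1000.0]])
try:
    kmeans2(data, far_init, minit='matrix', missing='raise')
    empty_cluster_hit = False
except ClusterError:
    empty_cluster_hit = True

print(empty_cluster_hit)
```

With the default missing='warn', the same call would issue a warning and continue instead of stopping.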


Below are the values returned by the method −

  • centroid− ndarray

    It is a ‘k’ by ‘N’ array of centroids.

  • label− ndarray

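    It holds, for each observation, the index of the centroid that the observation is closest to; that is, label[i] is the index of the centroid assigned to the i-th observation.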
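As a rough sketch of how the two return values fit together, label can be used to index into centroid. The data and the helper names (assigned, sizes) are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Illustrative data (an assumption): two 2-D blobs of 30 observations each.
rng = np.random.default_rng(2)
data = np.vstack([
    rng.normal(loc=-3.0, scale=0.4, size=(30, 2)),
    rng.normal(loc=3.0, scale=0.4, size=(30, 2)),
])

centroid, label = kmeans2(data, 2, minit='++')

# Row i of `assigned` is the centroid that observation i was assigned to.
assigned = centroid[label]

# Number of observations in each cluster.
sizes = np.bincount(label, minlength=len(centroid))

print(assigned.shape)
print(sizes)
```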