# What is scipy.cluster.vq.kmeans() method?

The **scipy.cluster.vq.kmeans(obs, k_or_guess, iter=20, thresh=1e-05, check_finite=True)** method forms k clusters by running the k-means algorithm on a set of observation vectors. To decide whether the centroids have stabilized, it compares the change in the average Euclidean distance between the observations and their corresponding centroids against a threshold value. The output is a codebook that maps each centroid to a code (its row index) and vice versa.
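As a minimal sketch of a typical call, the example below clusters synthetic two-dimensional data. The data, the random generator seed, and the variable names are assumptions made for illustration; only `kmeans` and `whiten` come from `scipy.cluster.vq`.

```python
import numpy as np
from scipy.cluster.vq import kmeans, whiten

# Hypothetical data: two blobs of 50 observations with 2 features each.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))

# Whiten the features (scale each column to unit variance) before clustering.
obs = whiten(np.vstack([blob_a, blob_b]))

# Ask for 2 centroids; the initial centroids are picked at random from obs.
codebook, distortion = kmeans(obs, 2)

print(codebook.shape)  # one row per centroid, one column per feature
print(distortion)      # mean Euclidean distance to the nearest centroid
```

Because the initial centroids are chosen at random, the exact centroid values can differ from one run to the next.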

Its parameters are explained in detail below.

## Parameters

**obs** − ndarray. An M by N array in which each row is an observation and each column is a feature seen during each observation. The features must be whitened first with the whiten() function.

**k_or_guess** − int or ndarray. The number of centroids to generate. Each centroid is assigned a code, which is also the row index of the centroid in the codebook matrix. If an integer k is given, the k initial centroids are chosen at random from the observation matrix. If a k by N array is given instead, its rows are used as the initial centroids.

**iter** − int, optional. The number of times to run k-means, so that the codebook with the lowest distortion across those runs is returned. It is not the number of iterations within a single k-means run. This parameter is ignored if initial centroids are given as an array through the k_or_guess parameter.

**thresh** − float, optional. The threshold value. The k-means algorithm terminates when the change in distortion since the last iteration is less than or equal to this threshold.

**check_finite** − bool, optional. Whether to check that the input matrices contain only finite numbers. Disabling this check may give a performance gain, but it may also cause problems such as crashes or non-termination if the observations contain infinities or NaNs. The default value is **True**.
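The sketch below illustrates the array form of k_or_guess. The synthetic data and the choice of the first and last observations as the initial guess are assumptions made for illustration. Since the starting centroids are fixed, the result is reproducible and the iter parameter has no effect.

```python
import numpy as np
from scipy.cluster.vq import kmeans, whiten

# Hypothetical data: two well-separated blobs, whitened before clustering.
rng = np.random.default_rng(1)
obs = whiten(np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(40, 2)),
    rng.normal(loc=[4.0, 4.0], scale=0.5, size=(40, 2)),
]))

# Illustrative initial guess: one observation taken from each blob.
initial_guess = obs[[0, -1]]

# Passing a k by N array skips the random initialization.
codebook, distortion = kmeans(obs, initial_guess, thresh=1e-05, check_finite=True)

# With an array guess, iter is ignored, so this call gives the same codebook.
codebook_again, _ = kmeans(obs, initial_guess, iter=1)
print(np.allclose(codebook, codebook_again))
```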

## Returns

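**codebook** − ndarray. A k by N array of k centroids, where the i-th centroid codebook[i] is represented with the code i. The centroids and codes returned are those with the lowest distortion seen. The array can have fewer than k rows, because centroids that end up with no observations assigned are removed during the iterations.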

**distortion** − float. The mean (non-squared) Euclidean distance between the observations passed and the centroids generated.
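To show how the returned codebook can be used, the sketch below assigns each observation to its nearest centroid with `scipy.cluster.vq.vq`. The data and variable names are assumptions made for illustration. Once k-means has converged, the mean of the per-observation distances is expected to be close to the distortion reported by kmeans.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

# Hypothetical data: two blobs of 30 observations with 2 features each.
rng = np.random.default_rng(2)
obs = whiten(np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(30, 2)),
    rng.normal(loc=[6.0, 6.0], scale=0.5, size=(30, 2)),
]))

codebook, distortion = kmeans(obs, 2)

# vq returns, for each observation, the code (row index into codebook)
# of its nearest centroid and the Euclidean distance to that centroid.
codes, dists = vq(obs, codebook)

print(codes[:5])     # codes of the first five observations
print(dists.mean())  # compare with the distortion from kmeans
print(distortion)
```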
