# What is the scipy.cluster.vq.kmeans() method?

The **scipy.cluster.vq.kmeans(obs, k_or_guess, iter=20, thresh=1e-05, check_finite=True)** method forms k clusters by running the k-means algorithm on a set of observation vectors. To decide whether the centroids have stabilized, it compares the change in average Euclidean distance between the observations and their corresponding centroids against a threshold value. The method returns a code book that maps centroids to codes and vice versa.

Its parameters are explained in detail below.

## Parameters

**obs** − ndarray. An M by N array in which each row is an observation and each column is a feature measured during that observation. The features must be whitened with the whiten() function before use.
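As a minimal sketch of the whitening step, the snippet below rescales a small made-up observation array (the values are illustrative assumptions, not data from this article). whiten() divides each feature column by its standard deviation, so features on very different scales contribute comparably to the distance computation.

```python
import numpy as np
from scipy.cluster.vq import whiten

# Hypothetical data: 4 observations (rows) with 2 features (columns)
# that sit on very different scales.
obs = np.array([[1.0, 200.0],
                [2.0, 400.0],
                [3.0, 100.0],
                [4.0, 300.0]])

# whiten() divides each column by that column's standard deviation.
whitened = whiten(obs)

# After whitening, every column has a standard deviation of 1.
print(whitened.std(axis=0))
```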

**k_or_guess** − int or ndarray. The number of centroids to generate. Each centroid is assigned a code, which is also the row index of that centroid in the code_book matrix. The initial k centroids are chosen at random from the observation matrix. Alternatively, passing a k by N array specifies the initial k centroids directly.
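The two accepted forms of k_or_guess can be sketched as follows. The synthetic two-group data and the choice of starting centroids are assumptions made purely for illustration.

```python
import numpy as np
from scipy.cluster.vq import kmeans, whiten

# Hypothetical data: two well-separated groups of 2-D points.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
group_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
obs = whiten(np.vstack([group_a, group_b]))

# Form 1: an int requests that many centroids; the starting
# centroids are drawn at random from the observations.
codebook_int, distortion_int = kmeans(obs, 2)

# Form 2: a k by N array supplies the starting centroids directly
# (here, one observation taken from each group).
initial_guess = np.array([obs[0], obs[-1]])
codebook_guess, distortion_guess = kmeans(obs, initial_guess)

print(codebook_int.shape, codebook_guess.shape)
```

With an array guess the result does not depend on random initialization, which can be convenient when reproducible output matters.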

**iter** − int, optional. The number of times to run k-means; the codebook with the lowest distortion across these runs is returned. This parameter is ignored if initial centroids are supplied as an array through the k_or_guess parameter. It does not represent the number of iterations of the k-means algorithm itself.

**thresh** − float, optional. The threshold value. The k-means algorithm terminates when the change in distortion since the last iteration is less than or equal to this threshold.

**check_finite** − bool, optional. Whether to check that the input matrices contain only finite numbers. Disabling the check may improve performance, but it can lead to problems such as crashes or non-termination if the observations contain infinities or NaNs. The default value is **True**.
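The effect of check_finite can be illustrated with a deliberately invalid input. This is a sketch under one assumption: that the validation failure surfaces as a ValueError, which is the behaviour we would expect from SciPy's input checks but which may differ between versions.

```python
import numpy as np
from scipy.cluster.vq import kmeans

# Hypothetical observations containing a NaN entry.
bad_obs = np.array([[1.0, 2.0],
                    [np.nan, 3.0],
                    [4.0, 5.0]])

rejected = False
try:
    # check_finite=True (the default) validates the input
    # before any clustering is attempted.
    kmeans(bad_obs, 2, check_finite=True)
except ValueError as err:
    # Assumption: the failed check is reported as a ValueError.
    rejected = True
    print("Input rejected:", err)
```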

## Returns

**codebook** − ndarray. A k by N array of k centroids, where the jth centroid codebook[j] is represented by the code j. The centroids and codes returned are those with the lowest distortion seen, which is not necessarily the globally minimal distortion.

**distortion** − float. The mean (non-squared) Euclidean distance between the observations passed and the centroids generated.
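To show how the two return values fit together, here is a short end-to-end sketch on synthetic data (the data and variable names are assumptions for illustration). It also uses vq() from the same module to assign each observation the code of its nearest centroid, and compares the mean of those distances with the reported distortion.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

# Hypothetical data: two well-separated groups of 3-D points.
rng = np.random.default_rng(1)
obs = whiten(np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(40, 3)),
    rng.normal(loc=4.0, scale=0.5, size=(40, 3)),
]))

# Run k-means with k = 2; the defaults iter=20 and thresh=1e-05 apply.
codebook, distortion = kmeans(obs, 2)

# vq() returns, for every observation, the code (row index into the
# codebook) of its nearest centroid and the distance to that centroid.
codes, dists = vq(obs, codebook)

print("codebook shape:", codebook.shape)
print("distortion:", distortion)
print("mean distance from vq:", dists.mean())
```

On cleanly separated data such as this, the mean distance computed from vq() should be very close to the distortion value returned by kmeans().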
