# How to use K-Means clustering algorithm in Python Scikit-learn?

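The **K-Means** clustering algorithm alternates between two steps. First it assigns every sample to its nearest centroid, then it recomputes each centroid as the mean of the samples assigned to it. It repeats these steps until the centroids stop changing, or in practice until they move less than a small tolerance. The number of clusters must be specified in advance, so the algorithm assumes it is already known; this number is represented by 'K'. The goal is to separate the samples into K groups of equal variance by minimizing a criterion known as the inertia: the sum of squared distances between each sample and the centroid of its cluster. The result is a local minimum that depends on the initial centroids.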
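To make the two alternating steps concrete, here is a minimal NumPy sketch of Lloyd's algorithm, the classic K-Means iteration. The helper kmeans_lloyd is hypothetical and for illustration only. It initializes the centroids by picking K random samples, whereas scikit-learn uses k-means++ initialization by default and a more optimized implementation.

```python
import numpy as np


def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Hypothetical illustrative helper (not part of scikit-learn):
    assign each sample to its nearest centroid, then move each centroid
    to the mean of its assigned samples, until nothing changes."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct samples chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each centroid as the mean of its cluster
        # (keep the old centroid if a cluster ends up empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    # Final assignment and inertia (sum of squared distances to own centroid)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return centroids, labels, inertia


# Two well-separated groups of points
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels, inertia = kmeans_lloyd(X, k=2)
print(labels, inertia)
```

Each pass can only lower (or keep) the inertia, which is why the loop terminates. Where it ends up still depends on the starting centroids.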

**Scikit-learn** provides the sklearn.cluster.KMeans class to perform K-Means clustering in Python.
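As a small sketch of the class's main fitted attributes, the snippet below clusters six toy points. The data and the n_init=10 and random_state=0 values are assumptions chosen for illustration. After fit(), cluster_centers_ holds the centroids, labels_ the cluster of each training sample, and inertia_ the value of the criterion being minimized.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: two well-separated groups of three points each
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# n_clusters is K; n_init and random_state are set explicitly so results
# are reproducible and do not depend on version-specific defaults
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(model.cluster_centers_)  # one row per centroid
print(model.labels_)           # cluster index of each training sample
print(model.inertia_)          # within-cluster sum of squared distances
```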

## Example

For the example below, we create a test binary classification dataset using the make_classification() function. The dataset consists of 10000 samples with two input features and one cluster per class.

```python
# Import required libraries
from numpy import unique
from sklearn.datasets import make_classification
from sklearn.cluster import KMeans
from matplotlib import pyplot

# %matplotlib inline  (uncomment when running in a Jupyter notebook)

# Set the figure size
pyplot.rcParams["figure.figsize"] = [7.16, 3.50]
pyplot.rcParams["figure.autolayout"] = True

# Define a binary classification dataset having 10000 samples with
# two input features and one cluster per class
X, y = make_classification(n_samples=10000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=4)

# Create a scatter plot of the samples from each class
for value in range(2):
    row = y == value  # boolean mask for the samples of this class
    pyplot.scatter(X[row, 0], X[row, 1])

pyplot.title('Classification Dataset', size=18)
pyplot.show()

# Define the KMeans clustering model
# (n_init is set explicitly because its default changed in recent scikit-learn versions)
KMeans_model = KMeans(n_clusters=2, n_init=10, random_state=4)

# Fit the model
KMeans_model.fit(X)

# Assign a cluster to each sample
yc = KMeans_model.predict(X)

# Retrieve the unique cluster labels
clusters_AC = unique(yc)

# Create a scatter plot of the samples in each cluster
for cluster in clusters_AC:
    row = yc == cluster  # boolean mask for the samples in this cluster
    pyplot.scatter(X[row, 0], X[row, 1])

pyplot.title('Cluster Prediction for Each Example in Dataset', size=18)
pyplot.show()
```

## Output

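Running the code displays two scatter plots: "Classification Dataset" (samples colored by their true class) and "Cluster Prediction for Each Example in Dataset" (samples colored by their predicted cluster).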
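The predict() call in the example assigns each sample to the centroid with the smallest Euclidean distance. The sketch below recomputes those assignments by hand from cluster_centers_ and checks that they match. It rebuilds the same dataset, and the n_init=10 and random_state=4 values are assumptions chosen here for reproducibility.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.cluster import KMeans

# Same dataset as in the example above
X, y = make_classification(n_samples=10000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=4)
model = KMeans(n_clusters=2, n_init=10, random_state=4).fit(X)

# Distance from every sample to every centroid, shape (n_samples, 2)
dist = np.linalg.norm(X[:, None, :] - model.cluster_centers_[None, :, :], axis=2)
manual_labels = dist.argmin(axis=1)  # index of the nearest centroid

# Fraction of samples where the hand-computed label matches predict()
print((manual_labels == model.predict(X)).mean())
```

New, unseen samples are labeled the same way: predict() only needs the stored centroids, not the training data.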

## Mini-Batch K-Means Clustering Algorithm

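The Mini-Batch K-Means clustering algorithm is a modified version of K-Means. As its name suggests, rather than using the entire dataset at each iteration, it updates the cluster centroids using small random batches (mini-batches) of samples. This makes Mini-Batch K-Means considerably faster on large datasets, usually at the cost of slightly worse results (a somewhat higher inertia) than standard K-Means.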
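The sketch below shows the mini-batch idea in plain NumPy. It follows the per-centroid learning-rate update of Sculley's web-scale k-means. The function minibatch_kmeans and its parameters are hypothetical and simplified: scikit-learn's MiniBatchKMeans also uses k-means++ initialization, early stopping and reassignment of nearly empty clusters. Each step samples a batch, finds the nearest centroid for every batch point, and then moves that centroid toward the point by 1/count. This keeps each centroid equal to the running mean of all points it has absorbed.

```python
import numpy as np


def minibatch_kmeans(X, k, batch_size=100, n_steps=200, seed=0):
    """Hypothetical, simplified mini-batch K-Means (not scikit-learn's code)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    counts = np.zeros(k)  # samples absorbed so far by each centroid
    for _ in range(n_steps):
        batch = X[rng.choice(len(X), size=batch_size, replace=False)]
        # Nearest centroid for every sample in the mini-batch (computed once)
        d2 = ((batch[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        for x, j in zip(batch, nearest):
            counts[j] += 1
            eta = 1.0 / counts[j]                     # per-centroid learning rate
            centroids[j] += eta * (x - centroids[j])  # running-mean update
    return centroids


# Two Gaussian blobs centered at (0, 0) and (8, 8)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(500, 2)),
               rng.normal(8.0, 0.5, size=(500, 2))])
centroids = minibatch_kmeans(X, k=2)
print(centroids)
```

Each step touches only batch_size samples instead of the whole dataset, which is where the speed-up comes from. Because the learning rate shrinks as a centroid absorbs more points, the centroids settle down over time.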

Scikit-learn provides the sklearn.cluster.MiniBatchKMeans class to perform Mini-Batch K-Means clustering in Python.
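Because it learns from batches, MiniBatchKMeans also offers a partial_fit() method. This lets the model be trained incrementally when the data does not fit in memory or arrives over time. In the sketch below, splitting the dataset into ten chunks stands in for a data stream. The chunk count, n_init=3 and random_state=4 are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.cluster import MiniBatchKMeans

X, y = make_classification(n_samples=10000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=4)

model = MiniBatchKMeans(n_clusters=2, n_init=3, random_state=4)

# Feed the data in 10 chunks of 1000 rows, as if it arrived from a stream
for chunk in np.array_split(X, 10):
    model.partial_fit(chunk)

print(model.cluster_centers_)
labels = model.predict(X)
```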

### Example

As before, we create a test binary classification dataset using the make_classification() function, with 10000 samples, two input features and one cluster per class.

```python
# Import required libraries
from numpy import unique
from sklearn.datasets import make_classification
from sklearn.cluster import MiniBatchKMeans
from matplotlib import pyplot

# %matplotlib inline  (uncomment when running in a Jupyter notebook)

# Set the figure size
pyplot.rcParams["figure.figsize"] = [7.16, 3.50]
pyplot.rcParams["figure.autolayout"] = True

# Define a binary classification dataset having 10000 samples with
# two input features and one cluster per class
X, y = make_classification(n_samples=10000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=4)

# Create a scatter plot of the samples from each class
for value in range(2):
    row = y == value  # boolean mask for the samples of this class
    pyplot.scatter(X[row, 0], X[row, 1])

pyplot.title('Classification Dataset', size=18)
pyplot.show()

# Define the Mini-Batch K-Means clustering model
# (n_init is set explicitly because its default changed in recent scikit-learn versions)
MBKMeans_model = MiniBatchKMeans(n_clusters=2, n_init=3, random_state=4)

# Fit the model
MBKMeans_model.fit(X)

# Assign a cluster to each sample
yc = MBKMeans_model.predict(X)

# Retrieve the unique cluster labels
clusters_AC = unique(yc)

# Create a scatter plot of the samples in each cluster
for cluster in clusters_AC:
    row = yc == cluster  # boolean mask for the samples in this cluster
    pyplot.scatter(X[row, 0], X[row, 1])

pyplot.title('Cluster Prediction for Each Example in Dataset', size=18)
pyplot.show()
```

### Output

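Running the code displays two scatter plots: "Classification Dataset" (samples colored by their true class) and "Cluster Prediction for Each Example in Dataset" (samples colored by their predicted cluster).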
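The sketch below fits both models on the same data and reports each one's inertia and fit time, so the speed/quality trade-off described above can be checked on this dataset. Using n_init=3 and random_state=4 for both models is an assumption made for a like-for-like comparison. Timings depend heavily on the machine and on dataset size, and with only 10000 two-dimensional samples the gap may be small.

```python
import time
from sklearn.datasets import make_classification
from sklearn.cluster import KMeans, MiniBatchKMeans

X, y = make_classification(n_samples=10000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=4)

models = [
    ("KMeans", KMeans(n_clusters=2, n_init=3, random_state=4)),
    ("MiniBatchKMeans", MiniBatchKMeans(n_clusters=2, n_init=3, random_state=4)),
]

results = {}
for name, model in models:
    start = time.perf_counter()
    model.fit(X)
    elapsed = time.perf_counter() - start
    # inertia_ is computed on the full dataset for both models; lower is better
    results[name] = model.inertia_
    print(f"{name}: inertia={model.inertia_:.1f}, fit time={elapsed:.3f}s")
```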
