# How to use the K-Means clustering algorithm in Python Scikit-learn?



The K-Means clustering algorithm computes centroids and iterates until it finds the optimal centroids. It requires the number of clusters to be specified in advance, i.e., it assumes that this number is already known. The main logic of the algorithm is to cluster the data by separating samples into K groups of roughly equal variance, minimizing a criterion known as inertia − the sum of squared distances between each sample and its nearest centroid. The number of clusters identified by the algorithm is denoted by ‘K’.

Scikit-learn provides the sklearn.cluster.KMeans class to perform K-Means clustering in Python.
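As a minimal sketch of this class (variable names here are illustrative, not part of the tutorial's example), the fitted model exposes the centroids via cluster_centers_ and the final inertia via inertia_:

```python
# Minimal usage sketch for sklearn.cluster.KMeans (variable names are illustrative)
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs of 2-D points
rng = np.random.RandomState(0)
data = np.vstack([rng.normal(0, 0.5, (50, 2)),
                  rng.normal(5, 0.5, (50, 2))])

# Fit with K=2; n_init restarts guard against poor random initialisations
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)

print(model.cluster_centers_.shape)   # (2, 2): one 2-D centroid per cluster
print(model.inertia_)                 # sum of squared distances to nearest centroid
```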

## Example

In the example below, we create a test binary classification dataset using the make_classification() function. The dataset consists of 10000 samples with two input features and one cluster per class.

# Import required libraries
from numpy import unique
from numpy import where
from sklearn.datasets import make_classification
from sklearn.cluster import KMeans
from matplotlib import pyplot
#%matplotlib inline

# Set the figure size
pyplot.rcParams["figure.figsize"] = [7.16, 3.50]
pyplot.rcParams["figure.autolayout"] = True

# Define binary classification dataset having 10000 samples with two input features and one cluster per class.

X,y = make_classification(n_samples=10000, n_features=2, n_informative=2, n_redundant=0, n_clusters_per_class=1, random_state=4)

# Create scatter plot for all samples from each class
for value in range(2):
   # Get row indexes of samples with this class label
   row = where(y == value)
   # Plot those samples
   pyplot.scatter(X[row, 0], X[row, 1])

# Plot the figure
pyplot.title('Classification Dataset', size=18)
pyplot.show()

# Define the KMeans clustering model
KMeans_model = KMeans(n_clusters=2)

# Fit the model
KMeans_model.fit(X)

# Assign a cluster to each sample
yc = KMeans_model.predict(X)

# Retrieve the unique clusters from all clusters
clusters_AC = unique(yc)

# Create scatter plot for all samples from each cluster
for cluster in clusters_AC:
   # Get row indexes of all samples within this cluster
   row = where(yc == cluster)
   # Plot those samples
   pyplot.scatter(X[row, 0], X[row, 1])

# Plot the figure
pyplot.title('Cluster Prediction for Each Example in Dataset', size=18)
pyplot.show()


## Output

It will produce the following output − two scatter plots, one showing the classification dataset colored by class and one showing the cluster predicted for each sample.
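Since K-Means requires the number of clusters up front, a common heuristic for choosing it is the elbow method: fit the model for several values of K and look for the point where the inertia stops dropping sharply. The sketch below is illustrative and not part of the original example:

```python
# Elbow-method sketch for choosing K (illustrative; not part of the tutorial code)
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification

X, _ = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=4)

inertias = []
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)

# Inertia shrinks as K grows; the "elbow" is where the drop levels off
print([round(v, 1) for v in inertias])
```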

## Mini-Batch K-Means Clustering Algorithm

The Mini-Batch K-Means clustering algorithm is a modified version of K-Means. As its name suggests, rather than using the entire dataset on every iteration, it updates the cluster centroids using small random batches of samples. This makes mini-batch K-Means considerably faster, at the cost of slightly lower cluster quality.

Scikit-learn provides the sklearn.cluster.MiniBatchKMeans class to perform Mini-Batch K-Means clustering in Python.
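A minimal sketch of the class (the batch_size value and variable names here are just an illustration):

```python
# Mini-Batch K-Means sketch: centroids are updated from small random batches
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Two well-separated blobs of 2-D points
rng = np.random.RandomState(0)
data = np.vstack([rng.normal(0, 0.5, (500, 2)),
                  rng.normal(5, 0.5, (500, 2))])

# Each centroid update uses only batch_size samples instead of the full dataset
model = MiniBatchKMeans(n_clusters=2, batch_size=100, n_init=3, random_state=0)
labels = model.fit_predict(data)

print(model.cluster_centers_.shape)   # (2, 2): one 2-D centroid per cluster
```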

### Example

In the example below, we create the same test binary classification dataset as before, using the make_classification() function: 10000 samples with two input features and one cluster per class.

# Import required libraries
from numpy import unique
from numpy import where
from sklearn.datasets import make_classification
from sklearn.cluster import MiniBatchKMeans
from matplotlib import pyplot
# %matplotlib inline

# Set the figure size
pyplot.rcParams["figure.figsize"] = [7.16, 3.50]
pyplot.rcParams["figure.autolayout"] = True

# Define binary classification dataset having 10000 samples with two input features and one cluster per class.
X,y = make_classification(n_samples=10000, n_features=2, n_informative=2, n_redundant=0, n_clusters_per_class=1, random_state=4)

# Create scatter plot for all samples from each class
for value in range(2):
   # Get row indexes of samples with this class label
   row = where(y == value)
   # Plot those samples
   pyplot.scatter(X[row, 0], X[row, 1])

# Plot the figure
pyplot.title('Classification Dataset', size=18)
pyplot.show()

# Define the Mini-Batch KMeans clustering model
MBKMeans_model = MiniBatchKMeans(n_clusters=2)

# Fit the model
MBKMeans_model.fit(X)

# Assign a cluster to each sample
yc = MBKMeans_model.predict(X)

# Retrieve the unique clusters from all clusters
clusters_AC = unique(yc)

# Create scatter plot for all samples from each cluster
for cluster in clusters_AC:
   # Get row indexes of all samples within this cluster
   row = where(yc == cluster)
   # Plot those samples
   pyplot.scatter(X[row, 0], X[row, 1])

# Plot the figure
pyplot.title('Cluster Prediction for Each Example in Dataset', size=18)
pyplot.show()


### Output

It will produce the following output − two scatter plots, one showing the classification dataset colored by class and one showing the cluster predicted for each sample.
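The speed advantage of the mini-batch variant can be checked empirically. A rough timing sketch (not from the tutorial; actual numbers vary by machine and scikit-learn version):

```python
# Rough timing comparison of KMeans vs MiniBatchKMeans (numbers vary by machine)
import time
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_classification

X, _ = make_classification(n_samples=10000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=4)

t0 = time.perf_counter()
KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
t_full = time.perf_counter() - t0

t0 = time.perf_counter()
MiniBatchKMeans(n_clusters=2, n_init=10, batch_size=256, random_state=0).fit(X)
t_mini = time.perf_counter() - t0

print(f"KMeans: {t_full:.3f}s, MiniBatchKMeans: {t_mini:.3f}s")
```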

Updated on 04-Oct-2022 08:43:08