How to use the DBSCAN clustering algorithm in Python Scikit-learn?


DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. The algorithm is built on an intuitive notion of "clusters" and "noise": clusters are dense regions of the data space, separated by regions of lower density, and points that fall in those sparse regions are treated as noise.

Scikit-learn provides the sklearn.cluster.DBSCAN class to perform DBSCAN clustering. The algorithm uses two important parameters, min_samples and eps, to define what counts as dense. eps is the maximum distance at which two samples are considered neighbors, and min_samples is the number of samples (including the point itself) that must lie within eps of a point for it to be a core point. A higher value of min_samples or a lower value of eps demands a higher density of data points to form a cluster.
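To make the effect of these two parameters concrete, here is a minimal sketch that counts how many samples DBSCAN marks as noise under different settings. The toy dataset from make_blobs(), the helper count_noise(), and the specific eps and min_samples values are assumptions chosen for illustration; they are not part of the example later in this article.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Illustrative toy data (an assumption, not the dataset used in this article)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)


def count_noise(eps, min_samples):
    """Hypothetical helper: number of samples DBSCAN labels as noise (-1)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    return int((labels == -1).sum())


# Baseline setting
noise_base = count_noise(eps=0.5, min_samples=5)

# Lower eps: neighborhoods shrink, so a higher density is required
noise_small_eps = count_noise(eps=0.2, min_samples=5)

# Higher min_samples: more neighbors are required inside each neighborhood
noise_big_min = count_noise(eps=0.5, min_samples=20)

print("baseline noise points:        ", noise_base)
print("noise with lower eps:         ", noise_small_eps)
print("noise with higher min_samples:", noise_big_min)
```

Both stricter settings can only keep or increase the number of noise points relative to the baseline, because every sample that still belongs to a cluster under the stricter setting also belongs to one under the looser setting.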

Steps

We can follow the steps given below to perform DBSCAN clustering in Python Scikit-learn −

Step 1 − Import required libraries.

Step 2 − Set figure size.

Step 3 − Define a binary classification dataset having 2000 samples with two input features and one cluster per class.

Step 4 − Create a scatter plot for all samples from each class.

Step 5 − Define the DBSCAN clustering model.

Step 6 − Fit the model.

Step 7 − Retrieve the unique clusters from all clusters.

Step 8 − Create the scatter plot for all samples from each cluster.

Step 9 − Plot the figure.

Example

For the example below, we will create a test binary classification dataset by using the make_classification() function. This dataset consists of 2000 samples with two input features and one cluster per class.
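Before running the full example, a quick check of what make_classification() returns may be useful. The sketch below uses the same arguments as the example and only inspects the result. Using numpy's bincount() to show the class sizes is an addition for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same dataset definition as in the example
X, y = make_classification(n_samples=2000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=4)

print(X.shape)         # (2000, 2): 2000 samples, two input features
print(np.unique(y))    # [0 1]: the two class labels
print(np.bincount(y))  # number of samples in each class
```

Note that DBSCAN never sees y. The class labels are used only to color the first scatter plot.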

# Import required libraries
from numpy import unique
from numpy import where
from sklearn.datasets import make_classification
from sklearn.cluster import DBSCAN
from matplotlib import pyplot
# Jupyter notebook magic; remove this line when running as a plain script
%matplotlib inline

# Set the figure size
pyplot.rcParams["figure.figsize"] = [7.16, 3.50]
pyplot.rcParams["figure.autolayout"] = True

# Define binary classification dataset having 2000 samples
# with two input features and one cluster per class
X, y = make_classification(n_samples=2000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=4)

# Create scatter plot for all samples from each class
for value in range(2):
    # Getting row indexes for samples with this class value
    row = where(y == value)
    # Creating scatter plot of all the samples
    pyplot.scatter(X[row, 0], X[row, 1])

# Plot the figure
pyplot.title('Classification Dataset', size=18)
pyplot.show()

# Define the DBSCAN clustering model
DBSCAN_model = DBSCAN(eps=0.50, min_samples=10)

# Fit the model and get the cluster label of every sample
yc = DBSCAN_model.fit_predict(X)

# Retrieve the unique clusters from all clusters
clusters_AC = unique(yc)

# Create scatter plot for all samples from each cluster
for cluster in clusters_AC:
    # Getting row indexes for all samples within this cluster
    row = where(yc == cluster)
    # Creating scatter plot of all the samples
    pyplot.scatter(X[row, 0], X[row, 1])

# Plot the figure
pyplot.title('Cluster Prediction for Each Example in Dataset', size=18)
pyplot.show()
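The example above plots one group per unique label. Scikit-learn assigns the label -1 to samples it considers noise, so one of the plotted groups may be noise rather than a cluster. The following sketch, which reuses the dataset and model settings from the example, shows one possible way to count clusters, noise points, and core samples separately. The variable names n_clusters, n_noise, and n_core are chosen here for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_classification

# Same dataset and model settings as in the example
X, _ = make_classification(n_samples=2000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=4)
model = DBSCAN(eps=0.50, min_samples=10)
labels = model.fit_predict(X)

# Samples labeled -1 are noise and do not belong to any cluster
n_noise = int(np.sum(labels == -1))

# Count the distinct labels, leaving out the noise label if it occurs
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)

# Indices of core samples are stored on the fitted model
n_core = len(model.core_sample_indices_)

print("clusters:    ", n_clusters)
print("noise points:", n_noise)
print("core samples:", n_core)
```

If the noise count is large, increasing eps or decreasing min_samples is a reasonable first adjustment to try.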

Output

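It will produce two scatter plots. The first, titled "Classification Dataset", colors the samples by their original class. The second, titled "Cluster Prediction for Each Example in Dataset", colors them by the label that DBSCAN assigned.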



raja
Updated on 04-Oct-2022 08:48:56
