# How to use the DBSCAN clustering algorithm in Python Scikit-learn?

DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. The algorithm is built on an intuitive notion of "clusters" and "noise": clusters are dense regions of the data space, separated from one another by regions of lower point density. Points that lie in those sparse regions and belong to no cluster are treated as noise.
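To make the idea of dense regions and noise concrete, here is a minimal sketch on a hand-built toy dataset. The data (two small grids plus one isolated point) and the parameter values are assumptions chosen only for illustration; they are not part of the example later in this article.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data (illustrative assumption): two dense 5x5 grids with 0.1 spacing,
# plus one isolated point far away from both grids.
axis = np.arange(5) * 0.1
grid = np.array([[x, y] for x in axis for y in axis])
X_demo = np.vstack([grid, grid + 5.0, [[20.0, 20.0]]])

# eps and min_samples are picked by hand to suit the grid spacing.
demo_model = DBSCAN(eps=0.15, min_samples=3).fit(X_demo)
demo_labels = demo_model.labels_

# DBSCAN marks noise points with the label -1.
n_demo_clusters = len(set(demo_labels.tolist()) - {-1})
n_demo_noise = int(np.sum(demo_labels == -1))
print("clusters:", n_demo_clusters, "noise points:", n_demo_noise)
```

In this setup each grid should come out as its own cluster, while the isolated point has no neighbors within eps and is labeled -1.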

Scikit-learn provides the sklearn.cluster.DBSCAN class to perform DBSCAN clustering. Two important parameters, min_samples and eps, define what counts as dense: eps is the maximum distance at which two samples are considered neighbors, and min_samples is the number of samples (including the point itself) that must lie within eps of a point for it to count as a core point. A higher value of min_samples or a lower value of eps demands a higher density of data points to form a cluster.
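The effect of eps can be seen by fitting the model several times and counting clusters and noise points. The sketch below is only an illustration: the helper name summarize and the list of eps values are assumptions, and the dataset mirrors the make_classification() call used in the example later in this article.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_classification

# Same synthetic dataset as the example in this article.
X_sweep, _ = make_classification(n_samples=2000, n_features=2, n_informative=2,
                                 n_redundant=0, n_clusters_per_class=1,
                                 random_state=4)

def summarize(eps, min_samples):
    """Hypothetical helper: return (number of clusters, number of noise points)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_sweep)
    n_clusters = len(set(labels.tolist()) - {-1})  # -1 marks noise, not a cluster
    n_noise = int((labels == -1).sum())
    return n_clusters, n_noise

# Smaller eps demands higher density, so more points end up as noise.
for eps in (0.05, 0.2, 0.5, 1.0):
    print(eps, summarize(eps, min_samples=10))
```

With min_samples held fixed, the noise count can only stay the same or shrink as eps grows, because every neighborhood gets larger. The number of clusters, however, can move in either direction, so it is worth inspecting both values when tuning.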

## Steps

Follow the steps below to perform DBSCAN clustering in Python Scikit-learn −

Step 1 − Import required libraries.

Step 2 − Set the figure size.

Step 3 − Define a binary classification dataset having 2000 samples with two input features and one cluster per class.

Step 4 − Create a scatter plot for all samples from each class.

Step 5 − Define the DBSCAN clustering model.

Step 6 − Fit the model.

Step 7 − Retrieve the unique cluster labels from the predictions.

Step 8 − Create the scatter plot for all samples from each cluster.

Step 9 − Plot the figure.

## Example

In the example below, we create a binary classification test dataset using the make_classification() function. The dataset consists of 2000 samples with two input features and one cluster per class.

# Import required libraries
from numpy import unique
from numpy import where
from sklearn.datasets import make_classification
from sklearn.cluster import DBSCAN
from matplotlib import pyplot
# Jupyter notebook magic; remove this line when running as a plain Python script
%matplotlib inline

# Set the figure size
pyplot.rcParams["figure.figsize"] = [7.16, 3.50]
pyplot.rcParams["figure.autolayout"] = True

# Define binary classification dataset having 2000 samples with two input features and one cluster per class.
X, y = make_classification(n_samples=2000, n_features=2, n_informative=2, n_redundant=0, n_clusters_per_class=1, random_state=4)

# Create scatter plot for all samples from each class
for value in range(2):
    # Get row indexes for the samples of this class
    row = where(y == value)
    # Create scatter plot of these samples
    pyplot.scatter(X[row, 0], X[row, 1])

# Plot the figure
pyplot.title('Classification Dataset', size=18)
pyplot.show()

# Define the DBSCAN clustering model
DBSCAN_model = DBSCAN(eps=0.50, min_samples=10)

# Fit the model
yc = DBSCAN_model.fit_predict(X)

# Retrieve the unique clusters from all clusters
clusters_AC = unique(yc)

# Create scatter plot for all samples from each cluster
for cluster in clusters_AC:
    # Get row indexes for all samples within this cluster
    # (the label -1 marks noise points that belong to no cluster)
    row = where(yc == cluster)
    # Create scatter plot of these samples
    pyplot.scatter(X[row, 0], X[row, 1])

# Plot the figure
pyplot.title('Cluster Prediction for Each Example in Dataset', size=18)
pyplot.show()
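Besides plotting, it can help to check how many samples fall into each cluster and how many are flagged as noise. The following is a minimal sketch; the helper name cluster_sizes and the small label array are hypothetical and used only for illustration.

```python
import numpy as np

def cluster_sizes(labels):
    """Hypothetical helper: return ({cluster_label: size}, noise_count)."""
    labels = np.asarray(labels)
    values, counts = np.unique(labels, return_counts=True)
    # Exclude -1, which DBSCAN uses for noise rather than for a cluster.
    sizes = {int(v): int(c) for v, c in zip(values, counts) if v != -1}
    noise = int(np.sum(labels == -1))
    return sizes, noise

# Made-up labels in the format returned by DBSCAN.fit_predict()
example_labels = np.array([0, 0, 1, -1, 1, 1, -1])
example_sizes, example_noise = cluster_sizes(example_labels)
print(example_sizes, example_noise)  # {0: 2, 1: 3} 2
```

Calling cluster_sizes(yc) on the labels from the example above gives the same kind of summary for the fitted model.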


## Output

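Running the example produces two scatter plots. The first, titled "Classification Dataset", shows the samples colored by their true class. The second, titled "Cluster Prediction for Each Example in Dataset", shows the same samples colored by the cluster label that DBSCAN assigned, with any noise points (label -1) drawn as their own group.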

Updated on 04-Oct-2022 08:48:56