How to use the Affinity Propagation clustering algorithm in Python Scikit-learn?

The Affinity Propagation clustering algorithm is based on the concept of ‘message passing’ between pairs of samples, repeated until convergence. It does not require the number of clusters to be specified before running the algorithm. One of its biggest disadvantages is its time complexity, which is of the order $O(N^2 T)$, where $N$ is the number of samples and $T$ is the number of iterations until convergence.
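For reference, the messages are usually written as two quantities per pair of samples: a responsibility $r(i,k)$ and an availability $a(i,k)$. The block below is a sketch of the standard update rules; the symbols $s(i,k)$ (similarity between samples $i$ and $k$), $r$, $a$, and the damping factor $\lambda$ are notation introduced here and do not appear in the text above.

```latex
\begin{aligned}
r(i,k) &\leftarrow s(i,k) - \max_{k' \neq k}\bigl[a(i,k') + s(i,k')\bigr] \\
a(i,k) &\leftarrow \min\Bigl[0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\bigl(0,\, r(i',k)\bigr)\Bigr], \quad i \neq k \\
a(k,k) &\leftarrow \sum_{i' \neq k} \max\bigl(0,\, r(i',k)\bigr) \\
r_{t+1}(i,k) &= \lambda\, r_{t}(i,k) + (1-\lambda)\, r^{\text{new}}(i,k) \\
a_{t+1}(i,k) &= \lambda\, a_{t}(i,k) + (1-\lambda)\, a^{\text{new}}(i,k)
\end{aligned}
```

Each sweep updates one value for every pair of samples, which is where the $N^2$ factor in the time complexity comes from; repeating the sweep for $T$ iterations gives the overall cost.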

Scikit-learn provides the sklearn.cluster.AffinityPropagation class to perform Affinity Propagation clustering in Python.
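As a minimal sketch of the class in use, the snippet below fits AffinityPropagation on six hand-made 2-D points and reads back the fitted attributes. The data and the variable names are illustrative assumptions, not part of the example later in this article. Note that no cluster count is passed in.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Six illustrative 2-D points forming two visually separate groups
# (made-up data for this sketch only).
X_small = np.array([
    [1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
    [4.0, 2.0], [4.0, 4.0], [4.0, 0.0],
])

# There is no n_clusters argument; damping must lie in [0.5, 1.0).
model = AffinityPropagation(damping=0.5, random_state=5)
model.fit(X_small)

# Indices of the samples chosen as exemplars (cluster centers)
exemplar_indices = model.cluster_centers_indices_
# Cluster label assigned to each training sample
labels = model.labels_

print("Exemplar indices:", exemplar_indices)
print("Labels:", labels)
print("Iterations run:", model.n_iter_)
```

The exemplars are actual samples from the input, so cluster_centers_indices_ can be used to index back into the original array.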

Steps

We can follow the steps given below to perform Affinity Propagation clustering in Python Scikit-learn −

Step 1 − Import necessary libraries.

Step 2 − Set the figure size.

Step 3 − Define a binary classification dataset with 2000 samples, two input features, and one cluster per class.

Step 4 − Create a scatter plot for all samples from each class.

Step 5 − Plot the figure.

Step 6 − Define the AffinityPropagation clustering model.

Step 7 − Fit the model.

Step 8 − Assign a cluster to each sample.

Step 9 − Retrieve the unique clusters from all clusters.

Step 10 − Create a scatter plot for all samples from each cluster.

Step 11 − Plot the figure.

Example

For the example below, we create a test binary classification dataset using the make_classification() function. The dataset consists of 2000 samples with two input features and one cluster per class.

# Import necessary libraries
from numpy import unique
from numpy import where
from sklearn.datasets import make_classification
from sklearn.cluster import AffinityPropagation
from matplotlib import pyplot
# %matplotlib inline

# Set the figure size
pyplot.rcParams["figure.figsize"] = [7.16, 3.50]
pyplot.rcParams["figure.autolayout"] = True

# Define binary classification dataset having 2000 samples
# with two input features and one cluster per class.
X, y = make_classification(n_samples=2000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=4)

# Create scatter plot for all samples from each class
for value in range(2):
    # Getting row indexes for samples
    row = where(y == value)
    # Creating scatter plot of all the samples
    pyplot.scatter(X[row, 0], X[row, 1])

# Plot the figure
pyplot.title('Classification Dataset', size=18)
pyplot.show()

# Define the AffinityPropagation clustering model
AP_model = AffinityPropagation(damping=0.6)

# Fit the model
AP_model.fit(X)

# Assigning a cluster per sample
yc = AP_model.predict(X)

# Retrieve the unique clusters from all clusters
clusters_AP = unique(yc)

# Create scatter plot for all samples from each cluster
for cluster in clusters_AP:
    # Getting row indexes for all samples within this cluster
    row = where(yc == cluster)
    # Creating scatter plot of all the samples
    pyplot.scatter(X[row, 0], X[row, 1])

# Plot the figure
pyplot.title('Cluster Prediction for Each Example in Dataset', size=18)
pyplot.show()

Output

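It produces two scatter plots. The first, titled 'Classification Dataset', shows the samples colored by class. The second, titled 'Cluster Prediction for Each Example in Dataset', shows the same samples colored by the cluster that Affinity Propagation assigned to them.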
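Because the number of clusters is not set in advance, it can help to see what influences it. In scikit-learn the preference parameter plays that role: it defaults to the median of the input similarities, and lower values generally lead to fewer exemplars. The sketch below is an illustration under assumptions, not part of the example above: the make_blobs dataset, the helper count_clusters, and the preference values -50 and -200 are all chosen arbitrarily for demonstration.

```python
import warnings

from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs
from sklearn.exceptions import ConvergenceWarning

# Small illustrative dataset (an assumption of this sketch): 150 points
# around 3 centers, kept small because the cost grows with N^2.
X_demo, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6,
                       random_state=0)


def count_clusters(X, preference):
    """Hypothetical helper: fit the model and count the exemplars it selects."""
    model = AffinityPropagation(damping=0.6, preference=preference,
                                random_state=0)
    with warnings.catch_warnings():
        # A run that hits max_iter only emits a warning; silence it here.
        warnings.simplefilter("ignore", ConvergenceWarning)
        model.fit(X)
    return len(model.cluster_centers_indices_)


# preference=None means "use the median of the input similarities"
counts = {pref: count_clusters(X_demo, pref) for pref in (None, -50, -200)}

for pref, n_found in counts.items():
    print(f"preference={pref}: {n_found} clusters")
```

The exact counts depend on the data and the scikit-learn version, so treat the printed numbers as something to inspect rather than a fixed result.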



raja
Updated on 04-Oct-2022 08:46:43
