What is the difference between K-Means and DBSCAN?


K-Means

K-means clustering is a partitioning algorithm. K-means assigns each object in the dataset to exactly one of the k clusters it forms. A data object (data point) is assigned to the nearest cluster using a measure of distance or similarity.
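A minimal sketch of the basic k-means loop follows. It assumes Euclidean distance, points given as tuples of numbers, and, as a simplification, the first k points used as initial centers (practical implementations usually pick random or k-means++ starting centers). The function names `nearest` and `kmeans` are illustrative, not a library API.

```python
import math

def nearest(point, centers):
    """Return the index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))

def kmeans(points, k, max_iter=100):
    """Basic k-means: every point ends up in exactly one of k clusters."""
    # Simplifying assumption: the first k points serve as initial centers.
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [nearest(p, centers) for p in points]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center in place
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return labels, centers

points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (1.0, 0.5), (9.0, 8.0)]
labels, centers = kmeans(points, 2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Alternating the assignment and update steps until the centers stop moving is the standard way the basic algorithm is described; the `max_iter` cap is only a safety net.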

In k-means, an object is assigned to the nearest center. To handle cannot-link constraints, the center assignment step of k-means can be modified into a nearest feasible center assignment.

When the objects are assigned to centers in sequence, each step ensures that the assignments made so far do not violate any cannot-link constraint. Each object is assigned to the nearest center for which the assignment respects all cannot-link constraints.
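The nearest feasible center assignment can be sketched as a drop-in replacement for the ordinary k-means assignment step. This sketch assumes cannot-link constraints are given as pairs of point indices, processes points in index order, and returns `None` when some point has no feasible center; the name `assign_with_cannot_link` and this input format are hypothetical choices for illustration.

```python
import math

def assign_with_cannot_link(points, centers, cannot_link):
    """Assign points in sequence to the nearest *feasible* center.

    cannot_link: set of (i, j) index pairs that must not share a cluster.
    Returns a list of center indices, or None if some point has no feasible center.
    """
    # For each point, collect the points it must not share a cluster with.
    forbidden = {i: set() for i in range(len(points))}
    for i, j in cannot_link:
        forbidden[i].add(j)
        forbidden[j].add(i)

    labels = [None] * len(points)  # None = not assigned yet
    for i, p in enumerate(points):
        # Try centers from nearest to farthest.
        order = sorted(range(len(centers)), key=lambda idx: math.dist(p, centers[idx]))
        for c in order:
            # Feasible if no already-assigned cannot-link partner is in center c.
            if all(labels[j] != c for j in forbidden[i]):
                labels[i] = c
                break
        else:
            return None  # every center would violate a cannot-link constraint
    return labels

points = [(0.0, 0.0), (0.5, 0.0), (10.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]
print(assign_with_cannot_link(points, centers, set()))     # [0, 0, 1]
print(assign_with_cannot_link(points, centers, {(0, 1)}))  # [0, 1, 1]
```

With the constraint `(0, 1)`, point 1 is still closest to center 0, but that center already holds point 0, so the sketch falls back to the next-nearest center.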

DBSCAN

DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It is a density-based clustering algorithm. The algorithm grows regions with sufficiently high density into clusters and discovers clusters of arbitrary shape in spatial databases with noise. It defines a cluster as a maximal set of density-connected points.

A density-based cluster is a set of density-connected objects that is maximal with respect to density-reachability. Every object not contained in any cluster is considered noise.
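To make these definitions concrete, here is a minimal sketch of ε-neighborhoods, core objects, density-reachability, and density-connectivity. It assumes Euclidean distance, that a point's ε-neighborhood includes the point itself, and that a core object has at least `min_pts` points in its ε-neighborhood; all function names are illustrative.

```python
import math
from collections import deque

def neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_core(points, i, eps, min_pts):
    """A core object has at least min_pts points in its eps-neighborhood."""
    return len(neighborhood(points, i, eps)) >= min_pts

def density_reachable(points, p, q, eps, min_pts):
    """True if a chain p = p1, ..., pn = q exists where each step stays within eps
    and every object before q in the chain is a core object."""
    if not is_core(points, p, eps, min_pts):
        return p == q  # a non-core object only reaches itself
    seen, queue = {p}, deque([p])
    while queue:
        cur = queue.popleft()
        if cur == q:
            return True
        if not is_core(points, cur, eps, min_pts):
            continue  # non-core objects can end a chain but not extend it
        for j in neighborhood(points, cur, eps):
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return False

def density_connected(points, p, q, eps, min_pts):
    """p and q are density-connected if some object o density-reaches both."""
    return any(density_reachable(points, o, p, eps, min_pts)
               and density_reachable(points, o, q, eps, min_pts)
               for o in range(len(points)))

points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(density_reachable(points, 1, 3, eps=1.0, min_pts=3))  # True: 1 -> 2 -> 3
print(density_reachable(points, 3, 1, eps=1.0, min_pts=3))  # False: 3 is not core
print(density_connected(points, 0, 3, eps=1.0, min_pts=3))  # True, via core object 1
```

Density-reachability is not symmetric, which is why clusters are defined through the symmetric density-connectivity relation. In this data, the point at index 4 is density-connected to nothing else, so it belongs to no cluster and would be treated as noise.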

DBSCAN searches for clusters by checking the ε-neighborhood of each point in the database. If the ε-neighborhood of a point p contains at least MinPts points, a new cluster with p as a core object is created. DBSCAN then iteratively collects directly density-reachable objects from these core objects, which may involve merging a few density-reachable clusters. The process terminates when no new point can be added to any cluster.
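The procedure can be sketched in a few dozen lines. This is a simplified illustration, not an optimized implementation: it assumes Euclidean distance, counts the point itself in its ε-neighborhood, uses a linear scan for each neighborhood query (real implementations typically use a spatial index), and labels noise as `-1`. The names `region_query` and `dbscan` are illustrative.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id 0, 1, ... or NOISE (-1)."""
    labels = [None] * len(points)  # None = not yet visited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # provisional: may later become a border point
            continue
        # points[i] is a core object: start a new cluster and grow it.
        labels[i] = cluster_id
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also core: keep expanding
                seeds.extend(j_neighbors)
        cluster_id += 1
    return labels

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (50.0, 50.0)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

The loop stops growing a cluster once the seed list is empty, which matches the termination condition above: no new point can be added to any cluster.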

Let us see the comparison between K-Means and DBSCAN.

| K-Means | DBSCAN |
|---|---|
| K-means generally clusters all the objects. | DBSCAN discards objects that it defines as noise. |
| K-means uses a prototype-based notion of a cluster. | DBSCAN uses a density-based notion of a cluster. |
| K-means has difficulty with non-globular clusters and clusters of different sizes. | DBSCAN can handle clusters of different sizes and shapes and is not strongly affected by noise or outliers. |
| K-means can be used for data that has a well-defined centroid, such as a mean or median. | DBSCAN requires that its definition of density, which is based on the traditional Euclidean notion of density, be meaningful for the data. |
| K-means can be applied to sparse, high-dimensional data, such as document data. | DBSCAN typically performs poorly on such data because the traditional Euclidean definition of density does not work well for high-dimensional data. |
| The basic K-means algorithm is similar to a statistical clustering approach (mixture models) that assumes all clusters come from spherical Gaussian distributions with different means but the same covariance matrix. | DBSCAN makes no assumption about the distribution of the data. |
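The first row of the comparison can be illustrated by running a basic k-means and a basic DBSCAN on the same small, made-up dataset that contains one distant outlier. Both functions are simplified, self-contained sketches (not library implementations), assuming Euclidean distance, k-means started from two hand-picked centers, and eps = 1.5 with MinPts = 3.

```python
import math

def kmeans(points, centers, iters=20):
    """Basic k-means from the given centers; every point receives a cluster label."""
    labels = []
    for _ in range(iters):
        # Assignment: nearest center for every point, outliers included.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update: each center moves to the mean of its members (kept if empty).
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(tuple(sum(v) / len(members) for v in zip(*members))
                               if members else centers[c])
        centers = new_centers
    return labels

def dbscan(points, eps, min_pts):
    """Density-based clustering; points that join no cluster are labeled -1 (noise)."""
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(neighbors(i)) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        labels[i] = cluster
        seeds = neighbors(i)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is core: expand the cluster from it
                seeds.extend(nbrs)
        cluster += 1
    return labels

# Two tight groups plus one far-away outlier (illustrative data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (30.0, 0.0)]
km = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
db = dbscan(points, eps=1.5, min_pts=3)
print(km)  # [0, 0, 0, 1, 1, 1, 1]  (the outlier is forced into cluster 1)
print(db)  # [0, 0, 0, 1, 1, 1, -1] (the outlier is labeled as noise)
```

Because k-means must place every object in some cluster, the outlier also pulls the second center away from its group, while DBSCAN simply leaves it out.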
raja
Updated on 14-Feb-2022 12:10:58
