What is DBSCAN?


DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It is a density-based clustering algorithm. The algorithm grows regions of sufficiently high density into clusters and finds clusters of arbitrary shape in spatial databases that contain noise. It defines a cluster as a maximal set of density-connected points.

The concept of density-based clustering includes a number of new definitions as follows −

  • The neighborhood within a radius ε of a given object is known as the ε-neighborhood of the object.

  • If the ε-neighborhood of an object contains at least a minimum number, MinPts, of objects, then the object is known as a core object.

  • Given a set of objects, D, we say that an object p is directly density-reachable from object q if p is inside the ε-neighborhood of q, and q is a core object.

  • An object p is density-reachable from object q with respect to ε and MinPts in a set of objects, D, if there is a chain of objects p1,..., pn, where p1 = q and pn = p, such that pi+1 is directly density-reachable from pi with respect to ε and MinPts, for 1 ≤ i < n, pi ∈ D.

  • An object p is density-connected to object q with respect to ε and MinPts in a set of objects, D, if there is an object o ∈ D such that both p and q are density-reachable from o with respect to ε and MinPts.
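The first three definitions above can be illustrated with a minimal Python sketch. The function names, the toy points, and the use of Euclidean distance are assumptions made for illustration only; they are not part of any particular library.

```python
from math import dist

def eps_neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def is_core(points, i, eps, min_pts):
    """A point is a core object if its eps-neighborhood holds at least min_pts points."""
    return len(eps_neighborhood(points, i, eps)) >= min_pts

def directly_density_reachable(points, p, q, eps, min_pts):
    """True if points[p] lies in the eps-neighborhood of points[q] and q is a core object."""
    return is_core(points, q, eps, min_pts) and p in eps_neighborhood(points, q, eps)

# Hypothetical toy data: three nearby points plus one point on the edge of the group.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (2.0, 0.0)]
eps, min_pts = 1.0, 3

print(is_core(points, 2, eps, min_pts))                        # True
print(is_core(points, 3, eps, min_pts))                        # False
print(directly_density_reachable(points, 3, 2, eps, min_pts))  # True
print(directly_density_reachable(points, 2, 3, eps, min_pts))  # False
```

In this toy data the last point is directly density-reachable from the core object next to it, but not the other way round, because the last point is not itself a core object.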

Density reachability is the transitive closure of direct density reachability, and this relation is asymmetric. Only core objects are mutually density-reachable. Density connectivity, however, is a symmetric relation.
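The asymmetry can be seen in a small sketch that follows chains of directly density-reachable objects with a breadth-first search. The helper names and the toy data are hypothetical, and Euclidean distance is assumed.

```python
from collections import deque
from math import dist

def neighbors(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def density_reachable_from(points, q, eps, min_pts):
    """Set of indices density-reachable from points[q]; empty if q is not a core object."""
    reached = set()
    queue = deque([q])
    while queue:
        cur = queue.popleft()
        nbrs = neighbors(points, cur, eps)
        if len(nbrs) < min_pts:
            continue  # not a core object, so no chain can continue through it
        for j in nbrs:
            if j not in reached:
                reached.add(j)
                queue.append(j)
    return reached

# Hypothetical toy data: indices 0 and 2 are core objects, 1 and 3 are not.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (2.0, 0.0)]
eps, min_pts = 1.0, 3

print(density_reachable_from(points, 0, eps, min_pts))  # {0, 1, 2, 3}
print(density_reachable_from(points, 3, eps, min_pts))  # set()
```

Object 3 is density-reachable from object 0, but nothing is density-reachable from object 3 because it is not a core object. Objects 1 and 3 are still density-connected, since both are density-reachable from object 0.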

A density-based cluster is a set of density-connected objects that is maximal with respect to density-reachability. Every object not contained in any cluster is considered to be noise.

DBSCAN searches for clusters by checking the ε-neighborhood of every point in the database. If the ε-neighborhood of a point p contains at least MinPts objects, a new cluster with p as a core object is created. DBSCAN then iteratively collects directly density-reachable objects from these core objects, which may involve merging a few density-reachable clusters. The process terminates when no new point can be added to any cluster.
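The search described above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance, uses a brute-force neighborhood scan instead of a spatial index, and the names dbscan, region_query and NOISE are chosen here for illustration.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for unclustered points."""
    labels = [None] * len(points)  # None means "not yet visited"

    def region_query(i):
        # Brute-force eps-neighborhood of points[i], including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = region_query(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE      # may be relabeled later if a core object reaches it
            continue
        cluster_id += 1            # i is a core object: start a new cluster
        labels[i] = cluster_id
        seeds = list(nbrs)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core object
            if labels[j] is not None:
                continue                # already assigned to a cluster
            labels[j] = cluster_id
            j_nbrs = region_query(j)
            if len(j_nbrs) >= min_pts:  # j is also a core object: keep expanding
                seeds.extend(j_nbrs)
    return labels

# Hypothetical toy data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

In this sketch a cluster keeps expanding through every core object it reaches, so dense regions that are density-reachable from one another end up under the same label without a separate merge step.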

If a spatial index is used, the computational complexity of DBSCAN is O(n log n), where n is the number of database objects. Otherwise, it is O(n²). With appropriate settings of the user-defined parameters ε and MinPts, the algorithm is effective at discovering arbitrary-shaped clusters.

Updated on 16-Feb-2022 12:26:55