What is DENCLUE?

Clustering is one of the most important data mining approaches for knowledge discovery. It is an exploratory data analysis method that groups similar data objects together into clusters.

DENCLUE stands for DENsity-based CLUstEring. It is a clustering approach based on a set of density distribution functions. The DENCLUE algorithm uses a cluster model based on kernel density estimation. A cluster is defined by a local maximum of the estimated density function.

DENCLUE doesn't work on data with a uniform distribution, because a uniform density has no pronounced local maxima. In high-dimensional space, data tends to look uniformly distributed because of the curse of dimensionality. Hence, DENCLUE doesn't work well on high-dimensional data in general.
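One way to see this effect is distance concentration: as the dimension grows, the nearest and the farthest neighbour of a point end up almost equally far away, so a distance-based density estimate becomes nearly flat. The following is a minimal sketch of that observation, not part of DENCLUE itself. The function name `relative_contrast` and the choice of uniformly random points in the unit cube are assumptions made for illustration.

```python
import math
import random

def relative_contrast(n_points, dim, seed=0):
    """Return (max - min) / min over the distances from one random query
    point to n_points uniformly random points in the unit cube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

# Print the contrast for increasing dimension (illustrative values only).
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(200, dim), 3))
```

The printed contrast should shrink as the dimension increases, which is why local density maxima become hard to distinguish in high-dimensional data.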

The method is built on the following ideas −

  • The influence of each data point can be formally modeled using a mathematical function, called an influence function, which describes the impact of a data point within its neighbourhood.

  • The overall density of the data space can be modeled analytically as the sum of the influence functions applied to all data points.

  • Clusters can be determined mathematically by identifying density attractors, where density attractors are local maxima of the overall density function.
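The three ideas above can be sketched as a small hill-climbing procedure: sum a kernel over all data points to get the density, then follow the gradient uphill from a starting point until the density stops increasing. This is a sketch under stated assumptions, not the reference implementation. The Gaussian kernel, the fixed step size `delta`, and the names `gaussian_density`, `density_gradient`, and `climb_to_attractor` are all illustrative choices.

```python
import math

def gaussian_density(x, data, sigma):
    """Overall density at x: sum of Gaussian influences of all data points."""
    return sum(math.exp(-math.dist(x, p) ** 2 / (2 * sigma ** 2)) for p in data)

def density_gradient(x, data, sigma):
    """Gradient direction of the Gaussian density at x.
    The positive factor 1/sigma^2 is dropped because only the direction is used."""
    grad = [0.0] * len(x)
    for p in data:
        w = math.exp(-math.dist(x, p) ** 2 / (2 * sigma ** 2))
        for k in range(len(x)):
            grad[k] += (p[k] - x[k]) * w
    return grad

def climb_to_attractor(start, data, sigma, delta=0.05, max_steps=1000):
    """Take steps of length delta (assumed parameter) along the normalized
    gradient and stop as soon as a step no longer increases the density."""
    x = list(start)
    fx = gaussian_density(x, data, sigma)
    for _ in range(max_steps):
        grad = density_gradient(x, data, sigma)
        norm = math.sqrt(sum(g * g for g in grad))
        if norm == 0.0:
            break  # flat spot: nothing to climb
        candidate = [xk + delta * gk / norm for xk, gk in zip(x, grad)]
        f_candidate = gaussian_density(candidate, data, sigma)
        if f_candidate <= fx:
            break  # density stopped increasing: approximate local maximum
        x, fx = candidate, f_candidate
    return x

# Two small, well-separated groups of points (made-up data).
data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2),
        (5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (5.2, 5.2)]
print(climb_to_attractor((0.0, 0.0), data, sigma=0.5))
print(climb_to_attractor((5.0, 5.0), data, sigma=0.5))
```

Points whose climbs end at (approximately) the same attractor would be assigned to the same cluster. With a fixed step length, the result is only accurate to about `delta`.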

Let x and y be objects or points in $\mathrm{F^d}$, a d-dimensional input space. The influence function of data object y on x is a function, $\mathrm{f_B^y\colon F^{d}\rightarrow R_0^+}$, which is defined in terms of a basic influence function $\mathrm{f_B}$:

$$\mathrm{f_B^y(x)=f_{B}(x, y)}$$

This reflects the impact of y on x. In principle, the influence function can be an arbitrary function that is determined by the distance between two objects in a neighbourhood. The distance function, d(x, y), should be reflexive and symmetric, such as the Euclidean distance function.

The distance function can be used to compute a square wave influence function,

$$\mathrm{f_{square}(x, y)}=\begin{cases}0 & \text{if } d(x, y) > \sigma\\ 1 & \text{otherwise}\end{cases}$$

or a Gaussian influence function,

$$\mathrm{f_{Gauss}(x, y)=e^{-\frac{d(x, y)^2}{2{\sigma}^2}}}$$
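The two influence functions translate almost directly into code. The sketch below assumes the Euclidean distance for d(x, y). The function names and the small data set are made up for illustration.

```python
import math

def square_wave_influence(x, y, sigma):
    """0 if y is farther than sigma from x, 1 otherwise."""
    return 0.0 if math.dist(x, y) > sigma else 1.0

def gaussian_influence(x, y, sigma):
    """exp(-d(x, y)^2 / (2 * sigma^2)), with d the Euclidean distance."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * sigma ** 2))

def density(x, data, influence, sigma):
    """Overall density at x: the sum of the influences of all data points."""
    return sum(influence(x, y, sigma) for y in data)

data = [(0.0, 0.0), (0.3, 0.4), (3.0, 4.0)]  # distances 0, 0.5 and 5 from the query
query = (0.0, 0.0)
print(density(query, data, square_wave_influence, sigma=1.0))  # 2.0
print(density(query, data, gaussian_influence, sigma=1.0))
```

With the square wave function, the density simply counts the points within distance σ of the query. The Gaussian function instead gives every point a weight that decays smoothly with distance.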

Advantages of DENCLUE

DENCLUE has several advantages −

  • It has a solid mathematical foundation and generalizes several clustering approaches, such as partitioning, hierarchical, and density-based methods.

  • It has good clustering properties for data sets with large amounts of noise.

  • It enables a compact mathematical description of arbitrarily shaped clusters in high-dimensional data sets.

  • It uses grid cells, yet only keeps information about grid cells that actually contain data points. It manages these cells in a tree-based access structure, and thus is significantly faster than some influential algorithms, such as DBSCAN.

However, the method requires careful selection of the density parameter σ and the noise threshold ξ, as the choice of these parameters may significantly influence the quality of the clustering results.
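The grid bookkeeping mentioned in the list above, where only cells that actually contain data points are stored, can be sketched with a dictionary keyed by cell coordinates. This is an illustration under assumptions: it uses a hash map rather than the tree-based access structure the method is described as using, the cell edge length of 2σ is assumed here, and the name `populated_cells` is hypothetical.

```python
import math
from collections import defaultdict

def populated_cells(data, sigma):
    """Map each non-empty grid cell to the indices of the points inside it.
    Empty cells never get an entry, so memory grows with the data,
    not with the size of the full grid."""
    cell_size = 2.0 * sigma  # assumed edge length of a grid cell
    cells = defaultdict(list)
    for index, point in enumerate(data):
        key = tuple(math.floor(c / cell_size) for c in point)
        cells[key].append(index)
    return dict(cells)

data = [(0.1, 0.1), (0.2, 0.3), (5.0, 5.0), (-0.1, 0.2)]
for cell, members in populated_cells(data, sigma=0.5).items():
    print(cell, members)
```

Because the cell size follows σ, changing σ also changes how many cells are populated, which is one practical way the choice of σ affects the computation.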