WaveCluster is a multiresolution clustering algorithm that first summarizes the data by imposing a multidimensional grid structure onto the data space. It then uses a wavelet transformation to transform the original feature space and finds dense regions in the transformed space.
In this method, each grid cell summarizes the information of the group of points that map into that cell. This summary information typically fits into main memory, where it is used by the multiresolution wavelet transform and the subsequent cluster analysis.
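The grid summarization step can be sketched as follows. This is a minimal illustration, assuming the summary kept per cell is simply a point count and that every dimension is split into the same number of equal-width cells; the function name `quantize` and its parameters are hypothetical and do not come from any WaveCluster implementation.

```python
from collections import Counter

def quantize(points, lows, highs, cells_per_dim):
    """Map each point to a grid cell (a tuple of indices) and count points per cell."""
    counts = Counter()
    for p in points:
        cell = []
        for x, lo, hi in zip(p, lows, highs):
            # Scale the coordinate into [0, cells_per_dim) and clamp the edges,
            # so a point lying exactly on the upper bound falls in the last cell.
            k = int((x - lo) / (hi - lo) * cells_per_dim)
            cell.append(min(max(k, 0), cells_per_dim - 1))
        counts[tuple(cell)] += 1
    return counts

points = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.95), (0.5, 0.5)]
grid = quantize(points, lows=(0.0, 0.0), highs=(1.0, 1.0), cells_per_dim=4)
# grid holds one entry per occupied cell, e.g. cell (0, 0) contains two points.
```

Only occupied cells are stored, so the size of the summary depends on the number of non-empty cells rather than on the number of points.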
A wavelet transform is a signal processing technique that decomposes a signal into multiple frequency subbands. The wavelet model can be applied to d-dimensional signals by applying a one-dimensional wavelet transform d times, once along each dimension. In applying a wavelet transform, the data are transformed so as to preserve the relative distance between objects at several levels of resolution. This makes the natural clusters in the data easier to distinguish. Clusters can then be identified by searching for dense regions in the new domain.
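The idea of applying a one-dimensional transform once per dimension can be sketched for d = 2. The example below is an assumption-laden illustration: it uses the Haar wavelet in its unnormalized pairwise-average form, chosen only for brevity, and keeps just the low-frequency (approximation) subband. The text does not say which wavelet WaveCluster uses, and the helper names are hypothetical.

```python
def haar_lowpass_1d(signal):
    """One level of the Haar approximation subband: averages of adjacent pairs."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]

def haar_lowpass_2d(grid):
    """Apply the 1D transform along each of the two dimensions in turn."""
    rows = [haar_lowpass_1d(row) for row in grid]               # along dimension 1
    cols = [haar_lowpass_1d(list(col)) for col in zip(*rows)]   # along dimension 2
    return [list(row) for row in zip(*cols)]                    # transpose back

# Hypothetical 4 x 4 grid of per-cell point counts.
counts = [
    [4, 4, 0, 0],
    [4, 4, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 2, 2],
]
approx = haar_lowpass_2d(counts)
# approx == [[4.0, 0.25], [0.0, 1.0]]
```

Each output value is the mean of a 2 x 2 block of the input, so the result is a coarser view of the same density. Applying the function again to `approx` would give the next, coarser resolution level.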
The advantages of wavelet transformation for clustering are as follows −
It provides unsupervised clustering − It uses hat-shaped filters that emphasize regions where the points cluster, while suppressing weaker information outside of the cluster boundaries.
Therefore, dense regions in the original feature space act as attractors for nearby points and as inhibitors for points that are farther away. This means that the clusters in the data automatically stand out and “clear” the regions around them. Another benefit, then, is that wavelet transformation can automatically result in the removal of outliers.
The multiresolution property of wavelet transformations can help detect clusters at several levels of accuracy.
Wavelet-based clustering is very fast, with a computational complexity of O(n), where n is the number of objects in the database. The algorithm implementation can be parallelized.
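The effect of a hat-shaped filter described in the first advantage above can be illustrated in one dimension. This is only a sketch: the kernel coefficients below are illustrative values with a positive center and small negative side lobes, not coefficients taken from the text, and the density values are made up.

```python
def convolve_same(signal, kernel):
    """Filter with zero padding; the output has the same length as the input.

    The kernel is symmetric, so correlation and convolution coincide here.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        total = 0.0
        for j, w in enumerate(kernel):
            k = i + j - half
            if 0 <= k < len(signal):
                total += w * signal[k]
        out.append(total)
    return out

# Illustrative hat-shaped kernel (an assumption, not from the text).
HAT_KERNEL = [-0.125, 0.25, 0.75, 0.25, -0.125]

# Hypothetical cell counts: an isolated point at index 2, a dense run at 6..8.
density = [0, 0, 1, 0, 0, 0, 5, 6, 5, 0, 0, 0]
filtered = convolve_same(density, HAT_KERNEL)
```

With these numbers the center of the dense run is amplified (6 becomes 7.0), the isolated point is damped (1 becomes 0.75), and the cell two positions to the left of the dense run is pushed below zero. This mirrors the attractor and inhibitor behavior described above.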
WaveCluster is a grid-based and density-based algorithm. It conforms with several requirements of a good clustering algorithm − it handles large data sets efficiently, discovers clusters of arbitrary shape, successfully handles outliers, is insensitive to the order of input, and does not require the specification of input parameters such as the number of clusters or a neighborhood radius.
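One plausible way to search for dense regions in the transformed grid is to keep the cells above a density threshold and group adjacent dense cells into connected components. The sketch below assumes a two-dimensional grid, 4-connectivity, and a hand-picked threshold; these choices and the function name `label_dense_cells` are illustrative assumptions rather than details given in the text.

```python
from collections import deque

def label_dense_cells(grid, threshold):
    """Label 4-connected groups of cells whose value is at least `threshold`."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]   # 0 means "not part of a cluster"
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] < threshold or labels[r][c]:
                continue
            # Start a new cluster and flood-fill it breadth-first.
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] >= threshold and not labels[ny][nx]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels, next_label

# Hypothetical transformed grid with two L-shaped dense regions and one weak cell.
transformed = [
    [5, 6, 0, 0, 0],
    [0, 7, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 8, 9],
    [0, 0, 0, 8, 0],
]
labels, n_clusters = label_dense_cells(transformed, threshold=3)
# n_clusters == 2; the weak cell at row 1, column 3 keeps label 0.
```

Because clusters are defined by cell adjacency rather than by distance to a center, they can take arbitrary shapes. The number of clusters falls out of the labeling instead of being supplied up front, and the result does not depend on the order in which the points were read.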
In preliminary studies, WaveCluster was found to outperform BIRCH, CLARANS, and DBSCAN in terms of both efficiency and clustering quality. These studies also found WaveCluster capable of handling data with up to 20 dimensions.