What is PROCLUS?

PROCLUS stands for Projected Clustering. It is a typical dimension-reduction subspace clustering technique. That is, rather than starting from single-dimensional spaces, it begins by finding an initial approximation of the clusters in the high-dimensional attribute space.

Each dimension is then assigned a weight for each cluster, and the updated weights are used in the next iteration to regenerate the clusters. This leads to the exploration of dense regions in all subspaces of some desired dimensionality and avoids generating a large number of overlapping clusters in projected dimensions of lower dimensionality.

PROCLUS finds the best set of medoids by a hill-climbing process similar to that used in CLARANS, but generalized to handle projected clustering. It adopts a distance measure known as the Manhattan segmental distance, which is the Manhattan distance computed over a set of relevant dimensions.
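The distance measure can be illustrated with a minimal sketch. It assumes the usual PROCLUS formulation, in which the sum of absolute differences over the relevant dimensions is divided by the number of those dimensions, so that distances measured in subspaces of different sizes stay comparable. The function name and the sample points are illustrative only.

```python
def manhattan_segmental_distance(x, y, dims):
    """Average absolute difference between x and y over the dimensions in dims.

    Assumption: the sum is normalized by len(dims), so clusters with
    different numbers of relevant dimensions can be compared.
    """
    if not dims:
        raise ValueError("dims must contain at least one dimension index")
    return sum(abs(x[d] - y[d]) for d in dims) / len(dims)


p = [1.0, 5.0, 9.0, 2.0]
q = [4.0, 5.0, 0.0, 6.0]

# Only dimensions 0 and 3 are treated as relevant: (|1-4| + |2-6|) / 2
print(manhattan_segmental_distance(p, q, [0, 3]))  # 3.5
```

Dimension 2, where the two points differ the most, is ignored because it is not in the set of relevant dimensions.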

The PROCLUS algorithm consists of three phases: initialization, iteration, and cluster refinement. In the initialization phase, it uses a greedy algorithm to select a set of initial medoids that are far apart from each other, so as to ensure that each cluster is represented by at least one object in the selected set.

More concretely, it first draws a random sample of data points whose size is proportional to the number of clusters to be generated, and then applies the greedy algorithm to obtain an even smaller final subset for the next phase.
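The initialization phase can be sketched roughly as follows. This is an illustration under stated assumptions, not the reference implementation: the greedy step is written as a farthest-first selection using plain Manhattan distance, and the names sample_factor and keep_factor are hypothetical stand-ins for the constants that make both the sample and the final subset proportional to k.

```python
import random


def manhattan(x, y):
    """Plain Manhattan distance over all dimensions."""
    return sum(abs(a - b) for a, b in zip(x, y))


def greedy_far_apart(points, m, rng):
    """Return indices of m points, each picked to be far from those already chosen."""
    if not 1 <= m <= len(points):
        raise ValueError("m must be between 1 and len(points)")
    first = rng.randrange(len(points))
    chosen = [first]
    # nearest[i] = distance from point i to its closest chosen point
    nearest = [manhattan(p, points[first]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], manhattan(p, points[nxt]))
    return chosen


def initial_candidates(points, k, sample_factor=10, keep_factor=3, seed=0):
    """Random sample of about sample_factor*k points, reduced greedily to keep_factor*k."""
    rng = random.Random(seed)
    sample_size = min(len(points), sample_factor * k)
    sample = rng.sample(points, sample_size)
    keep = min(sample_size, keep_factor * k)
    return [sample[i] for i in greedy_far_apart(sample, keep, rng)]


# Three well-separated groups of two points each
data = [(0, 0), (0, 1), (100, 100), (100, 101), (200, 0), (201, 0)]
print(initial_candidates(data, k=1, sample_factor=6, keep_factor=3))
```

With the toy data above, the three points returned come from three different groups, which is the property the greedy step is meant to provide.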

The iteration phase selects a random set of k medoids from this reduced set (of medoids), and replaces “bad” medoids with randomly chosen new medoids if the clustering is improved.

For each medoid, a set of dimensions is chosen whose average distances are small compared to statistical expectation. The total number of dimensions associated with the medoids must be k×l, where l is an input parameter that specifies the average dimensionality of the cluster subspaces.
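One way to picture the dimension selection is the sketch below. It makes several assumptions that the text above does not spell out: "small compared to statistical expectation" is interpreted as a low standardized score (the average distance minus the medoid's mean over all dimensions, divided by the standard deviation), every medoid keeps at least two dimensions, and l is an integer. The function name select_dimensions and the input layout avg_dist are hypothetical.

```python
from statistics import mean, stdev


def select_dimensions(avg_dist, l):
    """Choose k*l (medoid, dimension) pairs with the lowest standardized scores.

    avg_dist[i][j] is the average distance along dimension j between
    medoid i and the points near it. Returns one set of dimension
    indices per medoid.
    """
    k = len(avg_dist)
    d = len(avg_dist[0])
    if not 2 <= l <= d:
        raise ValueError("l must be between 2 and the number of dimensions")

    scored = []  # (z-score, medoid index, dimension index)
    for i, row in enumerate(avg_dist):
        mu = mean(row)
        sigma = stdev(row) or 1.0  # guard against a constant row
        for j, x in enumerate(row):
            scored.append(((x - mu) / sigma, i, j))
    scored.sort()

    # Assumption: each medoid first receives its two best dimensions.
    dims = [set() for _ in range(k)]
    for i in range(k):
        best = sorted((z, j) for z, m, j in scored if m == i)[:2]
        dims[i].update(j for _, j in best)

    # The remaining k*l - 2k slots go to the globally lowest scores.
    remaining = k * l - 2 * k
    for _, i, j in scored:
        if remaining == 0:
            break
        if j not in dims[i]:
            dims[i].add(j)
            remaining -= 1
    return dims


avg = [[0.1, 0.2, 5.0, 6.0],
       [4.0, 0.3, 0.2, 5.0]]
print(select_dimensions(avg, 2))  # [{0, 1}, {1, 2}]
```

With l = 2 and k = 2, exactly four dimensions are handed out in total, so each medoid simply keeps the two dimensions along which its nearby points are tightest.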

The refinement phase computes new dimensions for each medoid based on the clusters found, reassigns points to medoids, and removes outliers. PROCLUS has been shown to be efficient and scalable at finding high-dimensional clusters.

Unlike CLIQUE, which outputs many overlapping clusters, PROCLUS finds nonoverlapping partitions of points. The discovered clusters can help in better understanding the high-dimensional data and can facilitate subsequent analyses.
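The nonoverlapping partition follows from assigning every point to exactly one medoid. A minimal sketch of that assignment step is shown below, assuming each medoid already has its own set of relevant dimensions and that the distance is the Manhattan distance averaged over those dimensions. Outlier removal is left out, and the names assign_points and medoid_dims are illustrative.

```python
def segmental(x, y, dims):
    """Manhattan distance over dims, averaged over the number of dimensions."""
    return sum(abs(x[d] - y[d]) for d in dims) / len(dims)


def assign_points(points, medoids, medoid_dims):
    """Label each point with the index of its closest medoid.

    The distance to medoid i is measured only in medoid_dims[i], so each
    cluster is judged in its own subspace.
    """
    labels = []
    for p in points:
        dists = [segmental(p, m, dims) for m, dims in zip(medoids, medoid_dims)]
        labels.append(min(range(len(medoids)), key=dists.__getitem__))
    return labels


medoids = [(0, 0, 50), (10, 10, 0)]
medoid_dims = [[0, 1], [2]]  # cluster 0 lives in dims 0-1, cluster 1 in dim 2
points = [(1, 1, 99), (50, 50, 1)]
print(assign_points(points, medoids, medoid_dims))  # [0, 1]
```

The first point is far from medoid 0 in dimension 2, but that dimension is not relevant to cluster 0, so the point is still assigned to it. Each point receives a single label, so no point belongs to two clusters.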

CLIQUE automatically finds subspaces of the highest dimensionality such that high-density clusters exist in those subspaces. It is insensitive to the order of input objects and does not presume any canonical data distribution. It scales linearly with the size of the input and has good scalability as the number of dimensions in the data increases.