How can we find subspace clusters from high-dimensional data?

Methods for finding subspace clusters fall into three major groups: subspace search techniques, correlation-based clustering techniques, and biclustering techniques.

Subspace Search Techniques − A subspace search method searches various subspaces for clusters. Here, a cluster is a subset of objects that are similar to each other in a subspace. The similarity is captured by conventional measures such as distance or density.

For instance, the CLIQUE algorithm is a subspace clustering technique. It enumerates subspaces, and the clusters in those subspaces, in order of increasing dimensionality, and uses antimonotonicity to prune subspaces in which no cluster can exist. A major challenge that subspace search techniques face is how to search a sequence of subspaces effectively and efficiently.

There are two general strategies −

  • Bottom-up methods begin from low-dimensional subspaces and search higher-dimensional subspaces only when there may be clusters in those higher-dimensional subspaces. Various pruning techniques are used to reduce the number of higher-dimensional subspaces that need to be searched. CLIQUE is an example of a bottom-up approach.

  • Top-down methods begin from the full space and search smaller and smaller subspaces recursively. Top-down methods are effective only if the locality assumption holds, which requires that the subspace of a cluster can be determined by its local neighborhood.
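The bottom-up strategy can be illustrated with a minimal sketch in the spirit of CLIQUE. It is not the full algorithm: it assumes every attribute is scaled to [0, 1], it prunes at the level of whole subspaces rather than individual grid units, and it stops at dense cells without merging them into clusters. The names `dense_units` and `bottom_up_search`, the grid resolution `xi`, the density threshold `tau`, and the toy data are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, dims, xi, tau):
    """Return the grid cells of subspace `dims` that hold at least `tau` points."""
    counts = Counter(
        tuple(min(int(p[d] * xi), xi - 1) for d in dims) for p in points
    )
    return {cell for cell, n in counts.items() if n >= tau}


def bottom_up_search(points, xi=4, tau=3):
    """Map each subspace (a tuple of dimension indices) to its dense grid cells."""
    n_dims = len(points[0])
    # Level 1: every single dimension is a candidate subspace.
    level = {}
    for d in range(n_dims):
        cells = dense_units(points, (d,), xi, tau)
        if cells:
            level[(d,)] = cells
    result = dict(level)
    k = 1
    while level:
        k += 1
        candidates = set()
        for a, b in combinations(sorted(level), 2):
            merged = tuple(sorted(set(a) | set(b)))
            if len(merged) != k:
                continue
            # Antimonotonicity: a k-dimensional subspace can only contain a
            # dense cell if all of its (k-1)-dimensional projections do.
            if all(sub in level for sub in combinations(merged, k - 1)):
                candidates.add(merged)
        level = {}
        for dims in candidates:
            cells = dense_units(points, dims, xi, tau)
            if cells:
                level[dims] = cells
        result.update(level)
    return result


# Toy data (made up): four points are close in dimensions 0 and 1,
# while dimension 2 is spread evenly.
points = [
    (0.10, 0.10, 0.05), (0.12, 0.15, 0.35), (0.05, 0.20, 0.60),
    (0.20, 0.05, 0.90), (0.80, 0.60, 0.10), (0.55, 0.90, 0.40),
    (0.90, 0.30, 0.70), (0.60, 0.70, 0.95),
]
for subspace, cells in bottom_up_search(points).items():
    print(subspace, sorted(cells))
```

On this toy data only dimensions 0 and 1 and their combination are reported. Dimension 2 has no dense interval, so no subspace containing it is ever examined, which is the saving that pruning is meant to provide.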

Correlation-Based Clustering Methods − While subspace search methods search for clusters using similarity computed with conventional measures such as distance or density, correlation-based methods can discover clusters that are defined by more advanced correlation models.

A PCA-based approach first applies PCA (Principal Component Analysis) to derive a set of new, uncorrelated dimensions, and then mines clusters in the new space or its subspaces. Besides PCA, other space transformations can be used, including the Hough transform or fractal dimensions.
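The PCA-based idea can be sketched as follows, using only the standard library. This is an illustration under several assumptions: the leading principal component is approximated by power iteration, only that one component is kept, and the "clustering" step is a deliberately simple stand-in that cuts the projected values at their widest gap. The function names and the toy data are hypothetical.

```python
import math


def covariance_matrix(points):
    """Return the sample covariance matrix and the mean-centered points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    return cov, centered


def leading_component(cov, iters=200):
    """Approximate the top eigenvector of `cov` by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def split_at_largest_gap(scores):
    """Toy 1-D clustering: cut the sorted scores at their widest gap."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    gaps = [scores[order[k + 1]] - scores[order[k]] for k in range(len(order) - 1)]
    cut = gaps.index(max(gaps)) + 1
    labels = [0] * len(scores)
    for i in order[cut:]:
        labels[i] = 1
    return labels


# Toy data (made up): two groups lying along a correlated direction.
points = [(1.0, 1.2), (1.3, 0.9), (0.8, 1.1), (5.0, 5.1), (5.2, 4.8), (4.9, 5.3)]

cov, centered = covariance_matrix(points)
component = leading_component(cov)
# Coordinates of each point along the new (first principal) dimension.
scores = [sum(c * v for c, v in zip(row, component)) for row in centered]
labels = split_at_largest_gap(scores)
print(labels)
```

In practice a numerical library would compute all principal components at once, and any standard clustering method could be run on the transformed coordinates instead of the gap-based split used here.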

Biclustering Methods − In some applications, it is necessary to cluster both objects and attributes at the same time. The resulting clusters are called biclusters and meet four requirements −

  • Only a small set of objects participates in a cluster.

  • A cluster involves only a small number of attributes.

  • An object may participate in multiple clusters, or may not participate in any cluster.

  • An attribute may be involved in multiple clusters, or may not be involved in any cluster.
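To make the idea of a bicluster concrete, the sketch below scores how coherent a chosen subset of objects and attributes is. The text above does not name a scoring function, so this assumes one common choice, the mean squared residue used in Cheng and Church's biclustering work, where a score of 0 means every value is explained by a row effect plus a column effect. The function name and the matrix values are made up for illustration.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by `rows` and `cols`."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_mean = [sum(r) / n_c for r in sub]
    col_mean = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    all_mean = sum(row_mean) / n_r
    total = 0.0
    for i in range(n_r):
        for j in range(n_c):
            # Part of the value not explained by its row and column averages.
            residue = sub[i][j] - row_mean[i] - col_mean[j] + all_mean
            total += residue ** 2
    return total / (n_r * n_c)


# Toy matrix (made up): rows are objects, columns are attributes.
matrix = [
    [1.0, 2.0, 9.0, 4.0],
    [3.0, 4.0, 0.0, 6.0],
    [7.0, 1.0, 5.0, 2.0],
    [6.0, 7.0, 3.0, 9.0],
]

# Objects 0, 1, 3 follow the same pattern on attributes 0, 1, 3.
print(mean_squared_residue(matrix, rows=[0, 1, 3], cols=[0, 1, 3]))
# The full matrix is far less coherent.
print(mean_squared_residue(matrix, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3]))
```

The selected submatrix uses only some of the objects and some of the attributes, matching the first two requirements; nothing in the score prevents an object or attribute from also appearing in another bicluster.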

Biclustering techniques were first proposed to address the needs of analyzing gene expression data. A gene is a unit of heredity by which traits are passed from a living organism to its offspring. Typically, a gene resides on a segment of DNA.

Genes are critical for all living things because they specify proteins and functional RNA chains. They hold the information needed to build and maintain a living organism’s cells and pass genetic traits to offspring.

A genotype is the genetic makeup of a cell, an organism, or an individual. Phenotypes are observable characteristics of an organism. Gene expression is the most fundamental level in genetics, because it is the level at which genotypes give rise to phenotypes.

Updated on: 18-Feb-2022

