What is Dimensionality Reduction?


In dimensionality reduction, data encoding or transformations are applied to obtain a reduced or “compressed” representation of the original data. If the original data can be reconstructed from the compressed data without any loss of information, the data reduction is called lossless. If the reconstructed data is only an approximation of the original data, the data reduction is called lossy.

Two common methods of lossy dimensionality reduction are as follows −

  • Wavelet Transforms − The discrete wavelet transform (DWT) is a linear signal processing technique that, when applied to a data vector X, transforms it into a numerically different vector, X′, of wavelet coefficients. The two vectors are of the same length. When using this technique for data reduction, each tuple is treated as an n-dimensional data vector X = (x1, x2, …, xn), indicating n measurements made on the tuple from n database attributes.

The DWT is closely related to the discrete Fourier transform (DFT), a signal processing technique involving sines and cosines. In general, however, the DWT achieves better lossy compression. That is, if the same number of coefficients is retained for a DWT and a DFT of a given data vector, the DWT version will provide a more accurate approximation of the original data. Hence, for an equivalent approximation, the DWT requires less space than the DFT.
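As a rough illustration of the idea, the sketch below implements one level of the Haar wavelet transform (the simplest DWT) in NumPy and performs lossy compression by discarding the detail coefficients. The helper names `haar_dwt` and `haar_idwt` are illustrative, not from any particular library.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar DWT: pairwise averages and differences."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)  # low-frequency coefficients
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)  # high-frequency coefficients
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of one Haar DWT level."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

x = np.array([4.0, 6.0, 10.0, 12.0])
approx, detail = haar_dwt(x)

# Lossless: keeping all coefficients reconstructs x exactly.
assert np.allclose(haar_idwt(approx, detail), x)

# Lossy: zeroing the detail coefficients keeps half the data and
# reconstructs only an approximation (each pair becomes its average).
x_approx = haar_idwt(approx, np.zeros_like(detail))
print(x_approx)
```

Retaining only the largest coefficients (rather than the approximation band alone) is how this becomes a general compression scheme: small coefficients are set to zero, and the sparse remainder is stored.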

Wavelet transforms can be applied to multidimensional data, such as a data cube. This is done by first applying the transform to the first dimension, then to the second, and so on. The computational complexity involved is linear in the number of cells in the cube.
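A minimal sketch of this dimension-by-dimension scheme for a 2-D array, applying one Haar step along each axis in turn (the helper names are illustrative):

```python
import numpy as np

def haar_step(x):
    """One Haar DWT level: averages followed by differences."""
    return np.concatenate(((x[0::2] + x[1::2]) / np.sqrt(2),
                           (x[0::2] - x[1::2]) / np.sqrt(2)))

def dwt_2d(a):
    """Apply the transform to the first dimension, then the second."""
    a = np.apply_along_axis(haar_step, 0, a)  # transform columns
    a = np.apply_along_axis(haar_step, 1, a)  # then transform rows
    return a

a = np.ones((4, 4))
coeffs = dwt_2d(a)
# For constant data every detail coefficient is zero, so the energy
# concentrates in the top-left (approximation) block of the result.
```

Each axis is visited once and each cell is touched a constant number of times per axis, which is why the overall cost stays linear in the number of cells.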

Wavelet transforms give good results on sparse or skewed data and data with ordered attributes. Lossy compression by wavelets is reportedly better than JPEG compression, the current commercial standard. Wavelet transforms have many real-world applications, including the compression of fingerprint images, computer vision, analysis of time-series data, and data cleaning.

  • Principal Component Analysis − Principal components analysis (PCA), also called the Karhunen-Loeve, or K-L, method, searches for k n-dimensional orthogonal vectors that can best be used to represent the data, where k ≤ n. The original data are projected onto this much smaller space, resulting in dimensionality reduction. PCA combines the essence of the original attributes by creating an alternative, smaller set of variables onto which the original data can be projected.
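A minimal NumPy sketch of this projection, computed via eigendecomposition of the covariance matrix (the function name `pca_project` is illustrative, not from any library):

```python
import numpy as np

def pca_project(X, k):
    """Project data onto its top-k orthogonal principal components."""
    X_centered = X - X.mean(axis=0)          # PCA requires centered data
    cov = np.cov(X_centered, rowvar=False)   # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh sorts eigenvalues ascending
    components = eigvecs[:, ::-1][:, :k]     # top-k orthogonal eigenvectors
    return X_centered @ components           # reduced k-dimensional data

# Four 2-D points that lie on a line: a single component
# captures all of the variance in the data.
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
Z = pca_project(X, k=1)
print(Z.shape)  # (4, 1)
```

The k columns of `components` are the orthogonal vectors the text describes; the projection `X_centered @ components` is the smaller set of variables that replaces the original attributes.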

Updated on: 19-Nov-2021
