Difference between Dimensionality Reduction and Numerosity Reduction?

Dimensionality Reduction

In dimensionality reduction, data encoding or transformations are applied to obtain a reduced or “compressed” representation of the original data. If the original data can be reconstructed from the compressed data without any loss of information, the data reduction is called lossless. If the reconstructed data is only an approximation of the original data, the data reduction is called lossy.

One common dimensionality reduction technique is the discrete wavelet transform (DWT), which is closely related to the discrete Fourier transform (DFT), a signal processing technique involving sines and cosines. In general, however, the DWT achieves better lossy compression. That is, if the same number of coefficients is retained for a DWT and a DFT of a given data vector, the DWT version will provide a more accurate approximation of the original data. Hence, for an equivalent approximation, the DWT requires less space than the DFT.
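To make the lossless/lossy distinction concrete, here is a minimal sketch of a DWT in Python. It assumes the Haar wavelet (the simplest wavelet; the text does not name a specific one) and an input whose length is a power of two. The helper names `haar_dwt`, `inverse_haar_dwt`, and `keep_largest` are hypothetical. Keeping every coefficient reconstructs the data exactly, up to floating-point rounding (lossless); keeping only the k largest-magnitude coefficients yields an approximation (lossy). The sketch does not reproduce the DWT-versus-DFT comparison.

```python
import math


def haar_dwt(x):
    """Full multi-level orthonormal Haar DWT (assumes len(x) is a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(v) for v in x]
    length = n
    while length > 1:
        half = length // 2
        # Pairwise scaled sums (approximation) and differences (detail).
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        coeffs[:length] = approx + detail
        length = half  # recurse on the approximation part only
    return coeffs


def inverse_haar_dwt(coeffs):
    """Invert haar_dwt level by level, from the coarsest to the finest."""
    out = list(coeffs)
    length = 1
    while length < len(out):
        approx, detail = out[:length], out[length:2 * length]
        merged = []
        for a, d in zip(approx, detail):
            merged.append((a + d) / math.sqrt(2))
            merged.append((a - d) / math.sqrt(2))
        out[:2 * length] = merged
        length *= 2
    return out


def keep_largest(coeffs, k):
    """Zero out all but the k coefficients with the largest magnitude."""
    keep = set(sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


data = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = haar_dwt(data)
restored = inverse_haar_dwt(coeffs)                 # all 8 coefficients kept: lossless
approx = inverse_haar_dwt(keep_largest(coeffs, 3))  # only 3 of 8 kept: lossy
print([round(v, 3) for v in restored])
print([round(v, 3) for v in approx])
```

Because the normalized Haar transform is orthonormal, the squared reconstruction error equals the sum of the squares of the discarded coefficients, which is why keeping the largest-magnitude coefficients is a reasonable way to choose what to store.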

Numerosity Reduction

In numerosity reduction, the data volume is reduced by choosing an alternative, smaller form of data representation. These techniques can be parametric or nonparametric. In parametric methods, a model is used to estimate the data, so that only the model parameters need to be stored instead of the actual data; log-linear models are one example. Nonparametric methods store a reduced representation of the data; examples include histograms, clustering, and sampling.
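Two of the nonparametric methods can be illustrated with a short sketch. It assumes equal-width buckets for the histogram and simple random sampling without replacement for the sample; both are just one option among several, and `equal_width_histogram` and `simple_random_sample` are hypothetical names. In either case, only the bucket edges and counts, or the sampled values, would be stored in place of the full data.

```python
import random


def equal_width_histogram(values, num_buckets):
    """Summarize values as equal-width bucket edges plus a count per bucket."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets or 1.0  # avoid zero width if all values are equal
    counts = [0] * num_buckets
    for v in values:
        # The maximum value falls exactly on the last edge; clamp it into the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_buckets + 1)]
    return edges, counts


def simple_random_sample(rows, n, seed=None):
    """Draw n rows uniformly at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, n)


prices = [1, 1, 5, 5, 5, 8, 10, 10, 12, 14, 15, 20]
edges, counts = equal_width_histogram(prices, 3)
sample = simple_random_sample(prices, 4, seed=42)
print(edges, counts)  # counts: [5, 4, 3]
print(sample)
```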

Let us see the comparison between Dimensionality Reduction and Numerosity Reduction.

| Dimensionality Reduction | Numerosity Reduction |
|---|---|
| Data encoding or transformations are applied to obtain a reduced or compressed representation of the original data. | Data volume is reduced by choosing alternative, smaller forms of data representation. |
| The discrete wavelet transform (DWT) is a linear signal processing technique that, when applied to a data vector X, transforms it into a numerically different vector X′ of wavelet coefficients. The two vectors have the same length; the reduction comes from keeping only a small fraction of the strongest coefficients. When this technique is used for data reduction, each tuple is treated as an n-dimensional data vector X = (x1, x2, …, xn) representing n measurements made on the tuple from n database attributes. | Regression and log-linear models can be used to approximate the given data. In linear regression, the data are modeled to fit a straight line. For example, a random variable y (the response variable) can be modeled as a linear function of another random variable x (the predictor variable) with the equation y = wx + b, where the variance of y is assumed to be constant. |
| It can be used to remove irrelevant and redundant attributes. | It merely represents the original data in a smaller form. |
| Some information may be lost in the process (lossy reduction), which can be undesirable. | The whole data set is represented in a smaller form rather than by dropping attributes, although methods such as sampling, histograms, and regression generally store an approximation rather than every original value. |
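The linear regression model y = wx + b is a parametric form of numerosity reduction: once w and b are estimated, those two numbers can be stored instead of all the (x, y) pairs. A minimal sketch, assuming ordinary least squares as the fitting method (the text does not say how w and b are estimated) and a hypothetical helper named `fit_line`:

```python
def fit_line(xs, ys):
    """Estimate w and b in y = w*x + b by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx             # slope
    b = mean_y - w * mean_x   # intercept: the line passes through the means
    return w, b


xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
w, b = fit_line(xs, ys)
print(w, b)  # roughly w = 1.97, b = 1.09
```

Here five (x, y) pairs are summarized by two parameters; the stored line reproduces the original values only approximately.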

Updated on: 19-Nov-2021
