What are statistical approaches?

Statistical approaches are model-based approaches: a model is built for the data, and objects are evaluated according to how well they fit that model. Most statistical approaches to outlier detection are based on building a probability distribution model and considering how likely objects are under that model.

An outlier is an object that has a low probability with respect to a probability distribution model of the data. The model is built from the data by estimating the parameters of a user-specified distribution.

If the data is assumed to have a Gaussian distribution, then the mean and standard deviation of the underlying distribution can be estimated by computing the mean and standard deviation of the data. The probability of each object under that distribution can then be calculated.
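The Gaussian case described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive test: the function names, the sample values in `data`, and the cutoff of 3 standard deviations are all assumptions made for the example.

```python
import math
import statistics


def gaussian_outliers(values, threshold=3.0):
    """Return (value, z) pairs whose absolute z-score exceeds threshold."""
    mu = statistics.fmean(values)      # estimated mean
    sigma = statistics.stdev(values)   # estimated (sample) standard deviation
    if sigma == 0:
        return []                      # all values identical: nothing to flag
    flagged = []
    for x in values:
        z = (x - mu) / sigma           # distance from the mean in standard deviations
        if abs(z) > threshold:
            flagged.append((x, z))
    return flagged


def two_sided_tail_probability(z):
    """P(|Z| >= |z|) for a standard normal variable Z."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical measurements: 19 values near 10 and one distant value.
data = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4, 9.6, 10.0,
        10.1, 9.9, 10.2, 9.8, 10.0, 10.3, 9.7, 10.1, 9.9, 25.0]

for value, z in gaussian_outliers(data):
    p = two_sided_tail_probability(z)
    print(f"value={value} z={z:.2f} tail probability={p:.2e}")
```

Note that the outlier itself inflates the estimated standard deviation. With n observations and the sample standard deviation, no z-score can exceed (n - 1) / sqrt(n), so a cutoff of 3 can never be reached with only 10 values.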

A wide variety of statistical tests based on this idea have been devised to identify outliers, or discordant observations, as they are often called in the statistical literature. Many of these discordancy tests are highly specialized and assume a level of statistical knowledge beyond the scope of this text. Several issues must be considered when applying these approaches.

Identifying the specific distribution of a data set − While many types of data can be described by a small number of common distributions, such as Gaussian, Poisson, or binomial, data sets with non-standard distributions are relatively common. If the wrong model is chosen, then an object can be erroneously identified as an outlier.

For instance, the data may be modeled as coming from a Gaussian distribution but may actually come from a distribution that has a higher probability (than the Gaussian distribution) of producing values far from the mean. Statistical distributions with this kind of behavior are common in practice and are known as heavy-tailed distributions.
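To give a sense of how much the choice of model matters, the sketch below compares the probability of a value falling far from the center under a standard Gaussian and under a standard Cauchy distribution. The Cauchy distribution is used here only as one convenient heavy-tailed example with a closed-form tail; it is an assumption of this illustration, as are the function names. Because the Cauchy distribution has no finite variance, distances are measured in units of each distribution's scale parameter.

```python
import math


def gaussian_tail(t):
    """P(|X| > t) for a standard Gaussian (mean 0, standard deviation 1)."""
    return math.erfc(t / math.sqrt(2))


def cauchy_tail(t):
    """P(|X| > t) for a standard Cauchy (location 0, scale 1)."""
    return 1.0 - (2.0 / math.pi) * math.atan(t)


for t in (2.0, 3.0, 5.0):
    g = gaussian_tail(t)
    c = cauchy_tail(t)
    print(f"t={t}: Gaussian tail={g:.2e}, Cauchy tail={c:.2e}, ratio={c / g:.1f}")
```

A value that looks nearly impossible under the Gaussian model can be quite ordinary under the heavy-tailed one, which is how a wrongly chosen model ends up flagging normal objects as outliers.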

The number of attributes used − Some statistical outlier detection techniques apply to a single attribute, but some techniques have been defined for multivariate data.
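One commonly used multivariate technique, chosen here for illustration and not named in the text above, is the Mahalanobis distance, which measures how far a point lies from the mean while accounting for the correlation between attributes. The sketch below handles only two attributes so that the covariance matrix can be inverted by hand; the function name and the sample `points` are hypothetical.

```python
import math
import statistics


def mahalanobis_2d(points):
    """Return the Mahalanobis distance of each (x, y) point from the sample mean."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    n = len(points)
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] * inverse(covariance) * [dx dy]^T, written out for 2x2
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances


# Hypothetical data: y roughly equals x, except for the last point.
points = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (5, 5.1),
          (6, 5.8), (7, 7.2), (8, 7.9), (9, 9.1), (2, 8.0)]

distances = mahalanobis_2d(points)
most_unusual = points[distances.index(max(distances))]
print("largest distance:", round(max(distances), 2), "at", most_unusual)
```

In this example the last point has ordinary values on each attribute taken separately (both are within about one standard deviation of the mean), so a single-attribute test would not flag it. Only the combination of the two attributes is unusual.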

Mixtures of distributions − The data can be modeled as a mixture of distributions, and outlier detection schemes can be developed based on such models. Although potentially more powerful, such models are more complicated, both to understand and to use. For example, the distributions need to be identified before objects can be classified as outliers.
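As a rough sketch of the mixture idea, the code below scores objects by their density under a two-component Gaussian mixture and flags those with low density. It assumes the mixture has already been identified (for example, by a separate fitting step that is not shown). The weights, means, standard deviations, observations, and density threshold are all hypothetical values chosen for the example.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def mixture_density(x, components):
    """Density at x of a mixture given as (weight, mean, std) triples."""
    return sum(w * normal_pdf(x, mu, sigma) for w, mu, sigma in components)


# Assumed, already-estimated mixture: weights sum to 1.
components = [(0.6, 0.0, 1.0), (0.4, 10.0, 1.5)]

observations = [-0.5, 0.3, 9.2, 11.0, 5.0]
density_threshold = 1e-3  # hypothetical cutoff

flagged = [x for x in observations
           if mixture_density(x, components) < density_threshold]
print("flagged as outliers:", flagged)
```

The value 5.0 lies between the two clusters, so it has low density under the mixture, even though it sits close to the overall mean and a single Gaussian fitted to all the data would treat it as typical.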

Statistical approaches to outlier detection have a firm foundation and build on standard statistical techniques, such as estimating the parameters of a distribution. When there is sufficient knowledge of the data and of the type of test that should be applied, these tests can be very effective. There is a wide variety of statistical outlier tests for single attributes. Fewer options are available for multivariate data, and these tests can perform poorly on high-dimensional data.

Updated on 14-Feb-2022 13:09:15