How does discordancy testing work?

A statistical discordancy test examines two hypotheses: a working hypothesis and an alternative hypothesis. The working hypothesis, H, is a statement that the entire data set of n objects comes from an initial distribution model, F, i.e., H: o_i ∈ F, where i = 1, 2, ..., n.

The working hypothesis is retained if there is no statistically significant evidence supporting its rejection. A discordancy test checks whether an object o_i is significantly large (or small) in relation to the distribution F. Different test statistics have been proposed for use as a discordancy test, depending on the available knowledge of the data.

Suppose that some statistic T has been selected for discordancy testing, and the value of the statistic for object o_i is v_i. The distribution of T is then constructed, and the significance probability SP(v_i) = Prob(T > v_i) is evaluated.

If some SP(v_i) is sufficiently small, then o_i is discordant and the working hypothesis is rejected. An alternative hypothesis, which states that o_i comes from another distribution model, G, is adopted. The result depends strongly on which model F is chosen, because o_i can be an outlier under one model and a perfectly valid value under another.
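As a minimal sketch of the procedure above, the following assumes that the statistic T is simply the object's own value and that F is a normal distribution. Both are illustrative assumptions, and the names significance_probability, is_discordant, and the threshold alpha = 0.05 are hypothetical. It uses only Python's standard library and shows how the same value can be discordant under one model F and valid under another.

```python
from statistics import NormalDist

def significance_probability(value, f_model):
    """Return SP(v) = Prob(T > v), with T distributed as f_model (assumed normal)."""
    return 1.0 - f_model.cdf(value)

def is_discordant(value, f_model, alpha=0.05):
    """Flag the value as discordant when its significance probability is below alpha."""
    return significance_probability(value, f_model) < alpha

# Two candidate working models F (illustrative parameters only).
f_narrow = NormalDist(mu=0.0, sigma=1.0)
f_wide = NormalDist(mu=0.0, sigma=3.0)

o_i = 4.0
print(is_discordant(o_i, f_narrow))  # SP is far below 0.05 under the narrow model
print(is_discordant(o_i, f_wide))    # SP is above 0.05 under the wide model
```

This sketch tests only the upper tail, matching SP(v_i) = Prob(T > v_i); a test for unusually small values would use the lower tail instead.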

The alternative distribution is very important in determining the power of the test, i.e., the probability that the working hypothesis is rejected when o_i is an outlier. There are several types of alternative distributions.
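To illustrate how the choice of G affects power, here is a small sketch that assumes both F and G are normal distributions and that the test rejects when an object exceeds the upper alpha critical value of F. The function name power and all parameter values are hypothetical choices made for illustration.

```python
from statistics import NormalDist

def power(f_model, g_model, alpha=0.05):
    """Probability that an object drawn from g_model is declared discordant
    by a one-sided upper-tail test built on f_model at level alpha."""
    critical = f_model.inv_cdf(1.0 - alpha)  # reject when the value exceeds this
    return 1.0 - g_model.cdf(critical)

f_model = NormalDist(mu=0.0, sigma=1.0)

# Alternatives that differ from F only in their mean (illustrative values).
for shift in (0.0, 1.0, 3.0):
    g_model = NormalDist(mu=shift, sigma=1.0)
    print(shift, round(power(f_model, g_model), 4))
```

When G equals F, the computed probability is just alpha, the false-alarm rate; it grows as G moves further away from F.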

Inherent alternative distribution: In this case, the working hypothesis that all of the objects come from distribution F is rejected in favor of the alternative hypothesis that all of the objects arise from another distribution, G:

H: o_i ∈ G, where i = 1, 2, ..., n

F and G can be different distributions or differ only in the parameters of the same distribution. There are constraints on the form of the G distribution in that it must have the potential to produce outliers. For example, it can have a different mean or dispersion, or a longer tail.

Mixture alternative distribution: The mixture alternative states that discordant values are not outliers in the F population, but contaminants from some other population, G. In this case, the alternative hypothesis is:

H: o_i ∈ (1 − λ)F + λG, where i = 1, 2, ..., n
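The mixture alternative can be sketched as a sampling procedure: each object is drawn from G with probability λ and from F otherwise. The example below assumes normal F and G; the function name sample_mixture, the seed, and the parameter values are hypothetical.

```python
import random

def sample_mixture(n, lam, f_params, g_params, rng):
    """Draw n objects from (1 - lam) * F + lam * G.

    f_params and g_params are (mean, standard deviation) pairs of normal
    distributions (an illustrative assumption). Returns (value, from_g) pairs.
    """
    objects = []
    for _ in range(n):
        from_g = rng.random() < lam           # contaminant with probability lam
        mu, sigma = g_params if from_g else f_params
        objects.append((rng.gauss(mu, sigma), from_g))
    return objects

rng = random.Random(0)                        # fixed seed for repeatability
data = sample_mixture(10_000, 0.1, (0.0, 1.0), (6.0, 1.0), rng)
share = sum(from_g for _, from_g in data) / len(data)
print(f"share of contaminants: {share:.3f}")  # close to lam for large n
```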

Slippage alternative distribution: This alternative states that all of the objects (apart from some prescribed small number) arise independently from the original model F with parameters μ and σ², while the remaining objects are independent observations from a modified version of F in which the parameters have been changed.
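A rough sketch of data generated under a slippage alternative follows. It assumes F is normal and that only the mean slips for the prescribed k objects; the function name sample_slippage, the shift size, and the seed are hypothetical choices for illustration.

```python
import random

def sample_slippage(n, k, mu, sigma, mean_shift, rng):
    """Draw n - k objects from F = N(mu, sigma^2) and k objects from a
    modified F whose mean has slipped to mu + mean_shift.
    Returns (value, slipped) pairs in random order."""
    objects = [(rng.gauss(mu, sigma), False) for _ in range(n - k)]
    objects += [(rng.gauss(mu + mean_shift, sigma), True) for _ in range(k)]
    rng.shuffle(objects)                      # hide which objects slipped
    return objects

rng = random.Random(42)                       # fixed seed for repeatability
data = sample_slippage(n=50, k=3, mu=0.0, sigma=1.0, mean_shift=5.0, rng=rng)
slipped = [value for value, flag in data if flag]
print(len(data), len(slipped))
```

Unlike the mixture alternative, the number of modified objects here is fixed in advance rather than random.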

Ginni

Updated on: 2021-11-24T06:38:13+05:30
