What is the Sequential Exception Technique?


The sequential exception technique simulates the way in which humans can distinguish unusual objects from among a series of supposedly like objects. It exploits the implicit redundancy of the data.

Given a data set, D, of n objects, it constructs a sequence of subsets, {D1, D2, ..., Dm}, of these objects, with 2 ≤ m ≤ n, such that

$$\mathrm{D_{j-1}\subset D_{j}\:\:where\: D_{j}\subseteq D}$$

Dissimilarities are assessed between consecutive subsets in the sequence. The technique relies on the following key terms −

Exception set − This is the set of deviations or outliers. It is defined as the smallest subset of objects whose removal results in the greatest reduction of dissimilarity in the residual set.

Dissimilarity function − This function does not require a metric distance between the objects. Given a set of objects, it returns a low value if the objects are similar to one another. The greater the dissimilarity among the objects, the higher the value returned by the function.

The dissimilarity of a subset is computed incrementally, based on the subset that precedes it in the sequence. Given a subset of n numbers, {x1, ..., xn}, a possible dissimilarity function is the variance of the numbers in the set

$$\mathrm{\frac{1}{n}\displaystyle\sum\limits_{i=1}^n (x_{i}-x^{'})^2}$$

where x' is the mean of the n numbers in the set. For character strings, the dissimilarity function may take the form of a pattern string (e.g., containing wildcard characters) that covers all of the patterns seen so far. The dissimilarity increases when the pattern covering all of the strings in Dj−1 does not cover some string in Dj that is not in Dj−1.
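The variance-based dissimilarity described above can be sketched in a few lines of Python. This is only an illustration: the function name `variance_dissimilarity` and the sample data are assumptions, not part of the technique's definition.

```python
def variance_dissimilarity(values):
    """Population variance (divides by n), following the formula above.

    Returns 0.0 for an empty set, which is a convention assumed here.
    """
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


# Similar objects give a low value; adding one unusual object raises it sharply.
similar = [10, 11, 10, 12, 11]
with_outlier = similar + [95]

print(variance_dissimilarity([2, 4, 4, 4, 5, 5, 7, 9]))  # 4.0
print(variance_dissimilarity(similar) < variance_dissimilarity(with_outlier))  # True
```

Any other function with the same behavior (low for similar objects, high for dissimilar ones) could be substituted, since no metric distance is required.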

Cardinality function − This is typically the number of objects in a given set.

Smoothing factor − This function is computed for each subset in the sequence. It assesses how much the dissimilarity can be reduced by removing the subset from the original set of objects. This value is scaled by the cardinality of the set. The subset with the largest smoothing factor is the exception set.
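To make the smoothing factor concrete, here is a minimal sketch. It assumes one common formulation, which the text above does not spell out: the reduction in dissimilarity obtained by removing a subset, multiplied by the cardinality of the residual set. The function names, the use of variance as the dissimilarity function, and the sample data are all illustrative assumptions.

```python
def variance(values):
    """Population variance, used here as an example dissimilarity function."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def smoothing_factor(full_set, subset, dissimilarity=variance):
    """Assumed form: C(D - Dj) * (Dis(D) - Dis(D - Dj)).

    `subset` must be contained in `full_set`; list.remove raises
    ValueError otherwise.
    """
    residual = list(full_set)
    for x in subset:
        residual.remove(x)  # remove one occurrence per subset element
    reduction = dissimilarity(full_set) - dissimilarity(residual)
    return len(residual) * reduction


data = [0, 0, 0, 0, 10]
print(smoothing_factor(data, [10]))  # 64.0
print(smoothing_factor(data, [10]) > smoothing_factor(data, [0]))  # True
```

Removing the value 10 eliminates all of the variance, so that subset scores highest and would be reported as the exception set under this formulation.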

The task of finding an exception set can be NP-hard (i.e., intractable). A sequential approach is computationally feasible and can be implemented with a linear-time algorithm.

Rather than assessing the dissimilarity of the current subset with respect to its complementary set, the algorithm selects a sequence of subsets from the set for analysis. For each subset, it determines the dissimilarity difference of the subset with respect to the preceding subset in the sequence.
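The sequential idea can be illustrated with a simplified single-pass sketch. It assumes that the subsets are nested prefixes of the input (D1 holds the first object, D2 the first two, and so on) and that variance is the dissimilarity function, updated incrementally with Welford's method so that the pass stays linear. The function name and data are hypothetical, and this is not a complete implementation of the technique: the result depends on the order of the objects, so a fuller version would likely repeat the pass over several orderings and compare smoothing factors.

```python
def largest_dissimilarity_jump(values):
    """Return (index, jump) for the object whose addition to the
    preceding prefix increases the variance the most.

    Returns (None, -inf) if there are fewer than two objects.
    """
    count, mean, m2 = 0, 0.0, 0.0  # running state for Welford's update
    previous = 0.0                 # dissimilarity of the preceding prefix
    best_index, best_jump = None, float("-inf")
    for index, x in enumerate(values):
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        current = m2 / count       # population variance of this prefix
        jump = current - previous
        # D1 has no preceding subset, so comparisons start at index 1.
        if index > 0 and jump > best_jump:
            best_index, best_jump = index, jump
        previous = current
    return best_index, best_jump


values = [10, 11, 10, 95, 12, 11]
index, jump = largest_dissimilarity_jump(values)
print(index)  # 3
```

Here the object at index 3 (the value 95) produces the largest increase in dissimilarity relative to the preceding subset, so it is the natural candidate for the exception set.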

Updated on 17-Feb-2022 11:18:57