How are the exception values computed?

Data Mining Database Data Structure

There are three measures are used as exception indicators to support recognize data anomalies. These measures denotes the degree of surprise that the quantity in a cell influence, concerning its expected value.

The measures are computed and associated with every cell, for all levels of aggregation. They are as follows including the SelfExp, InExp, and PathExp measures are based on a numerical approaches for table analysis.

A cell value is treated an exception depends on how much it differs from its expected value, where its expected value is decided with a statistical model. The difference among a given cell value and its expected value is known as residual.

Intuitively, the higher the residual, the extra the provided cell value is an exception. The comparison of residual values requires us to scale the values based on the expected standard deviation associated with the residuals. A cell value is therefore considered an exception if it’s scaled residual value exceeds a prespecified threshold.

The SelfExp, InExp, and PathExp measures are based on this scaled residual. The expected value of a given cell is a service of the larger-level group-by’s of the provided cell. For instance, given a cube with the three dimensions A, B, and C, the expected value for a cell at the ith position in A, the jth position in B, and the kth position in C is a function of γ, γAi , γBj , γCk , γ ABij , γ ACik , and γ BCjk , which are coefficients of the numerical model used.

The coefficients follows how different the values at more levels are depends on generalized impressions formed by viewing at larger-level aggregations. In this approach, the exception quality of a cell value is depends on the exceptions of the values following it. Therefore, when viewing an exception, it is essential for the user to analyse the exception by drilling down.

This computation consists of three phases such as follows −

The first step contains the calculation of the aggregate values defining the cube, including sum or count, over which exceptions will be discovered.
The second phase consists of model fitting, in which the coefficients are determined and used to compute the standardized residuals. This phase can be overlapped with the first phase because the computations are same.
The third phase calculates the SelfExp, InExp, and PathExp values, depends on the standardized residuals. This phase is computationally equivalent to phase 1. Hence, the computation of data cubes for discovery-driven exploration can be completed effectively.

Ginni

Updated on: 16-Feb-2022

171 Views

Kickstart Your Career

Get certified by completing the course

Get Started