What is STING?

STING stands for Statistical Information Grid. STING is a grid-based multiresolution clustering method in which the spatial area is divided into rectangular cells. There are usually several levels of such rectangular cells, corresponding to different levels of resolution, and these cells form a hierarchical structure: each cell at a high level is partitioned to form several cells at the next lower level.

Statistical data regarding the attributes in each grid cell (including the mean, maximum, and minimum values) is precomputed and stored. Statistical parameters of higher-level cells can simply be calculated from the parameters of the lower-level cells.

These parameters include the following: the attribute-independent parameter count; the attribute-dependent parameters mean, stdev (standard deviation), min (minimum), and max (maximum); and the type of distribution that the attribute value in the cell follows, such as normal, uniform, exponential, or none (if the distribution is unknown).
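The way a higher-level cell's parameters follow from its children can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the `CellStats` class and the `merge_children` function are hypothetical names, and stdev is assumed to be the population standard deviation.

```python
import math
from dataclasses import dataclass


@dataclass
class CellStats:
    # Hypothetical container for the numeric parameters of one cell.
    count: int
    mean: float
    stdev: float      # assumed to be the population standard deviation
    minimum: float
    maximum: float


def merge_children(children):
    """Derive a parent cell's parameters from its child cells only."""
    non_empty = [c for c in children if c.count > 0]
    n = sum(c.count for c in non_empty)
    if n == 0:
        return CellStats(0, 0.0, 0.0, 0.0, 0.0)
    # The parent mean is the count-weighted mean of the child means.
    mean = sum(c.mean * c.count for c in non_empty) / n
    # Each child's second moment E[x^2] equals stdev^2 + mean^2.
    second_moment = sum((c.stdev ** 2 + c.mean ** 2) * c.count for c in non_empty) / n
    variance = max(second_moment - mean ** 2, 0.0)  # guard against rounding
    return CellStats(
        count=n,
        mean=mean,
        stdev=math.sqrt(variance),
        minimum=min(c.minimum for c in non_empty),
        maximum=max(c.maximum for c in non_empty),
    )
```

No raw records are touched here, which is why the upper levels of the hierarchy are cheap to build once the bottom level exists.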

When the records are loaded into the database, the parameters count, mean, stdev, min, and max of the bottom-level cells are computed directly from the records. The value of distribution can either be assigned by the user, if the distribution type is known beforehand, or be obtained by hypothesis tests such as the χ² test.
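As a rough sketch of that loading step, the function below assigns two-dimensional records to bottom-level cells and computes the five numeric parameters for each cell. The record layout `(x, y, value)`, the `cell_size` parameter, and the function name are assumptions made for illustration, and the distribution type is left out.

```python
import math
from collections import defaultdict


def bottom_level_stats(records, cell_size):
    """Group (x, y, value) records by grid cell and summarize each cell."""
    groups = defaultdict(list)
    for x, y, value in records:
        # A cell is identified by its integer column and row index.
        key = (int(x // cell_size), int(y // cell_size))
        groups[key].append(value)

    stats = {}
    for key, values in groups.items():
        n = len(values)
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n  # population variance
        stats[key] = {
            "count": n,
            "mean": mean,
            "stdev": math.sqrt(variance),
            "min": min(values),
            "max": max(values),
        }
    return stats
```

Each record is visited once, so this step is linear in the number of records; everything above the bottom level can then be derived from these summaries.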

The distribution type of a higher-level cell can be determined from the majority of the distribution types of its corresponding lower-level cells, in conjunction with a threshold filtering procedure. If the distributions of the lower-level cells disagree with each other and fail the threshold test, the distribution type of the higher-level cell is set to none.
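One simplified reading of this rule is a count-weighted majority vote followed by a conflict check. The sketch below is an assumption-laden illustration, not the exact procedure: the function name, the `(dist_type, count)` input layout, and the default `threshold` of 0.05 are all hypothetical.

```python
from collections import Counter


def parent_distribution(children, threshold=0.05):
    """Pick a parent's distribution type from (dist_type, count) pairs."""
    votes = Counter()
    for dist_type, count in children:
        votes[dist_type] += count  # weight each child by its object count

    total = sum(votes.values())
    if total == 0:
        return "none"

    candidate, support = votes.most_common(1)[0]
    if candidate == "none":
        return "none"

    # Fraction of objects in children that disagree with the majority type.
    conflict = (total - support) / total
    return candidate if conflict <= threshold else "none"
```

With the default threshold, children such as `[("normal", 96), ("uniform", 4)]` would yield a normal parent, while a 60/40 split would fall back to none.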

Grid-based clustering methods use a multiresolution grid data structure. They quantize the object space into a finite number of cells that form a grid structure, on which the clustering operations are performed. The main benefit of the approach is its fast processing time, which is generally independent of the number of data objects and depends only on the number of cells in each dimension of the quantized space.

Examples of the grid-based approach include STING, which explores statistical information stored in the grid cells; WaveCluster, which clusters objects using a wavelet transform method; and CLIQUE, which is a grid- and density-based approach for clustering in high-dimensional data space.

STING offers several advantages. The grid-based computation is query-independent, since the statistical information stored in each cell exists independently of any query; it is a summary of the data in each grid cell that can be used to answer a large class of queries. The computational complexity of processing a query is O(K), where K is the number of grid cells at the lowest level. Usually K << N, where N is the number of objects.

raja
Updated on 16-Feb-2022 12:44:19