How is this statistical information useful for query answering?


The statistical parameters can be used in a top-down, grid-based approach as follows. First, a layer within the hierarchical structure is chosen as the starting point of the query-answering procedure.

This layer typically contains a small number of cells. For each cell in the current layer, we compute the confidence interval (or estimated range of probability) that reflects the cell's relevance to the given query.
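As an illustration of the relevance step, here is a minimal sketch, assuming a single numeric attribute and a range query of the form `lo <= value <= hi`. The names `CellStats`, `estimate_fraction`, `is_relevant` and the `min_matches` threshold are hypothetical. The sketch estimates the fraction of a cell's objects that satisfy the query from the stored parameters: a normal CDF for "normal" cells, a length ratio for "uniform" cells, and only loose bounds when the distribution is unknown.

```python
import math
from dataclasses import dataclass


@dataclass
class CellStats:
    # Hypothetical summary of one attribute in one grid cell, mirroring the
    # parameters STING keeps: count, mean, stdev, min, max and distribution.
    count: int
    mean: float
    stdev: float
    min: float
    max: float
    dist: str  # "normal", "uniform" or "none" (unknown)


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    # Normal CDF expressed with the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def estimate_fraction(cell: CellStats, lo: float, hi: float) -> tuple[float, float]:
    """Return an estimated (low, high) range for the fraction of the cell's
    objects whose value lies in [lo, hi]."""
    # No overlap with the cell's value range: nothing can match.
    if cell.count == 0 or hi < cell.min or lo > cell.max:
        return (0.0, 0.0)
    # The query range covers every value in the cell: everything matches.
    if lo <= cell.min and cell.max <= hi:
        return (1.0, 1.0)
    if cell.dist == "normal" and cell.stdev > 0:
        p = normal_cdf(hi, cell.mean, cell.stdev) - normal_cdf(lo, cell.mean, cell.stdev)
        return (p, p)
    if cell.dist == "uniform" and cell.max > cell.min:
        overlap = min(hi, cell.max) - max(lo, cell.min)
        p = overlap / (cell.max - cell.min)
        return (p, p)
    # Unknown distribution with partial overlap: only loose bounds are possible.
    return (0.0, 1.0)


def is_relevant(cell: CellStats, lo: float, hi: float, min_matches: float) -> bool:
    # Hypothetical rule: keep the cell if the optimistic estimate of the
    # number of matching objects reaches the threshold required by the query.
    _, high = estimate_fraction(cell, lo, hi)
    return high * cell.count >= min_matches
```

For example, a "normal" cell with mean 0 and stdev 1 yields an estimated fraction of about 0.68 for the range [-1, 1]. Using the upper end of the range keeps pruning conservative: a cell is discarded only when even the optimistic estimate is too low.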

The statistical parameters of higher-level cells can easily be computed from the parameters of the lower-level cells. These parameters include the following − the attribute-independent parameter count; the attribute-dependent parameters mean, stdev (standard deviation), min (minimum), and max (maximum); and the type of distribution that the attribute values in the cell follow, such as normal, uniform, exponential, or none (if the distribution is unknown).
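To show why no rescan of the raw data is needed, here is a minimal sketch of computing a parent cell's parameters from its children alone. It assumes population standard deviations, and the names `Cell` and `merge_children` are hypothetical. The rule for the distribution type (keep it only if every non-empty child agrees, otherwise "none") is a deliberately simplified, hypothetical stand-in for whatever heuristic an actual implementation uses.

```python
import math
from dataclasses import dataclass


@dataclass
class Cell:
    count: int
    mean: float
    stdev: float  # population standard deviation (assumption)
    min: float
    max: float
    dist: str


def merge_children(children: list[Cell]) -> Cell:
    """Compute a parent cell's parameters from its children's parameters only."""
    nonempty = [c for c in children if c.count > 0]
    n = sum(c.count for c in nonempty)
    if n == 0:
        return Cell(0, 0.0, 0.0, math.inf, -math.inf, "none")
    # Count-weighted mean of the children's means.
    mean = sum(c.count * c.mean for c in nonempty) / n
    # Each child's E[x^2] is stdev^2 + mean^2; combine them and subtract the
    # parent's mean^2 to get the parent's variance.
    ex2 = sum(c.count * (c.stdev ** 2 + c.mean ** 2) for c in nonempty) / n
    stdev = math.sqrt(max(ex2 - mean ** 2, 0.0))
    lo = min(c.min for c in nonempty)
    hi = max(c.max for c in nonempty)
    # Hypothetical simplified rule for the distribution type.
    kinds = {c.dist for c in nonempty}
    dist = kinds.pop() if len(kinds) == 1 else "none"
    return Cell(n, mean, stdev, lo, hi, dist)
```

For instance, children summarizing the values {1, 2, 3} and {4, 5} merge into a parent with count 5, mean 3 and standard deviation √2, which is exactly what a direct pass over {1, 2, 3, 4, 5} would produce.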

The irrelevant cells are removed from further consideration. Processing at the next lower level examines only the remaining relevant cells. This process is repeated until the bottom layer is reached. If the query specification is met, the regions of relevant cells that satisfy the query are returned.
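The layer-by-layer pruning described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a full STING implementation: `GridCell`, `answer_query` and `overlaps` are hypothetical names, every leaf is assumed to sit at the same depth, and the relevance test is a crude min/max overlap check standing in for the confidence-interval computation. The sketch returns the relevant bottom-level cells; merging adjacent cells into regions is left out.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class GridCell:
    # Hypothetical node of the grid hierarchy: a summary plus its child cells.
    name: str
    count: int
    min: float
    max: float
    children: list["GridCell"] = field(default_factory=list)


def answer_query(start_layer: list[GridCell],
                 is_relevant: Callable[[GridCell], bool]) -> list[GridCell]:
    """Prune irrelevant cells layer by layer, top-down, and return the
    relevant cells of the bottom layer."""
    current = [c for c in start_layer if is_relevant(c)]
    # Assumes every leaf sits at the same depth, so the first surviving cell
    # tells us whether a lower layer exists.
    while current and current[0].children:
        # Only children of cells that survived the previous layer are tested.
        current = [child
                   for cell in current
                   for child in cell.children
                   if is_relevant(child)]
    return current


def overlaps(lo: float, hi: float) -> Callable[[GridCell], bool]:
    # Hypothetical relevance test for the query "lo <= value <= hi":
    # a cell can only contribute if it is non-empty and its [min, max] overlaps.
    return lambda c: c.count > 0 and c.min <= hi and c.max >= lo
```

With a root whose two children cover the value ranges [0, 20] and [50, 100], a query for [60, 75] discards the first child immediately, so none of its descendants are ever examined.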

STING has several advantages, as well as some limitations, which are as follows −

  • The grid-based computation is query-independent, because the statistical information stored in each cell summarizes the data in that grid cell, independent of the query.

  • The grid architecture supports parallel processing and incremental refreshing.

  • The efficiency of the technique is a major advantage. STING goes through the database once to compute the statistical parameters of the cells, and therefore the time complexity of generating clusters is O(n), where n is the total number of objects.

  • After the hierarchical structure has been built, the query processing time is O(g), where g is the total number of grid cells at the lowest level, which is generally much smaller than n.

  • Because STING uses a multiresolution approach to cluster analysis, the quality of STING clustering depends on the granularity of the lowest level of the grid structure. If the granularity is very fine, the cost of processing will increase substantially; however, if the bottom level of the grid structure is too coarse, it can reduce the quality of cluster analysis.

  • STING does not consider the spatial relationship between the children and their neighboring cells when constructing a parent cell. As a result, the shapes of the resulting clusters are isothetic; that is, all cluster boundaries are either horizontal or vertical, and no diagonal boundary is detected. This can lower the quality and accuracy of the clusters despite the fast processing time of the technique.
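The O(n) construction cost and the support for incremental refreshing both follow from the fact that each bottom-level cell can be maintained as a running summary. The sketch below assumes 2-D points carrying one numeric attribute and square cells of a fixed size; `RunningStats` and `build_bottom_layer` are hypothetical names. Each object touches exactly one cell in O(1), so one pass over n objects builds the whole bottom layer, and a newly arriving object can be folded in with a single `add` call (propagating the change to ancestor cells is not shown).

```python
import math
from collections import defaultdict


class RunningStats:
    """Summary of one attribute in one bottom-level cell, updated in O(1)."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0
        self.min = math.inf
        self.max = -math.inf

    def add(self, value: float) -> None:
        # Incremental refresh: fold one new object into the summary.
        self.count += 1
        self.total += value
        self.total_sq += value * value
        self.min = min(self.min, value)
        self.max = max(self.max, value)

    @property
    def mean(self) -> float:
        return self.total / self.count

    @property
    def stdev(self) -> float:
        # Population standard deviation from the running sums; fine for a
        # sketch, though Welford's method is numerically safer.
        var = self.total_sq / self.count - self.mean ** 2
        return math.sqrt(max(var, 0.0))


def build_bottom_layer(points, cell_size: float):
    """One pass over n points (x, y, value), so O(n) time overall."""
    cells = defaultdict(RunningStats)
    for x, y, value in points:
        # Map the point to the grid cell that contains it.
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].add(value)
    return cells
```

Because queries afterwards read only these summaries, their cost depends on the number of cells g rather than on n.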

Updated on 17-Feb-2022 10:54:39