What are the Implementations of Data Warehouse?

Data warehouses contain huge volumes of data. OLAP servers demand that decision support queries be acknowledged in the order of seconds. Thus, it is essential for data warehouse systems to provide highly effective cube computation techniques, access techniques, and query processing techniques.

Efficient Computation of Data Cubes

At the core of multidimensional data analysis is the efficient computation of aggregations across many sets of dimensions. In SQL terms, these aggregations are referred to as group-by’s. Each group-by can be represented by a cuboid, where the set of group-by’s forms a lattice of cuboids defining a data cube.

There are three choices for data cube materialization given a base cuboid −

  • No materialization − It does not precompute any of the “nonbase” cuboids. This leads to computing expensive multidimensional aggregates on the fly, which can be extremely slow.

  • Full materialization − It can Pre-compute all of the cuboids. The resulting lattice of computed cuboids is defined as the full cube. This choice typically requires huge amounts of memory space to store all of the precomputed cuboids.

  • Partial materialization − It can selectively calculate a proper subset of the whole set of possible cuboids. Alternatively, it can calculate a subset of the cube, which includes only those cells that satisfy some user-specified criterion, including where the tuple count of each cell is following some threshold.

Indexing OLAP Data

It can support efficient data accessing, some data warehouse systems provide index structures and materialized views (using cuboids). The bitmap indexing approaches is famous in OLAP products because it enables fast searching in data cubes. The bitmap index is an alternative representation of the record ID (RID) list.

In the bitmap index for a given attribute, there is a distinct bit vector, Bv, for each value v in the domain of the attribute. If the domain of a given attribute includes n values, then n bits are required for each entry in the bitmap index (i.e., there are n bit vectors). If the attribute has the value v for a given row in the data table, then the bit defining that value is set to 1 in the corresponding row of the bitmap index. All other bits for that row are set to 0.

Efficient Processing of OLAP Queries

The goals of materializing cuboids and constructing OLAP index structures is to speed up query processing in data cubes.

  • Determine which operations should be performed on the available cuboids − This contains transforming some selection, projection, roll-up (group-by), and drill-down operations represented in the query into the corresponding SQL and/or OLAP operations. For instance, slicing and dicing a data cube can correspond to selection and projection operations on a materialized cuboid.

  • Determine to which materialized cuboid(s) the relevant operations should be applied − This contains identifying some materialized cuboids that can potentially be used to answer the query, pruning the following set using knowledge of “dominance” relationships between the cuboids, estimating the values of using the remaining materialized cuboids and choosing the cuboid with the minimum cost.