What are the strategies for cube computation?

The following general optimization techniques can be applied for the efficient computation of data cubes −

  • Sorting, hashing, and grouping − Sorting, hashing, and grouping operations should be applied to the dimension attributes in order to reorder and cluster related tuples. In cube computation, aggregation is performed on the tuples (or cells) that share the same set of dimension values. Thus it is important to exploit sorting, hashing, and grouping operations to access and group such data together, which facilitates the computation of such aggregates.

    For instance, to compute total sales by branch, day, and item, it is more efficient to sort tuples or cells by branch, then by day, and then group them according to the item name. Efficient implementations of such operations on large data sets have been extensively studied in the database research community, and these implementations can be extended to data cube computation.
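    The idea above can be sketched as follows, using hypothetical (branch, day, item, amount) sales tuples: sorting by the dimension attributes makes tuples with the same dimension values adjacent, so each group can be aggregated in a single pass.

    ```python
    from itertools import groupby

    # Hypothetical fact tuples: (branch, day, item, amount)
    sales = [
        ("B1", "Mon", "pen", 5),
        ("B2", "Mon", "pen", 7),
        ("B1", "Mon", "pen", 3),
        ("B1", "Tue", "ink", 2),
    ]

    dims = lambda t: (t[0], t[1], t[2])  # (branch, day, item)

    # Sort so tuples sharing the same dimension values become adjacent,
    # then aggregate each run of equal keys in one scan.
    sales.sort(key=dims)
    totals = {
        key: sum(t[3] for t in group)
        for key, group in groupby(sales, key=dims)
    }
    # totals[("B1", "Mon", "pen")] is 8 (5 + 3)
    ```

    A hash-based alternative (e.g., accumulating into a dictionary keyed by the dimension values) achieves the same grouping without a sort.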

  • Simultaneous aggregation and caching of intermediate results − In cube computation, it is efficient to compute higher-level aggregates from previously computed lower-level aggregates, instead of from the base fact table. Furthermore, simultaneous aggregation from cached intermediate computation results can reduce the number of costly disk I/O operations.

    For instance, to compute sales by branch, we can use the intermediate results derived from the computation of a lower-level cuboid, such as sales by branch and day. This technique can be further extended to perform amortized scans (i.e., computing as many cuboids as possible at the same time to amortize disk reads).
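    A minimal sketch of this roll-up, assuming a previously computed (branch, day) cuboid is cached in memory: the branch-level totals are derived from the cached cells rather than by rescanning the base fact table.

    ```python
    from collections import defaultdict

    # Cached lower-level cuboid (hypothetical data): sales by (branch, day)
    sales_by_branch_day = {
        ("B1", "Mon"): 8,
        ("B1", "Tue"): 2,
        ("B2", "Mon"): 7,
    }

    # Roll up to the higher-level cuboid, sales by branch, directly
    # from the cached intermediate result -- no base-table scan needed.
    sales_by_branch = defaultdict(int)
    for (branch, _day), total in sales_by_branch_day.items():
        sales_by_branch[branch] += total
    # sales_by_branch["B1"] is 10 (8 + 2)
    ```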

  • Aggregation from the smallest child, when there exist multiple child cuboids − When there exist multiple child cuboids, it is usually more efficient to compute the desired parent (i.e., more generalized) cuboid from the smallest, previously computed child cuboid.

    For instance, to compute a sales cuboid, CBranch, when there exist two previously computed cuboids, C{Branch, Year} and C{Branch, Item}, it is more efficient to compute CBranch from the former than from the latter if there are many more distinct items than distinct years.
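    The choice can be sketched as follows, with hypothetical child cuboids in which there are more distinct items than distinct years. Either child rolls up to the same parent, but scanning the smaller one touches fewer cells.

    ```python
    from collections import defaultdict

    # Two previously computed child cuboids of CBranch (hypothetical data)
    c_branch_year = {("B1", 2022): 4, ("B1", 2023): 6, ("B2", 2023): 7}
    c_branch_item = {("B1", "pen"): 6, ("B1", "ink"): 2,
                     ("B1", "pad"): 2, ("B2", "pen"): 7}

    def rollup_to_branch(child):
        """Aggregate any (branch, x) cuboid up to a branch-only cuboid."""
        parent = defaultdict(int)
        for (branch, _), total in child.items():
            parent[branch] += total
        return dict(parent)

    # Pick the smallest child: fewer cells means fewer tuples to scan.
    smallest = min((c_branch_year, c_branch_item), key=len)
    c_branch = rollup_to_branch(smallest)
    # c_branch is {"B1": 10, "B2": 7}, computed from C{Branch, Year},
    # the smaller of the two children.
    ```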

  • The Apriori pruning method can be explored to compute iceberg cubes efficiently − The Apriori property, in the context of data cubes, states the following: If a given cell does not satisfy minimum support, then no descendant (i.e., more specialized or detailed version) of the cell will satisfy minimum support either. This property can be used to substantially reduce the computation of iceberg cubes.

    The specification of an iceberg cube includes an iceberg condition, which is a constraint on the cells to be materialized. A common iceberg condition is that the cells must satisfy a minimum support threshold, such as a minimum count or sum. In this case, the Apriori property can be used to prune away the exploration of a cell's descendants.
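    A minimal sketch of this pruning over two hypothetical dimensions (A, B), with count as the iceberg measure: once a cell on dimension A fails the minimum count, none of its (A, B) descendants is even examined.

    ```python
    from collections import Counter

    # Hypothetical fact tuples over dimensions (A, B)
    facts = [("a1", "b1"), ("a1", "b2"), ("a1", "b1"), ("a2", "b1")]
    MIN_SUP = 2  # iceberg condition: count >= 2

    iceberg = {}
    # Aggregate on dimension A first.
    for a_val, count in Counter(t[0] for t in facts).items():
        if count < MIN_SUP:
            # Apriori pruning: no (a_val, b) descendant can reach
            # MIN_SUP, so skip expanding this cell entirely.
            continue
        iceberg[(a_val, "*")] = count
        # Only expand descendants of cells that pass the threshold.
        for cell, c in Counter(t for t in facts if t[0] == a_val).items():
            if c >= MIN_SUP:
                iceberg[cell] = c
    # iceberg is {("a1", "*"): 3, ("a1", "b1"): 2}: the cell ("a2", "*")
    # has count 1, so its descendants were never enumerated.
    ```

    This is the core idea behind bottom-up iceberg-cube algorithms such as BUC, which recursively partition the data and stop expanding any partition that fails the iceberg condition.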