What is data warehouse tuning?

Data Warehousing is an approach that can collect and handle data from several sources to provide the business a meaningful business insight. A data warehouse is specifically designed to support management decisions.

In simple terms, a data warehouse defines a database that is maintained independently from an organization’s operational databases. Data warehouse systems enable the integration of multiple application systems. They provide data processing by offering a solid platform of consolidated, historical information for analysis.

Data warehouses generalize and centralize data in multidimensional space. The construction of data warehouses contains data cleaning, data integration, and data transformation and can be looked at as an important preprocessing step for data mining.

It provides online analytical processing (OLAP) tools for the interactive analysis of multidimensional data of varied granularities, which facilitates effective data generalization and data mining. There are several data mining functions, including association, classification, prediction, and clustering can be integrated with OLAP operations to build up interactive mining of knowledge at various levels of abstraction.

OLAP is a wide term that also surrounds data warehousing. In this model, data is saved in a format, which allows the effective creation of data mining/documents. OLAP design should accommodate documenting on large recordsets with a small degradation in operational effectiveness.

The complete term that can define taking data structures in an OLTP structure and influencing the same data in an OLAP structure is “Dimensional Modeling” It is the basic building block of Data Warehousing.

The tuning of the data warehouse can be done to improve the performance. The data warehouse is the entry point into the system, and it provides the first opportunity to improve performance. If the checks are performed on the data warehouse system, either before or after the data is loaded, then it will have a direct effect on the capacity and the performance of the system.

For example, if the data is telephone call records, it can be checked that each call has a valid customer identifier. If the data is sales information then it can be checked that whether the commodity being sold has a valid product identifier.

Loading large quantities of data or doing a heavy I/O operation, can be CPU intensive when there are a lot of checks and transformations to be applied to each record. The loading speed can be improved by using direct load techniques. It can also be improved by using parallelism.

There are the following steps that provide the best method for tuning a data warehouse −

  • Tune the business rules.
  • Tune the data design.
  • Tune the application design.
  • Tune the logical structure of the database.
  • Tune the database operations.
  • Tune the access paths.
  • Tune I/O and physical structure.
  • Tune resource contention.
  • Tune the underlying pattern.

Updated on: 15-Feb-2022


Kickstart Your Career

Get certified by completing the course

Get Started