Why do we need a separate Data Warehouse?

Data Warehousing is a technique that is mainly used to collect and manage data from various sources to give the business a meaningful business insight. A data warehouse is specifically designed to support management decisions.

In simple terms, a data warehouse refers to a database that is maintained separately from an organization’s operational databases. Data warehouse systems enable for integration of several application systems. They provide data processing by supporting a solid platform of consolidated, historical information for analysis.

Data Warehouse queries are complicated because they contain the computation of huge groups of information at summarized levels. It can require the use of distinctive data organization, access, and implementation technique depends on multidimensional views.

A major reason for such a separation is to help boost the high implementation of both systems. An operational database is created and tuned from known functions and workloads, including indexing and hashing using primary keys, searching for specific records, and optimizing “canned” queries.

On the other hand, data warehouse queries are often complex. They include the computation of huge groups of information at summarized levels and can require the use of specific data organization, access, and implementation methods depends on multidimensional views. The processing of OLAP queries in operational databases would substantially degrade the performance of operational tasks.

Furthermore, an operational database provides the concurrent processing of multiple transactions. Concurrency control and recovery mechanisms, including locking and logging, are needed to provide the consistency and robustness of transactions. An OLAP query often required read-only access to data records for summarization and aggregation.

Concurrency control and recovery mechanisms, if applied for such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.

Finally, the segregation of operational databases from data warehouses is located on the multiple structures, contents, and uses of the information in these two systems. Decision support needed historical information, whereas operational databases do not generally maintain historical information.

Decision support requires consolidation including aggregation and summarization of information from heterogeneous sources, resulting in high-quality, clean, and integrated information. In contrast, operational databases include only detailed raw data, including transactions, which are required to be consolidated before analysis.

Because the two systems provide quite different functionalities and require different kinds of data, it is presently necessary to maintain separate databases. However, there are various vendors of operational relational database management systems that are beginning to optimize such systems to support OLAP queries. As this trend continues, the separation between OLTP and OLAP systems is expected to decrease.