A Data warehouse is known as a central repository to store the data from one or multiple heterogeneous data sources. Data warehouse is used for reporting and analyzing of information and stores both historical and current data. The data in DW system is used for Analytical reporting, which is later used by Business Analysts, Sales Managers or Knowledge workers for decision-making.
The data in DW system is loaded from an operational transaction system like Sales, Marketing, HR, SCM, etc. It may pass through operational data store or other transformations before it is loaded to the DW system for information processing.
The key features of a DW System are −
It is central data repository where data is stored from one or more heterogeneous data sources.
A DW system stores both current and historical data. Normally a DW system stores 5-10 years of historical data.
A DW system is always kept separate from an operational transaction system.
Data in DW system is used for different types of analytical reporting range from Quarterly to Annual comparison.
Suppose you have a home loan agency where data is coming from multiple applications like- marketing, sales, ERP, HRM, MM etc. This data is extracted, transformed and loaded in Data Warehouse.
For example, if you have to compare the Quarterly/Annual sales of a product, you cannot use an Operational transactional database, as this will hang the transaction system. Therefore, a Data Warehouse is used for this purpose.
The differences between a Data Warehouse and Operational Database (Transactional Database) are as follows −
A Transactional system is designed for known workloads and transactions like updating a user record, searching a record, etc. However, Data Warehouse transactions are more complex and present a general form of data.
A Transactional system contains the current data of an organization and Data warehouse normally contains the historical data.
Transactional system supports parallel processing of multiple transactions. Concurrency control and recovery mechanisms are required to maintain consistency of the database.
An Operational database query allows to read and modify operations (delete and Update) while an OLAP query needs only read-only access of stored data (Select statement).
Data Warehousing involves data cleaning, data integration, and data consolidations.
A Data Warehouse has a 3-layer architecture − Data Source Layer, Integration Layer, and Presentation Layer. The illustration given above shows the common architecture of a Data Warehouse system.