What is Data Warehouse?

Data Warehousing is a technique that is mainly used to collect and manage data from various sources to give the business a meaningful business insight. A data warehouse is specifically designed to support management decisions.

In simple terms, a data warehouse defines a database that is maintained independently from an organization’s operational databases. Data warehouse systems enable the integration of multiple application systems. They provide data processing by offering a solid platform of consolidated, historical information for analysis.

Data warehouses generalize and centralize data in multidimensional space. The construction of data warehouses contains data cleaning, data integration, and data transformation and can be looked at as an important preprocessing step for data mining.

It provides online analytical processing (OLAP) tools for the interactive analysis of multidimensional data of varied granularities, which facilitates effective data generalization and data mining. There are several data mining functions, including association, classification, prediction, and clustering can be integrated with OLAP operations to build up interactive mining of knowledge at various levels of abstraction.

There are three main types of Data Warehouses which are as follows −

Enterprise Data Warehouse (EDW) − Enterprise Data Warehouse is a centralized warehouse. It is used for organizing and representing the data. With the help of EDW, the user can classify data based on the subject.

Operational Data Store − In Operational Data Store, the Data warehouse is refreshed in real-time. Thus, it is more generally used for routine activities including storing records, etc.

Data Mart − A data mart can be defined as a subset of the data warehouse. It is designed for sales, finance, and so on.

Characteristics of Data Warehouse

There are various characteristics of a data warehouse which are as follows −

  • Subject-oriented − A data warehouse targets the modeling and analysis of information for decision-makers. Thus, data warehouses generally provide a simple and concise view of specific subject issues by excluding information that is not beneficial in the decision support process.

  • Integrated − As the data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and online transaction records, the data cleaning and data integration techniques need to be used to provide consistency in naming conventions, encoding mechanisms, attribute measures, etc.

  • Time-variant − Data is saved to provide data from a historical perspective (e.g., the past 5-10 years). Each key mechanism in the data warehouse includes, either implicitly or explicitly, an element of time.

  • Non-volatile − A data warehouse is always a physically independent store of data transformed from the software data found in the operational environment. Because of this separation, a data warehouse does not need transaction processing, recovery, and concurrency control structure. It usually requires only two operations in data accessing − initial loading of data and access of data.

Updated on: 22-Nov-2021


Kickstart Your Career

Get certified by completing the course

Get Started