Data Warehousing - Concepts


What is Data Warehousing?

Data warehousing is the process of constructing and using a data warehouse. A data warehouse is built by integrating data from multiple heterogeneous sources, and it supports analytical reporting, structured and ad hoc queries, and decision making. Data warehousing involves data cleaning, data integration, and data consolidation.

Using Data Warehouse Information

Decision support technologies help executives use the data warehouse quickly and effectively. They can gather data, analyse it, and make decisions based on the information in the warehouse. The information gathered from the warehouse can be used in any of the following domains:

  • Tuning production strategies - Product strategies can be tuned by repositioning products and managing product portfolios, based on a comparison of sales quarter by quarter or year by year.

  • Customer Analysis - Customer analysis is done by analyzing customers' buying preferences, buying times, budget cycles, and so on.

  • Operations Analysis - Data warehousing also helps with customer relationship management and with making environmental corrections. The information also allows us to analyse business operations.
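As a rough illustration of the quarterly comparison mentioned in the first bullet, the sketch below totals sales per product and quarter. The sales rows, the product names and the quarterly_totals helper are all made up for this example; a real warehouse would normally run this kind of aggregation as a query.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales rows: (sale date, product, amount).
sales = [
    (date(2023, 1, 20), "widget", 100.0),
    (date(2023, 2, 11), "widget", 150.0),
    (date(2023, 5, 3), "widget", 300.0),
    (date(2023, 4, 9), "gadget", 80.0),
]

def quarterly_totals(rows):
    """Sum the amounts per (product, year, quarter)."""
    totals = defaultdict(float)
    for day, product, amount in rows:
        quarter = (day.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        totals[(product, day.year, quarter)] += amount
    return dict(totals)

totals = quarterly_totals(sales)
for key in sorted(totals):
    print(key, totals[key])
```

Comparing the totals for the same product across quarters is the kind of input that could feed a repositioning decision.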

Integrating Heterogeneous Databases

There are two approaches to integrating heterogeneous databases:

  • Query Driven Approach

  • Update Driven Approach

Query Driven Approach

This is the traditional approach to integrating heterogeneous databases. It builds wrappers and integrators on top of multiple heterogeneous databases. These integrators are also known as mediators.

Process of the Query Driven Approach:

  • When a query is issued at the client side, a metadata dictionary translates it into queries appropriate for each of the heterogeneous sites involved.

  • These queries are then mapped and sent to each site's local query processor.

  • The results from the heterogeneous sites are integrated into a global answer set.
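The three steps above can be sketched with a toy mediator. This is only an illustration under stated assumptions: the two in-memory SQLite databases stand in for heterogeneous sites, and the table names, the METADATA dictionary and the run_global_query function are hypothetical names invented for the example.

```python
import sqlite3

def make_source(ddl, insert_sql, rows):
    """Create one in-memory database that stands in for a remote site."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn

# Two hypothetical sites with different schemas and units.
site_a = make_source(
    "CREATE TABLE orders (cust TEXT, amount REAL)",
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("bob", 5.0)],
)
site_b = make_source(
    "CREATE TABLE sales (customer_name TEXT, total_cents INTEGER)",
    "INSERT INTO sales VALUES (?, ?)",
    [("alice", 250), ("carol", 700)],
)

# Metadata dictionary (assumed layout): maps a global query name to the
# local query each site understands, normalising columns and units.
METADATA = {
    "sales_by_customer": [
        (site_a, "SELECT cust, amount FROM orders"),
        (site_b, "SELECT customer_name, total_cents / 100.0 FROM sales"),
    ]
}

def run_global_query(name):
    """Translate the global query, run it at every site, merge the results."""
    answer = {}
    for conn, local_sql in METADATA[name]:
        # Each local query is sent to that site's own query processor.
        for customer, amount in conn.execute(local_sql):
            # Integrate the per-site rows into one global answer set.
            answer[customer] = answer.get(customer, 0.0) + amount
    return answer

result = run_global_query("sales_by_customer")
print(result)
```

Note that every call to run_global_query goes back to all of the sites and repeats the translation and merging work.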

Disadvantages

  • The Query Driven Approach needs complex integration and filtering processes.

  • This approach is inefficient, because translation and integration are repeated at query time for every query.

  • This approach is expensive for frequent queries.

  • This approach is also expensive for queries that require aggregations.

Update Driven Approach

The update driven approach is the alternative to the traditional approach. Today's data warehouse systems follow the update driven approach rather than the traditional approach discussed earlier. In the update driven approach, information from multiple heterogeneous sources is integrated in advance and stored in a warehouse, where it is available for direct querying and analysis.

Advantages

This approach has the following advantages:

  • This approach provides high performance.

  • The data are copied, processed, integrated, annotated, summarized and restructured in a semantic data store in advance.

  • Query processing does not need to interact with, and so does not interfere with, the processing at the local sources.
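A minimal sketch of the update driven approach, assuming two hypothetical source extracts and an in-memory SQLite database playing the role of the warehouse. The names fact_sales, sales_by_customer and load_warehouse are invented for the example.

```python
import sqlite3

# Hypothetical extracts from two heterogeneous sources.
source_a = [("alice", 10.0), ("bob", 5.0)]   # (customer, amount in currency units)
source_b = [("alice", 250), ("carol", 700)]  # (customer, amount in cents)

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_sales (customer TEXT, amount REAL, source TEXT)"
)

def load_warehouse():
    """Integrate in advance: normalise units, copy rows, pre-summarise."""
    rows = [(customer, amount, "a") for customer, amount in source_a]
    rows += [(customer, cents / 100.0, "b") for customer, cents in source_b]
    warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    # Summarise once at load time so that later queries are cheap.
    warehouse.execute(
        "CREATE TABLE sales_by_customer AS "
        "SELECT customer, SUM(amount) AS total "
        "FROM fact_sales GROUP BY customer"
    )
    warehouse.commit()

load_warehouse()

# Query time: only the warehouse is read, the sources are not touched.
totals = dict(warehouse.execute("SELECT customer, total FROM sales_by_customer"))
print(totals)
```

The integration cost is paid once in load_warehouse, so repeated or aggregate queries read the pre-built tables without going back to the sources.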

Data Warehouse Tools and Utilities Functions

The following are the functions of data warehouse tools and utilities:

  • Data Extraction - Data Extraction involves gathering the data from multiple heterogeneous sources.

  • Data Cleaning - Data Cleaning involves finding and correcting the errors in data.

  • Data Transformation - Data Transformation involves converting data from a legacy format to the warehouse format.

  • Data Loading - Data Loading involves sorting, summarizing, consolidating, checking integrity and building indices and partitions.

  • Refreshing - Refreshing involves propagating updates from the data sources to the warehouse.

Note: Data Cleaning and Data Transformation are important steps in improving the quality of data and data mining results.
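The five functions listed above can be sketched as small Python functions. This is a simplified illustration, not how any particular tool works: the record layout, the source lists and the dictionary used as a warehouse are assumptions made for the example, and the cleaning rule (trim names, reject rows with an unparseable date) is just one possible choice.

```python
from datetime import date

# Hypothetical legacy records from two sources; every field is a string.
SOURCE_A = [
    {"id": "1", "name": " Alice ", "joined": "2020-01-15"},
    {"id": "2", "name": "bob", "joined": "not a date"},
]
SOURCE_B = [{"id": "3", "name": "CAROL", "joined": "2021-07-01"}]

def extract(sources):
    """Data Extraction: gather the records from all sources."""
    return [record for source in sources for record in source]

def clean(records):
    """Data Cleaning: trim names and reject rows whose date is invalid."""
    cleaned = []
    for record in records:
        try:
            date.fromisoformat(record["joined"])
        except ValueError:
            continue  # the error cannot be corrected, so the row is rejected
        cleaned.append({**record, "name": record["name"].strip()})
    return cleaned

def transform(records):
    """Data Transformation: convert string fields to typed warehouse rows."""
    return [
        (int(r["id"]), r["name"].title(), date.fromisoformat(r["joined"]))
        for r in records
    ]

def load(warehouse, rows):
    """Data Loading: sort the rows and store them keyed by id."""
    for row in sorted(rows):
        warehouse[row[0]] = row  # the dict key acts as a simple index

def refresh(warehouse, sources):
    """Refreshing: re-run the pipeline to pick up changes in the sources."""
    load(warehouse, transform(clean(extract(sources))))

warehouse = {}
refresh(warehouse, [SOURCE_A, SOURCE_B])
for row in warehouse.values():
    print(row)
```

Calling refresh again after the sources change inserts new rows and overwrites existing ones under the same id.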


