Data Warehousing and Data Mining


Data Warehousing

Data warehousing is a collection of tools and techniques for deriving knowledge from large amounts of data. This supports the decision-making process and improves an organization's information resources.

A data warehouse is a database whose data structures are designed to allow relatively quick and easy execution of complex queries over large amounts of data. It is built from multiple heterogeneous sources.

Characteristics of Data Warehousing

  • Integrated
  • Time-variant
  • Non-volatile

The purpose of a data warehouse is to support the decision-making process. It makes information easily accessible, because reports can be generated directly from it. It usually contains historical data derived from transactional data, but it can also include data from other sources. A data warehouse is kept separate from the operational (transactional) database.

Data from multiple sources is brought into the warehouse through ETL processes: data is extracted from each source, transformed according to a set of rules, and then loaded into the desired destination, thus creating the data warehouse.
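The three ETL stages can be sketched in a few lines of Python. This is a minimal illustration only: the two in-memory sources, the `sales` table, and the transformation rules (trim and lowercase names, cast amounts, drop rows with a missing amount) are assumptions made up for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical rows from two heterogeneous sources (names and fields are illustrative).
crm_rows = [
    {"customer": " Alice ", "amount": "120.50", "date": "2020-01-15"},
    {"customer": "Bob", "amount": "80", "date": "2020-02-03"},
]
web_rows = [
    ("alice", 40.0, "2020-02-20"),
    ("carol", None, "2020-03-01"),  # missing amount, dropped during transform
]

def extract():
    """Extract: read raw records from each source into one common shape."""
    for row in crm_rows:
        yield row["customer"], row["amount"], row["date"]
    for row in web_rows:
        yield row

def transform(records):
    """Transform: normalise names, cast amounts to float, drop incomplete rows."""
    for customer, amount, date in records:
        if amount is None:
            continue
        yield customer.strip().lower(), float(amount), date

def load(conn, records):
    """Load: write the cleaned records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()

def run_etl():
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    load(conn, transform(extract()))
    return conn

conn = run_etl()
# A simple report generated from the loaded data.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Keeping extract, transform, and load as separate functions mirrors the order described above and lets each stage be changed without touching the others.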

Data Mining 

Data mining refers to extracting knowledge from large amounts of data. The data sources can include databases, data warehouses, the web, etc.

Knowledge discovery is an iterative sequence of the following steps:

  • Data cleaning – Remove inconsistent data.

  • Data integration – Combine multiple data sources into one.

  • Data selection – Select only the data relevant to the analysis.

  • Data transformation – Transform the data into a form appropriate for mining.

  • Data mining – Apply methods to extract data patterns.

  • Pattern evaluation – Identify the interesting patterns in the data.

  • Knowledge representation – Present the mined knowledge using visualization and knowledge representation techniques.
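
The sequence above can be sketched as a chain of small Python functions, one per step. Everything here is an assumption made for illustration: the two sample sources, the age buckets, the counting-based "mining" step, and the `min_count` threshold are hypothetical and far simpler than real mining methods.

```python
from collections import Counter

# Two hypothetical sources of purchase records (illustrative data only).
store_a = [{"id": 1, "age": 25, "item": "milk"},
           {"id": 2, "age": None, "item": "bread"}]
store_b = [{"id": 3, "age": 41, "item": "milk"},
           {"id": 4, "age": 37, "item": "milk"},
           {"id": 5, "age": 19, "item": "bread"}]

def clean(rows):
    # Data cleaning: drop records with missing values.
    return [r for r in rows if all(v is not None for v in r.values())]

def integrate(*sources):
    # Data integration: combine several sources into one list.
    return [r for src in sources for r in src]

def select(rows):
    # Data selection: keep only the attributes relevant to the analysis.
    return [(r["age"], r["item"]) for r in rows]

def transform(rows):
    # Data transformation: bucket ages into coarse groups suitable for mining.
    return [("under_30" if age < 30 else "30_plus", item) for age, item in rows]

def mine(rows):
    # Data mining: count how often each (age group, item) pair occurs.
    return Counter(rows)

def evaluate(patterns, min_count=2):
    # Pattern evaluation: keep only patterns seen at least min_count times.
    return {p: c for p, c in patterns.items() if c >= min_count}

def present(patterns):
    # Knowledge representation: render the patterns as readable text.
    return [f"{group} customers bought {item} ({count} times)"
            for (group, item), count in sorted(patterns.items())]

data = integrate(clean(store_a), clean(store_b))
patterns = evaluate(mine(transform(select(data))))
report = present(patterns)
print(report)
```

In practice the process is iterative: the result of pattern evaluation often sends the analyst back to an earlier step, such as selecting different attributes or transforming the data differently.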

What kind of data can be mined?

  • Database Data
  • Data Warehouse 
  • Transactional Data

Scope of Data Mining

  • Automated prediction of trends and behaviours: Data mining automates the process of finding predictive information in large databases. For example, a marketing company can mine data from past promotional mailings to identify the targets most likely to maximize the return on future mailings.

  • Automated discovery of previously unknown patterns: Data mining sweeps through the database and identifies previously hidden patterns. For example, in a retail store, data mining can go through the sales data and find items that are usually bought together.
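
The retail example can be illustrated with a minimal sketch that counts how often pairs of items appear in the same transaction. The transactions, the `frequent_pairs` helper, and the `min_support` threshold are hypothetical; real systems typically use more scalable frequent-itemset algorithms than this brute-force pair count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical retail transactions; each set is one customer's basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def frequent_pairs(transactions, min_support=0.5):
    """Return item pairs found together in at least min_support of all baskets."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    # Support = fraction of transactions that contain both items.
    return {pair: count / n for pair, count in counts.items()
            if count / n >= min_support}

pairs = frequent_pairs(transactions)
print(pairs)
```

With this sample data, bread and butter appear together in three of the four baskets, so that pair is the only one that clears the 50% support threshold.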

Updated on 19-Jun-2020 10:51:23