Data Warehousing - Terminologies


In this article, we will discuss some of the commonly used terms in Data Warehouse.

Data Warehouse

Data warehouse is subject Oriented, Integrated, Time-Variant and nonvolatile collection of data that support of management's decision making process. Let's explore this Definition of data warehouse.

  • Subject Oriented - The Data warehouse is subject oriented because it provide us the information around a subject rather the organization's ongoing operations. These subjects can be product, customers, suppliers, sales, revenue etc. The data warehouse does not focus on the ongoing operations rather it focuses on modelling and analysis of data for decision making.

  • Integrated - Data Warehouse is constructed by integration of data from heterogeneous sources such as relational databases, flat files etc. This integration enhance the effective analysis of data.

  • Time-Variant - The Data in Data Warehouse is identified with a particular time period. The data in data warehouse provide information from historical point of view.

  • Non Volatile - Non volatile means that the previous data is not removed when new data is added to it. The data warehouse is kept separate from the operational database therefore frequent changes in operational database is not reflected in data warehouse.

  • Metadata - Metadata is simply defined as data about data. The data that are used to represent other data is known as metadata. For example the index of a book serve as metadata for the contents in the book.In other words we can say that metadata is the summarized data that lead us to the detailed data.

In terms of data warehouse we can define metadata as following:

  • Metadata is a road map to data warehouse.

  • Metadata in data warehouse define the warehouse objects.

  • The metadata act as a directory.This directory helps the decision support system to locate the contents of data warehouse.

Metadata Respiratory

The Metadata Respiratory is an integral part of data warehouse system. The Metadata Respiratory contains the following metadata:

  • Business Metadata - This metadata has the data ownership information, business definition and changing policies.

  • Operational Metadata -This metadata includes currency of data and data lineage. Currency of data means whether data is active, archived or purged. Lineage of data means history of data migrated and transformation applied on it.

  • Data for mapping from operational environment to data warehouse -This metadata includes source databases and their contents, data extraction,data partition, cleaning, transformation rules, data refresh and purging rules.

  • The algorithms for summarization - This includes dimension algorithms, data on granularity, aggregation, summarizing etc.

Data cube

Data cube help us to represent the data in multiple dimensions. The data cube is defined by dimensions and facts. The dimensions are the entities with respect to which an enterprise keep the records.

Illustration of Data cube

Suppose a company wants to keep track of sales records with help of sales data warehouse with respect to time, item, branch and location. These dimensions allow to keep track of monthly sales and at which branch the items were sold.There is a table associated with each dimension. This table is known as dimension table. This dimension table further describes the dimensions. For example "item" dimension table may have attributes such as item_name, item_type and item_brand.

The following table represents 2-D view of Sales Data for a company with respect to time,item and location dimensions.

data cube 2D

But here in this 2-D table we have records with respect to time and item only. The sales for New Delhi are shown with respect to time and item dimensions according to type of item sold. If we want to view the sales data with one new dimension say the location dimension. The 3-D view of the sales data with respect to time, item, and location is shown in the table below:

data cube 3D

The above 3-D table can be represented as 3-D data cube as shown in the following figure:

data cube 3D

Data mart

Data mart contains the subset of organisation-wide data. This subset of data is valuable to specific group of an organisation. in other words we can say that data mart contains only that data which is specific to a particular group. For example the marketing data mart may contain only data related to item, customers and sales. The data mart are confined to subjects.

Points to remember about data marts:

  • window based or Unix/Linux based servers are used to implement data marts. They are implemented on low cost server.

  • The implementation cycle of data mart is measured in short period of time i.e. in weeks rather than months or years.

  • The life cycle of a data mart may be complex in long run if it's planning and design are not organisation-wide.

    Data mart are small in size.
  • Data mart are customized by department.

  • The source of data mart is departmentally structured data warehouse.

  • Data mart are flexible.

Graphical Representation of data mart.

data mart

Virtual Warehouse

The view over a operational data warehouse is known as virtual warehouse. It is easy to built the virtual warehouse. Building the virtual warehouse requires excess capacity on operational database servers.