Dimensional Data Modeling


Dimensional modeling was developed by Ralph Kimball. It is a modeling technique used for data storage in the data warehouse. The motive is to make data extraction faster and smoother by optimizing the database. The dimensional model is used for analysis as it is made to read, summarise and analyze numeric data in the data warehouse. We use this model because it makes the storage and retrieval easy for the data of the data warehouse. The dimensional models have their unique way to store data. It is used by many OLAP(Online analytical processing) systems.

Elements of the dimensional data model

FACTS

Facts are data that are used as measurable elements for a business process.

Example : In a shop, facts can be the profit, items sold or the revenue.

Dimensions

Dimension is used as a path to look for information in the facts. Classification of data is done using dimensions. A dimension is described as a set of attributes.

Example : For an Item, The attributes can be item name or price.

ATTRIBUTES

Attributes are the features of dimension in data modeling which are used to search and refine facts.

Example : For a location, attributes can be country, state, or district.

FACT TABLE

The fact table is the primary table or the center table that consists of measurements. The fact table contains foreign keys to dimension tables.

DIMENSION TABLE

The dimension table describes the attributes of the measurement. The dimension table mentions the facts and is joined by the foreign key. Dimension tables are denormalized tables that can have one or more relationships.

Types of Dimensions in Data Warehouse Model

  • Conformed Dimension

  • Outrigger Dimension

  • Shrunken Dimension

  • Role−Playing Dimension

  • Dimension to Dimension Table

  • Junk Dimension

  • Degenerate Dimension

  • Swappable Dimension

  • Step Dimension

Steps to Create Dimensional Data Modeling

Identify The Business Process

The very first thing to do is to identify the business objective as per the need of the organization. The selection of a business process will depend upon the type of data quality that is provided for the process. Plain text or Business Process Modelling Notation(BPMN) is used to describe the business process.

Identify the Grain

To identify the lowest level of information of any table in the data warehouse, grains are used. Grains also help in describing the level of detail and its solution for any business problem.

Identify the Dimensions

The dimension is where the data should get stored. With the help of dimensions, we get the desired output for a business problem by describing data warehouse facts and measures.

Identify the Fact

These are the data that are measurable as the user gets access to the stored data that is present in the data warehouse. Mostly, the rows in the table are numeric values.

Build Schema

A schema is a database structure. Implementation of the dimensional model takes place in this step. The two schemas are:

Star Schema

It is the simplest type of data warehouse schema with a simple database design. The name 'star' came up as the structure resembles a star. The center of the schema consists of fact tables which are surrounded by dimension tables. The relationship between fact and dimension table is described by single join only.

Snowflake Schema

It is the advanced version of the star schema as it has additional dimensions. The name 'snowflake' came up as the structure resembles a snowflake. Snowflakes are surrounded by dimension tables which are then surrounded by dimension tables. Many joins are used in the snowflake schema. Database design is very complex in a Snowflake schema.

Rules for Dimensional Modeling

  • To store the data in a dimensional structure.

  • Every fact table must have a corresponding data dimension table.

  • Ensure that the amount of information or granularity of each fact in a single fact table is the same.

  • The usage of a surrogate key must be ensured in dimension tables.

  • Create business solutions that consistently strike a balance between needs and realities to assist their decision−making

Advantages of Dimensional Data Modeling

Easy Data Access

Dimensional data modeling helps users by making data access easy. Also, it lessens the time and effort to extract and analyze the data.

Enhanced Query Processing

Dimensional data modeling consists of a simple structure that results in faster query performance.

Flexibility

Flexibility to perform analysis increases with dimensional data modeling, helping users to easily understand relationships between data.

Data Quality

Redundancy and inconsistency are reduced which improves the data quality.

Disadvantages of Dimensional Data Modeling

Limited Complexity

Dimensional data modeling has a simple structure and can not handle complex data relationships.

Limited Integration

Dimensional data modeling is not good with other data models like the models that are dependent on the normalization process.

Limited Scalability

Dimensional data modeling is not scalable as other modeling techniques like the one for a huge dataset.

Limited History Tracking

Main focus is on current data, so would be able to track changes in the history of data.

Conclusion

This article consists of dimensional data modeling, used for data storage, and optimizes the database to make the performance faster. Elements of a dimensional data model are facts, dimensions, attributes, fact tables, and dimension tables. Various types of Dimensions in the Data Warehouse Model are Conformed Dimension, Outrigger Dimension, Shrunken Dimension, Role−Playing Dimension, Dimension to Dimension Table, Junk Dimension, Degenerate Dimension, Swappable Dimension, and Step Dimension. Some steps to create Dimensional Data Modeling are Identify The Business Process, Identifying the Grain, dimension, and fact, and building a schema. Some rules for dimensional modeling are about fact and dimension, amount of information, and usage of surrogate keys. Advantages are easy data access, enhanced query performance, increased flexibility, improved data quality, and ease to understand. Disadvantages are limited complexity, integration, scalability, and history tracking.

Updated on: 14-Jul-2023

281 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements