Dimensional modeling provides set of methods and concepts that are used in DW design. According to DW consultant, Ralph Kimball, dimensional modeling is a design technique for databases intended to support end-user queries in a data warehouse. It is oriented around understandability and performance. According to him, although transaction-oriented ER is very useful for the transaction capture, it should be avoided for end-user delivery.
Dimensional modeling always uses facts and dimension tables. Facts are numerical values which can be aggregated and analyzed on the fact values. Dimensions define hierarchies and description on fact values.
Dimension table stores the attributes that describe objects in a Fact table. A Dimension table has a primary key that uniquely identifies each dimension row. This key is used to associate the Dimension table to a Fact table.
Dimension tables are normally de-normalized as they are not created to execute transactions and only used to analyze data in detail.
In the following dimension table, the customer dimension normally includes the name of customers, address, customer id, gender, income group, education levels, etc.
Fact table contains numeric values that are known as measurements. A Fact table has two types of columns − facts and foreign key to dimension tables.
Measures in Fact table are of three types −
Additive − Measures that can be added across any dimension.
Non-Additive − Measures that cannot be added across any dimension.
Semi-Additive − Measures that can be added across some dimensions.
|Time ID||Product ID||Customer ID||Unit Sold|
This fact tables contains foreign keys for time dimension, product dimension, customer dimension and measurement value unit sold.
Suppose a company sells products to customers. Every sale is a fact that happens within the company, and the fact table is used to record these facts.
Common facts are − number of unit sold, margin, sales revenue, etc. The dimension table list factors like customer, time, product, etc. by which we want to analyze the data.
Now if we consider the above Fact table and Customer dimension then there will also be a Product and time dimension. Given this fact table and these three dimension tables, we can ask questions like: How many watches were sold to male customers in 2010?
The functional difference between dimension tables and fact tables is that fact tables hold the data we want to analyze and dimension tables hold the information required to allow us to query it.
Aggregate table contains aggregated data which can be calculated by using different aggregate functions.
An aggregate function is a function where the values of multiple rows are grouped together as input on certain criteria to form a single value of more significant meaning or measurement.
Common aggregate functions include −
These aggregate tables are used for performance optimization to run complex queries in a data warehouse.
You save tables with aggregated data like yearly (1 row), quarterly (4 rows), monthly (12 rows) and now you have to do comparison of data, like Yearly only 1 row will be processed. However in an un-aggregated table, all the rows will be processed.
|MIN||Returns the smallest value in a given column|
|MAX||Returns the largest value in a given column|
|SUM||Returns the sum of the numeric values in a given column|
|AVG||Returns the average value of a given column|
|COUNT||Returns the total number of values in a given column|
|COUNT (*)||Returns the number of rows in a table|
Select Avg (salary) from employee where title = ‘developer’. This statement will return the average salary for all employees whose title is equal to 'Developer'.
Aggregations can be applied at database level. You can create aggregates and save them in aggregate tables in the database or you can apply aggregate on the fly at the report level.
Note − If you save aggregates at the database level it saves time and provides performance optimization.