What is the processing of the Fact Table?

Data MiningDatabaseData Structure

Fact tables involve a composite primary key, which includes multiple foreign keys (one for each dimension table) and a column for each measure that uses these dimensions.

Every data staging process must include a step for replacing the production IDs in the incoming fact table record with the data warehouse surrogate keys, for each dimension in the fact table. Other processing, computation, and restructuring may also be necessary.

In the warehouse, Referential integrity (RI) defines that for each foreign key in the fact table, an entry continues in the equivalent dimension table. If you have a sale in the fact table for product number 323442, you need to have a product in the Product dimension table with the same number, or you won’t know what you’ve sold. You have a sale for what appears to be a non-existent product.

The disciplined and consistent need for surrogate keys defines that your extract logic always does two types of surrogate key lookups. First, you must create a new surrogate key every time you encounter a changed dimension record and wish to use the slowly changing dimension technique Type 2.

This is the main workhorse technique for handling occasional changes in a dimension record. You must create a new dimension record for the changed item and assign it a brand new surrogate key. The only attributes in this new record that are different from the original record are the surrogate key and whichever field or fields triggered the changed description.

The second kind of surrogate key lookup occurs when the fact table records are being processed. Remember that to preserve referential integrity, we always complete our updating of the dimension records first. In that way, the dimension tables are always the legitimate source of primary keys. In this second kind of surrogate key lookup, we must replace the production keys in the fact table record with the proper current values of the surrogate keys.

When all the fact table production keys have been replaced with surrogate keys, the fact record is ready to load. The keys in the fact table record have been chosen to be proper foreign keys to the respective dimension tables, and the fact table is guaranteed to have referential integrity for the dimension tables.

It is worthwhile, however, to be paranoid in this situation. All of the fancy administration we have described in this section sets up referential integrity before actual database loading. It is still very possible to destroy referential integrity by failing to load one or more of the dimension tables or by making other administrative mistakes, like deleting dimensional records when there are still fact records depending on the dimension key values.

Updated on 10-Feb-2022 11:09:30