ETL - Introduction
ETL stands for Extract, Transform and Load. An ETL tool extracts the data from different RDBMS source systems, transforms the data like applying calculations, concatenate, etc. and then load the data to Data Warehouse system. The data is loaded in the DW system in the form of dimension and fact tables.
A staging area is required during ETL load. There are various reasons why staging area is required.
The source systems are only available for specific period of time to extract data. This period of time is less than the total data-load time. Therefore, staging area allows you to extract the data from the source system and keeps it in the staging area before the time slot ends.
Staging area is required when you want to get the data from multiple data sources together or if you want to join two or more systems together. For example, you will not be able to perform a SQL query joining two tables from two physically different databases.
Data extractions’ time slot for different systems vary as per the time zone and operational hours.
Data extracted from source systems can be used in multiple data warehouse system, Operation Data stores, etc.
ETL allows you to perform complex transformations and requires extra area to store the data.
In data transformation, you apply a set of functions on extracted data to load it into the target system. Data, which does not require any transformation is known as direct move or pass through data.
You can apply different transformations on extracted data from the source system. For example, you can perform customized calculations. If you want sum-of-sales revenue and this is not in database, you can apply the SUM formula during transformation and load the data.
For example, if you have the first name and the last name in a table in different columns, you can use concatenate before loading.
During Load phase, data is loaded into the end-target system and it can be a flat file or a Data Warehouse system.