What are the various extraction methods in data warehouses?


The extraction method depends heavily on the source system and on the business requirements of the target data warehouse environment. The estimated volume of data to be extracted and the stage in the ETL process (initial load or ongoing maintenance of the data) can also influence how to extract, from both a logical and a physical perspective. There are two types of extraction methods: Logical Extraction Methods and Physical Extraction Methods.

Logical Extraction Methods

There are two types of logical extraction, as follows −

  • Full Extraction − The data is extracted completely from the source system. Because this extraction reflects all the data currently available on the source system, there is no need to keep track of changes to the data source since the last successful extraction.

    The source data is provided as-is, and no additional logical information (such as timestamps) is needed on the source side. An example of a full extraction is an export file of a single table or a remote SQL statement that scans the complete source table.

  • Incremental Extraction − Only the data that has changed since a well-defined event in the past is extracted. This event can be the last time of extraction or a more complex business event, such as the last booking day of a fiscal period.

    To identify this delta, it must be possible to identify all the data that has changed since this specific event. This information can be provided by the source data itself, for example an application column that holds the last-changed timestamp, or by a change table in which an additional mechanism keeps track of the changes alongside the originating transactions. In most cases, using the latter technique means adding extraction logic to the source system.
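The difference between the two logical methods can be sketched with a small example. This is only an illustration under stated assumptions: the `orders` table, its `last_modified` column, and the function names are hypothetical, and an in-memory SQLite database stands in for the source system.

```python
import sqlite3


def full_extract(conn):
    """Full extraction: scan the whole source table; no change tracking is needed."""
    return conn.execute(
        "SELECT id, amount, last_modified FROM orders ORDER BY id"
    ).fetchall()


def incremental_extract(conn, last_extracted_at):
    """Incremental extraction: only rows changed after the last successful extraction."""
    return conn.execute(
        "SELECT id, amount, last_modified FROM orders "
        "WHERE last_modified > ? ORDER BY id",
        (last_extracted_at,),
    ).fetchall()


# Hypothetical source table with a last-changed timestamp column.
# ISO-8601 strings compare correctly as plain text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, last_modified TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10.0, "2021-11-01T09:00:00"),
        (2, 25.5, "2021-11-10T12:30:00"),
        (3, 7.25, "2021-11-20T18:45:00"),
    ],
)
conn.commit()

all_rows = full_extract(conn)
# Assume the previous extraction finished at this point in time.
delta_rows = incremental_extract(conn, "2021-11-05T00:00:00")
```

The full extraction needs nothing from the source beyond read access, while the incremental variant relies on the source maintaining the timestamp column reliably on every change.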

Physical Extraction Methods

Depending on the chosen logical extraction method and on the capabilities and restrictions of the source side, the data can be physically extracted in two ways. It can be extracted online from the source system or from an offline structure. Such an offline structure may already exist or may be generated by an extraction routine.

The methods of physical extraction are as follows −

  • Online Extraction − The data is extracted directly from the source system itself. The extraction process can connect directly to the source system to access the source tables themselves, or to an intermediate system that stores the data in a preconfigured form (for instance, snapshot logs or change tables).

  • Offline Extraction − The data is not extracted directly from the source system but is staged explicitly outside the original source system. The data either already has an existing structure (for instance, redo logs, archive logs, or transportable tablespaces) or was created by an extraction routine.
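The two physical methods can be contrasted in a short sketch. It assumes a hypothetical `customers` table, uses an in-memory SQLite database as the source system, and uses a CSV flat file as the offline structure created by an extraction routine. Real offline structures such as redo logs or transportable tablespaces are vendor-specific and are not modeled here.

```python
import csv
import os
import sqlite3
import tempfile


def online_extract(conn):
    """Online extraction: query the source tables over a live connection."""
    return conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()


def dump_to_file(conn, path):
    """Extraction routine: stage the data in a flat file outside the source system."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name"])
        writer.writerows(conn.execute("SELECT id, name FROM customers ORDER BY id"))


def offline_extract(path):
    """Offline extraction: read the staged file; no source connection is required."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        return [(int(row[0]), row[1]) for row in reader]


# Hypothetical source system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
source.commit()

online_rows = online_extract(source)

dump_path = os.path.join(tempfile.mkdtemp(), "customers.csv")
dump_to_file(source, dump_path)
source.close()  # the source is no longer reachable from here on

offline_rows = offline_extract(dump_path)
```

Both paths yield the same rows in this sketch. The practical difference is where the load falls: online extraction queries the source at extraction time, while offline extraction works from the staged file after the source connection is closed.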

Updated on: 23-Nov-2021
