What are the Processes of a Data Warehouse?

Data staging is a major process in a data warehouse. Together with the steps that follow it, it includes the following subprocesses −

Extracting − The extract step is the first phase of getting data into the data warehouse environment. Extracting means reading and understanding the source data, and copying the parts that are needed to the data staging area for further work.
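The extract step can be sketched as follows. This is a minimal illustration only: the CSV text, the column names, and the `extract` helper are all hypothetical, and a real source would more likely be a file or an operational database.

```python
import csv
import io

# Hypothetical source export; in practice this would be a file or a query result.
SOURCE_CSV = """order_id,customer,city,postal_code,amount,internal_memo
1001,Alice,Boston,02108,250.00,call back
1002,Bob,Chicago,60601,99.50,
"""

# Columns the warehouse needs (an assumption made for this sketch).
NEEDED_COLUMNS = ["order_id", "customer", "city", "postal_code", "amount"]


def extract(source_text, needed_columns):
    """Read the source rows and copy only the needed columns to a staging list."""
    reader = csv.DictReader(io.StringIO(source_text))
    return [{col: row[col] for col in needed_columns} for row in reader]


staging_rows = extract(SOURCE_CSV, NEEDED_COLUMNS)
print(staging_rows[0])
```

The values are left as strings on purpose, since type conversion and cleaning belong to the transformation step.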

Transforming − Once the data is extracted into the data staging area, there are several possible transformation steps, as follows −

  • Cleaning the data by correcting misspellings, resolving domain conflicts (such as a city name that is inconsistent with a postal code), dealing with missing data elements, and parsing values into standard formats.

  • Purging selected fields from the legacy records that are not useful for the data warehouse.

  • Combining data sources, either by matching exactly on key values or by performing fuzzy matches on non-key attributes, including looking up the textual equivalents of legacy system codes.

  • Creating surrogate keys for each dimension record to avoid dependence on keys defined by the legacy systems. The surrogate key generation process also enforces referential integrity between the dimension tables and the fact tables.

  • Building aggregates to improve the performance of common queries.
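Several of the transformation steps above can be sketched together. The rows, the list of known cities, the 0.8 similarity cutoff, and the choice to treat a missing amount as 0.0 are all assumptions made for illustration; a real pipeline would pick its own matching rules and missing-value policy.

```python
from collections import defaultdict
from difflib import get_close_matches

# Hypothetical staged rows; the city spellings are deliberately inconsistent.
staged = [
    {"customer_code": "C-17", "city": "Bostn", "amount": "250.00"},
    {"customer_code": "C-17", "city": "Boston", "amount": "100.00"},
    {"customer_code": "C-42", "city": "chicago ", "amount": None},
]
KNOWN_CITIES = ["Boston", "Chicago"]


def clean_city(raw):
    """Normalize case and whitespace, then fuzzy-match against known cities."""
    candidate = raw.strip().title()
    matches = get_close_matches(candidate, KNOWN_CITIES, n=1, cutoff=0.8)
    return matches[0] if matches else candidate


surrogate_keys = {}  # legacy customer code -> warehouse surrogate key


def surrogate_key(legacy_code):
    """Assign the next integer key the first time a legacy code is seen."""
    if legacy_code not in surrogate_keys:
        surrogate_keys[legacy_code] = len(surrogate_keys) + 1
    return surrogate_keys[legacy_code]


facts = []
for row in staged:
    facts.append({
        "customer_key": surrogate_key(row["customer_code"]),
        "city": clean_city(row["city"]),
        # Assumption: a missing amount is loaded as 0.0 rather than rejected.
        "amount": float(row["amount"]) if row["amount"] is not None else 0.0,
    })

# Build a simple aggregate: total amount per cleaned city name.
totals_by_city = defaultdict(float)
for fact in facts:
    totals_by_city[fact["city"]] += fact["amount"]
```

Because every fact row obtains its `customer_key` from the same lookup that defines the dimension keys, a fact can never point at a key the dimension does not contain.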

Loading and Indexing − At the end of the transformation phase, the data is in the form of load record images. Loading in the data warehouse environment generally takes the form of replicating the dimension tables and fact tables and presenting these tables to the bulk loading facilities of each recipient data mart.
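As a rough sketch of loading and indexing, the example below uses an in-memory SQLite database to stand in for a data mart. The table names, columns, and sample rows are hypothetical, and a production data mart would normally use its own bulk loader rather than `executemany`.

```python
import sqlite3

# Hypothetical load record images produced by the transformation step.
customer_dim = [(1, "C-17", "Boston"), (2, "C-42", "Chicago")]
sales_fact = [
    (1, "2022-02-01", 250.0),
    (1, "2022-02-02", 100.0),
    (2, "2022-02-02", 99.5),
]

conn = sqlite3.connect(":memory:")  # stands in for a recipient data mart
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE customer_dim ("
    " customer_key INTEGER PRIMARY KEY, legacy_code TEXT, city TEXT)"
)
conn.execute(
    "CREATE TABLE sales_fact ("
    " customer_key INTEGER REFERENCES customer_dim(customer_key),"
    " sale_date TEXT, amount REAL)"
)

# Load the dimension before the facts so every foreign key resolves,
# and wrap the whole batch in a single transaction.
with conn:
    conn.executemany("INSERT INTO customer_dim VALUES (?, ?, ?)", customer_dim)
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", sales_fact)

# Index after loading so the bulk insert does not pay for index maintenance.
conn.execute("CREATE INDEX idx_sales_customer ON sales_fact (customer_key)")

loaded = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
```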

Quality Assurance Checking − When each data mart has been loaded and indexed and supplied with appropriate aggregates, the final step before publishing is the quality assurance step. Quality assurance can be checked by running a comprehensive exception report over the entire set of newly loaded data.

All the reporting categories should be present, and all the counts and totals should be satisfactory. All reported values should be consistent with the time series of similar values that preceded them. The exception report can be built with the data mart's end-user report writing facility.
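The checks described above can be sketched as a small exception report. The categories, the historical totals, and the 50% tolerance are hypothetical values chosen for illustration; a real report would use thresholds agreed with the business users.

```python
from statistics import mean

# Hypothetical inputs: categories the report must contain, earlier totals per
# category, and the totals from the newly loaded batch.
EXPECTED_CATEGORIES = {"Boston", "Chicago", "Denver"}
history = {
    "Boston": [340.0, 355.0, 350.0],
    "Chicago": [100.0, 95.0, 102.0],
    "Denver": [60.0, 58.0, 61.0],
}
new_totals = {"Boston": 350.0, "Chicago": 990.0}


def exception_report(new_totals, history, expected, tolerance=0.5):
    """Return a list of readable exceptions found in the new load."""
    exceptions = []
    # Check 1: every expected reporting category must be present.
    for category in sorted(expected - set(new_totals)):
        exceptions.append(f"missing category: {category}")
    # Check 2: each total must be consistent with the values that preceded it.
    for category, total in sorted(new_totals.items()):
        past = history.get(category)
        if not past:
            continue  # no history to compare against
        baseline = mean(past)
        if abs(total - baseline) > tolerance * baseline:
            exceptions.append(
                f"{category}: total {total} deviates from recent average {baseline:.1f}"
            )
    return exceptions


report = exception_report(new_totals, history, EXPECTED_CATEGORIES)
for line in report:
    print(line)
```

An empty report means the load passed both checks and can be published.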

Release/Publishing − When each data mart has been freshly loaded and quality assured, the user community should be notified that the new data is ready. Publishing also communicates the nature of any changes that have occurred in the underlying dimensions and any new assumptions that have been introduced into the measured or calculated facts.

Querying − Querying is a broad term that encompasses all the activities of requesting data from a data mart, including ad hoc querying by end users, report writing, complex decision support applications, requests from models, and full-fledged data mining.
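An ad hoc query against a data mart typically joins a fact table to one or more dimension tables and aggregates the result. The sketch below assumes a hypothetical two-table schema in an in-memory SQLite database; the `sales_by_city` helper is illustrative only.

```python
import sqlite3

mart = sqlite3.connect(":memory:")  # stands in for a loaded data mart
mart.executescript("""
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sales_fact (customer_key INTEGER, sale_date TEXT, amount REAL);
INSERT INTO customer_dim VALUES (1, 'Boston'), (2, 'Chicago');
INSERT INTO sales_fact VALUES
    (1, '2022-02-01', 250.0),
    (1, '2022-02-02', 100.0),
    (2, '2022-02-02', 99.5);
""")


def sales_by_city(connection, since):
    """Total sales per city for sale dates on or after `since` (ISO date text)."""
    query = """
        SELECT d.city, SUM(f.amount) AS total
        FROM sales_fact AS f
        JOIN customer_dim AS d ON d.customer_key = f.customer_key
        WHERE f.sale_date >= ?
        GROUP BY d.city
        ORDER BY d.city
    """
    return connection.execute(query, (since,)).fetchall()


result = sales_by_city(mart, "2022-02-02")
```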

Updated on: 09-Feb-2022

