What is the process of data warehouse design?

A data warehouse can be built using three approaches −

  • A top-down approach

  • A bottom-up approach

  • A combination of both approaches

The top-down approach starts with the complete design and planning. It is helpful in cases where the technology is sophisticated and familiar, and where the business issues that must be solved are clear and well-understood.

The bottom-up approach starts with experiments and prototypes. This is beneficial in the beginning phase of business modeling and technology development. It enables an organisation to move forward at considerably less expense and to compute the advantage of the technology before creating significant commitments.

In the combined approach, an organisation can exploit the planned and strategic features of the top-down approach while retaining the rapid execution and opportunistic software of the bottom-up approach.

In general, the warehouse design process consists of the following steps −

  • It can choose a business process to model, e.g., orders, invoices, shipments, inventory, account administration, sales, and the general ledger. If the business process is organisational and involves multiple, complex object collections, a data warehouse model should be followed. But, if the process is departmental and focuses on the analysis of one type of business process, a data mart model must be selected.

  • It can choose the grain of the business process. The grain is the fundamental, atomic level of data to be represented in the fact table for this process, e.g., individual transactions, individual daily snapshots, etc.

  • It can choose the dimensions that will apply to each fact table record. Typical dimensions are time, item, customer, supplier, warehouse, transaction type, and status.

  • It can choose the measures that will populate each fact table record. Typical measures are numeric additive quantities like dollars-sold and units-sold.

Since a data warehouse is designed and constructed, the original deployment of the warehouse contains initial installation, rollout planning, training, and orientation. Platform updates and maintenance should also be treated.

Data warehouse administration will include data refreshment, data source synchronization, planning for disaster recovery, managing access control and security, managing data growth, managing database performance, and data warehouse enhancement and extension.

Scope management contains controlling the number and range of queries, dimensions, and documents; limiting the size of the data warehouse; or limiting the schedule, budget, or resources.

There are various kinds of data warehouse design tools are available. Data warehouse development tools support functions to define and edit metadata repository contents (including schemas, scripts, or rules), answer queries, output reports, and ship metadata to and from relational database system catalogs.

Planning and analysis tools study the impact of schema changes and refresh performance when changing refresh rates or time windows.