What is the design of a quality-driven data warehouse?

A data warehouse is a database that is maintained separately from an organization’s operational databases. Data warehouse systems enable the integration of several application systems. They support data processing by providing a solid platform of consolidated, historical records for analysis.

A data warehouse can be viewed as a set of materialized views defined over remote base relations. When a query is posed, it is evaluated locally, using the materialized views, without accessing the original data sources.
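The idea of answering queries from a local materialized view can be illustrated with a minimal sketch. It uses two in-memory SQLite databases to stand in for a remote source and the warehouse. The table names (`orders`, `sales_by_region`) and the helper functions are hypothetical, and the full-refresh strategy is an assumption chosen for brevity, not a recommendation.

```python
import sqlite3

# Hypothetical remote source: an operational orders table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("north", 100.0), ("north", 50.0), ("south", 70.0)],
)

# The warehouse is a separate database holding the materialized view.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL)"
)


def refresh_view(source, warehouse):
    """Recompute the materialized view from the source (full refresh)."""
    rows = source.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region"
    ).fetchall()
    warehouse.execute("DELETE FROM sales_by_region")
    warehouse.executemany("INSERT INTO sales_by_region VALUES (?, ?)", rows)
    warehouse.commit()


def total_for(warehouse, region):
    """Answer a query locally, touching only the warehouse."""
    row = warehouse.execute(
        "SELECT total FROM sales_by_region WHERE region = ?", (region,)
    ).fetchone()
    return row[0] if row else None


refresh_view(source, warehouse)
source.close()  # the source is now unreachable; queries still work
north_total = total_for(warehouse, "north")
```

After the refresh, `total_for` reads only from the warehouse connection, so the query succeeds even though the source connection has been closed.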

The data warehouse is an active entity that evolves continuously over time. As time passes, it is required to answer new queries. Some of these queries can be answered using the existing materialized views alone. In general, though, new views need to be added to the data warehouse.

Now that the basic online transaction processing (OLTP) infrastructure is in place in some organizations, not least through standardized enterprise resource planning tools such as SAP R/3, the focus of interest is broadening in at least three directions:

  • A wider range of multimedia data sources, both internal and external to the organization.

  • A wider range of clients with diverse interest and capability profiles and situational parameters.

  • The conversion of the massive experiential data generated by transaction processing into knowledge applicable to organizational information and action.

A broad range of data flow logistics architectures is being proposed under labels such as supply chain management and business-to-business e-commerce. In such architectures, databases can be regarded as short- and medium-term intermediate stores of data, whereas data warehouses serve as long-term memory and support knowledge creation and management.

A data warehouse system consists of databases (the source databases and the materialized views in the data warehouse), data transport agents that ship records from one database to another, and a repository that stores metadata about the system and its evolution.
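The three kinds of components can be sketched as plain Python classes. This is only an illustration of how they relate. The class names (`Database`, `TransportAgent`, `MetadataRepository`) and the shape of the metadata log are assumptions made for the example, not part of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """A named store of rows (a source database or a warehouse view)."""
    name: str
    rows: list = field(default_factory=list)


@dataclass
class MetadataRepository:
    """Records what moved where, so the system's history can be inspected."""
    log: list = field(default_factory=list)

    def record(self, source, target, count):
        self.log.append({"source": source, "target": target, "rows": count})


@dataclass
class TransportAgent:
    """Ships records from one database to another and logs the transfer."""
    source: Database
    target: Database
    repository: MetadataRepository

    def ship(self):
        shipped = list(self.source.rows)
        self.target.rows.extend(shipped)
        self.repository.record(self.source.name, self.target.name, len(shipped))
        return len(shipped)


repo = MetadataRepository()
orders = Database("orders_oltp", [{"id": 1}, {"id": 2}])
warehouse_view = Database("warehouse_orders_view")
agent = TransportAgent(orders, warehouse_view, repo)
shipped_count = agent.ship()
```

Keeping the repository separate from the agents means that every transfer leaves a metadata trace, whichever agent performed it.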

In this architecture, heterogeneous data sources are first made accessible in a uniform way through extraction mechanisms known as wrappers; mediators then take on the tasks of data integration and conflict resolution. The separation between wrappers and mediators is a deliberate design decision, reflecting the separation between service wrappers and request brokers in middleware systems such as CORBA.
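The division of labor between wrappers and mediators can be sketched as follows. Each wrapper translates one source format into a uniform record shape, and the mediator merges the uniform records and notes conflicts. The source formats, the field names, and the "earlier source wins" conflict rule are all assumptions made for this example.

```python
import csv
import io


def csv_wrapper(text):
    """Wrapper for a hypothetical CSV source: yields uniform dict records."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"customer_id": int(row["id"]),
               "email": row["mail"].strip().lower()}


def dict_wrapper(records):
    """Wrapper for a hypothetical in-memory source with other field names."""
    for rec in records:
        yield {"customer_id": rec["cust"],
               "email": rec["email_address"].strip().lower()}


def mediate(*wrapped_sources):
    """Mediator: integrate uniform records by customer_id.

    Conflict rule (an assumption): the earlier source wins, and the
    disagreeing pair is reported so it can be reviewed.
    """
    merged = {}
    found_conflicts = []
    for source in wrapped_sources:
        for rec in source:
            key = rec["customer_id"]
            if key not in merged:
                merged[key] = rec
            elif merged[key] != rec:
                found_conflicts.append((merged[key], rec))
    return merged, found_conflicts


crm_csv = "id,mail\n1,Ann@Example.com\n2,bob@example.com\n"
billing = [
    {"cust": 2, "email_address": "robert@example.com"},
    {"cust": 3, "email_address": "cy@example.com"},
]

integrated, conflicts = mediate(csv_wrapper(crm_csv), dict_wrapper(billing))
```

Because the mediator sees only the uniform record shape, a new source can be added by writing another wrapper, without changing the integration logic.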

The resulting standardized and integrated records are stored as materialized views in the data warehouse. These base views are generally only slightly aggregated. To customize them for different analyst users, data marts with more highly aggregated information about specific areas of interest are constructed as second-level caches, which are then accessed by data analysis tools ranging from query facilities through spreadsheet tools to full-fledged data mining systems.
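The relationship between a lightly aggregated base view and more highly aggregated data marts can be sketched as below. The sample rows, the column names, and the `build_data_mart` helper are hypothetical and serve only to show the second-level aggregation step.

```python
from collections import defaultdict

# Base view in the warehouse: lightly aggregated (per day, region, product).
base_view = [
    {"day": "2024-01-01", "region": "north", "product": "A", "revenue": 100},
    {"day": "2024-01-01", "region": "south", "product": "A", "revenue": 40},
    {"day": "2024-01-02", "region": "north", "product": "B", "revenue": 60},
    {"day": "2024-01-02", "region": "north", "product": "A", "revenue": 25},
]


def build_data_mart(rows, group_by, measure):
    """Aggregate the base view further along the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[col] for col in group_by)
        totals[key] += row[measure]
    return dict(totals)


# Two marts for two hypothetical analyst groups, built from the same base view.
regional_mart = build_data_mart(base_view, ["region"], "revenue")
product_mart = build_data_mart(base_view, ["product"], "revenue")
```

Each mart is derived from the base view rather than from the sources, which is what makes it a second-level cache: it can be rebuilt at any time without touching the operational systems.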