Data Warehousing - Delivery Process
A data warehouse is never static. It evolves as the business grows, and today's needs may differ from future needs, so we must design the data warehouse to accommodate constant change. The real problem is that the business itself does not know what information it will require in the future. As the business evolves, its needs change, and the data warehouse must be designed to ride with these changes. Hence data warehouse systems need to be flexible.
There should be a delivery process to deliver the data warehouse. However, data warehouse projects raise so many issues that it is very difficult to complete the tasks and deliverables in the strict, ordered fashion demanded by the waterfall method, because the requirements are rarely fully understood at the outset. The architectures, designs, and build components can be completed only once the requirements are complete.
The delivery method is a variant of the joint application development approach, adopted for the delivery of a data warehouse. The delivery process is staged to minimize risk. The approach discussed here does not reduce the overall delivery timescale, but it ensures that business benefits are delivered incrementally throughout the development process.
Note: The delivery process is broken into phases to reduce the project and delivery risk.
The following diagram explains the stages in the delivery process:
IT Strategy
Data warehouses are strategic investments that require a business process to generate the project benefits. An IT strategy is required to procure and retain funding for the project.
Business Case
The objective of the business case is to establish the projected business benefits that should be derived from using the data warehouse. These benefits may not be quantifiable, but they need to be clearly stated. A data warehouse without a clear business case tends to suffer from credibility problems at some stage during the delivery process. Therefore, in a data warehouse project, we need to understand the business case for the investment.
Education and Prototyping
An organization will experiment with the concept of data analysis and educate itself on the value of a data warehouse before determining that a data warehouse is the right solution. This is addressed by prototyping, which helps in understanding the feasibility and benefits of a data warehouse. Prototyping on a small scale can further the educational process as long as:
The prototype addresses a defined technical objective.
The prototype can be thrown away after the feasibility concept has been shown.
The activity addresses a small subset of the eventual data content of the data warehouse.
The activity timescale is non-critical.
Keep the following points in mind to produce an early release of part of a data warehouse that delivers business benefits:
Identify an architecture that is capable of evolving.
Focus on the business requirements and technical blueprint phases.
Limit the scope of the first build phase to the minimum that delivers business benefits.
Understand the short term and medium term requirements of the data warehouse.
Business Requirements
To provide quality deliverables, we should make sure that the overall requirements are understood. The business requirements and technical blueprint stages are needed for the following reasons:
If we understand the business requirements for both the short and medium term, we can design a solution that satisfies the short-term need.
This solution would be capable of growing into the full solution.
The following are determined in this stage:
The business rules to be applied to the data.
The logical model for information within the data warehouse.
The query profiles for the immediate requirement.
The source systems that provide this data.
Technical Blueprint
This phase needs to deliver an overall architecture that satisfies the long-term requirements. It also delivers the components that must be implemented in the short term to derive any business benefit. The blueprint needs to identify the following:
The overall system architecture.
The data retention policy.
The backup and recovery strategy.
The server and data mart architecture.
The capacity plan for hardware and infrastructure.
The components of database design.
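As one illustration of how a blueprint item can be made concrete, the sketch below expresses a data retention policy as code. The `RetentionPolicy` class, its field names, and the 24-month and 60-month values are assumptions made up for this example; they do not come from any particular product or from the text above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical retention policy; all values are illustrative."""
    online_months: int   # months kept in the warehouse for querying
    archive_months: int  # further months kept in the archive before purging


def months_between(older: date, newer: date) -> int:
    """Whole calendar months from `older` to `newer` (day of month ignored)."""
    return (newer.year - older.year) * 12 + (newer.month - older.month)


def classify(partition_month: date, today: date, policy: RetentionPolicy) -> str:
    """Return where a monthly partition should live under the policy."""
    age = months_between(partition_month, today)
    if age < policy.online_months:
        return "online"
    if age < policy.online_months + policy.archive_months:
        return "archive"
    return "purge"


policy = RetentionPolicy(online_months=24, archive_months=60)
today = date(2024, 6, 1)
for month in (date(2024, 1, 1), date(2021, 6, 1), date(2015, 1, 1)):
    print(month.isoformat(), classify(month, today, policy))
```

Writing the policy down in this form makes it easy to review with the business and to reuse later when archiving is automated.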
Building the version
In this stage, the first production deliverable is produced.
This production deliverable is the smallest component of the data warehouse.
This smallest component adds business benefit.
History Load
This is the phase where the remainder of the required history is loaded into the data warehouse. In this phase we do not add new entities, but additional physical tables would probably be created to store the increased data volumes.
For example, suppose the build version phase has delivered a retail sales analysis data warehouse with 2 months' worth of history. This information allows the user to analyse only recent trends and address short-term issues; the user cannot identify annual and seasonal trends. To make that possible, 2 years' worth of sales history could be loaded from the archive, so that the user can analyse yearly and seasonal sales trends. The 40 GB of data is now extended to 400 GB.
Note: The backup and recovery procedures may become complex, so it is recommended to perform this activity within a separate phase.
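The retail example above can be turned into a rough planning calculation. The sketch below assumes that data volume grows linearly with the number of months held, which is a simplification made for this example, and the start month `2024-05` is hypothetical. It estimates the size after the history load and lists the monthly batches to fetch from the archive.

```python
from datetime import date
from typing import List


def estimate_size_gb(current_gb: float, current_months: int, target_months: int) -> float:
    """Scale the current size linearly with the number of months held."""
    return current_gb * target_months / current_months


def months_to_backfill(oldest_loaded: date, extra_months: int) -> List[str]:
    """List the months (YYYY-MM) that precede `oldest_loaded`, newest first."""
    year, month = oldest_loaded.year, oldest_loaded.month
    result = []
    for _ in range(extra_months):
        month -= 1
        if month == 0:
            year, month = year - 1, 12
        result.append(f"{year:04d}-{month:02d}")
    return result


# 2 months are already loaded; 22 more are needed to reach 2 years.
estimate = estimate_size_gb(current_gb=40, current_months=2, target_months=24)
batch = months_to_backfill(date(2024, 5, 1), extra_months=22)
print(f"estimated size: {estimate:.0f} GB")
print(f"{len(batch)} monthly loads, from {batch[0]} back to {batch[-1]}")
```

Under the linear assumption the estimate comes to 480 GB, the same order of magnitude as the 400 GB quoted in the example. Real volumes depend on seasonality, compression, and indexing, so a figure like this is only a starting point for the capacity plan.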
Ad hoc Query
In this phase, we configure an ad hoc query tool.
This ad hoc query tool is used to operate the data warehouse.
These tools can generate database queries.
Note: It is recommended not to use these access tools when the database is being substantially modified.
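To make the idea of a tool that generates database queries more tangible, here is a minimal sketch of such a generator using Python's built-in `sqlite3` module. The `sales` table, the `DIMENSIONS` and `MEASURES` catalogues, and the `build_query` function are hypothetical names invented for this example; commercial ad hoc query tools are far richer.

```python
import sqlite3

# Hypothetical catalogue of what a user may select; names are illustrative.
TABLE = "sales"
DIMENSIONS = {"region", "product", "sale_month"}
MEASURES = {"amount": "SUM(amount)", "units": "SUM(units)"}


def build_query(dimensions, measures):
    """Generate a GROUP BY query from the user's selections."""
    if not measures:
        raise ValueError("select at least one measure")
    # Only catalogue names are accepted, so no raw user text reaches the SQL.
    unknown = (set(dimensions) - DIMENSIONS) | (set(measures) - set(MEASURES))
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    columns = list(dimensions) + [f"{MEASURES[m]} AS total_{m}" for m in measures]
    sql = f"SELECT {', '.join(columns)} FROM {TABLE}"
    if dimensions:
        group = ", ".join(dimensions)
        sql += f" GROUP BY {group} ORDER BY {group}"
    return sql


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, product TEXT, sale_month TEXT, "
    "amount REAL, units INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("north", "tea", "2024-01", 120.0, 12),
        ("north", "coffee", "2024-01", 200.0, 10),
        ("south", "tea", "2024-02", 80.0, 8),
    ],
)

sql = build_query(["region"], ["amount"])
print(sql)
for row in conn.execute(sql):
    print(row)
```

Restricting selections to a fixed catalogue keeps the generated SQL predictable, which also makes it easier to record query profiles later.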
Automation
In this phase, operational management processes are fully automated. These would include:
Transforming the data into a form suitable for analysis.
Monitoring query profiles and determining the appropriate aggregations to maintain system performance.
Extracting and loading the data from different source systems.
Generating aggregations from predefined definitions within the data warehouse.
Backing up, restoring, and archiving the data.
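Several of the processes listed above can be chained into a scheduled job. The sketch below shows one possible shape for such a job: an ordered list of steps sharing a context dictionary, with the run stopping at the first failure. Every step body is a placeholder written for this example, the step and logger names are assumptions, and query-profile monitoring is left out for brevity.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("nightly_load")


# Placeholder steps: real ones would call the site's own extract, load,
# and backup tooling instead of working on an in-memory list.
def extract(ctx):
    ctx["rows"] = [{"amount": "10.5"}, {"amount": "4.5"}]


def transform(ctx):
    # Convert source strings into a form suitable for analysis.
    ctx["rows"] = [{"amount": float(r["amount"])} for r in ctx["rows"]]


def load(ctx):
    ctx["loaded"] = len(ctx["rows"])


def aggregate(ctx):
    # Build an aggregation from a predefined definition (here: a total).
    ctx["total"] = sum(r["amount"] for r in ctx["rows"])


def backup(ctx):
    ctx["backed_up"] = True


PIPELINE = [
    ("extract", extract),
    ("transform", transform),
    ("load", load),
    ("aggregate", aggregate),
    ("backup", backup),
]


def run(pipeline):
    """Run the steps in order; stop and re-raise on the first failure."""
    ctx = {}
    for name, step in pipeline:
        log.info("starting %s", name)
        try:
            step(ctx)
        except Exception:
            log.exception("%s failed; later steps were skipped", name)
            raise
    return ctx


result = run(PIPELINE)
print(result["loaded"], result["total"], result["backed_up"])
```

Keeping the steps in a plain ordered list makes the dependency between them explicit: aggregations are rebuilt only after a successful load, and the backup runs last.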
Extending Scope
In this phase, the data warehouse is extended to address a new set of business requirements. The scope can be extended in two ways:
By loading additional data into the data warehouse.
By introducing new data marts using the existing information.
Note: This phase should be performed separately, since it involves substantial effort and complexity.
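The second option, a new data mart built from existing information, can be illustrated with a small `sqlite3` example. The `sales_fact` table, the `mart_regional_monthly` mart, and the sample rows are hypothetical and exist only for this sketch; a real data mart would usually live on its own server or schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Existing warehouse data (illustrative rows only).
    CREATE TABLE sales_fact (
        sale_date TEXT, region TEXT, product TEXT, amount REAL
    );
    INSERT INTO sales_fact VALUES
        ('2024-01-15', 'north', 'tea',    120.0),
        ('2024-01-20', 'north', 'coffee', 200.0),
        ('2024-02-03', 'south', 'tea',     80.0),
        ('2024-02-10', 'north', 'tea',     50.0);

    -- Hypothetical data mart: monthly revenue per region, derived
    -- entirely from data that is already in the warehouse.
    CREATE TABLE mart_regional_monthly AS
    SELECT substr(sale_date, 1, 7) AS sale_month,
           region,
           SUM(amount) AS revenue
    FROM sales_fact
    GROUP BY substr(sale_date, 1, 7), region;
    """
)

rows = conn.execute(
    "SELECT sale_month, region, revenue "
    "FROM mart_regional_monthly ORDER BY sale_month, region"
).fetchall()
for row in rows:
    print(row)
```

No new source data is loaded here; the mart only reshapes existing facts for a new group of users, which is what distinguishes this option from loading additional data.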
Requirements Evolution
From the perspective of the delivery process, the requirements are always changeable; they are not static. The delivery process must support this and allow these changes to be reflected within the system.
This issue is addressed by designing the data warehouse around the use of data within business processes, as opposed to the data requirements of existing queries.
The architecture is designed to change and grow to match the business needs. The process operates as a pseudo application development process, in which new requirements are continually fed into the development activities and partial deliverables are produced. These partial deliverables are fed back to the users and then reworked, ensuring that the overall system is continually updated to meet the business needs.