Data Warehousing - Data Marting


Why to create Datamart

    The following are the reasons to create datamart:

  • To partition data in order to impose access control strategies.

  • To speed up the queries by reducing the volume of data to be scanned.

  • To segment data into different hardware platforms.

  • To structure data in a form suitable for a user access tool.

Note: Donot data mart for any other reason since the operation cost of data marting could be very high. Before data marting, make sure that data marting strategy is appropriate for your particular solution.

Steps to determine that data mart appears to fit the bill

Following steps need to be followed to make cost effective data marting:

  • Identify the Functional Splits

  • Identify User Access Tool Requirements

  • Identify Access Control Issues

Identify the Functional Splits

In this step we determine that whether the natural functional split is there in the organization. We look for departmental splits, and we determine whether the way in which department use information tends to be in isolation from the rest of the organization. Let's have an example...

suppose in a retail organization where the each merchant is accountable for maximizing the sales of a group of products. For this the information that is valuable is :

  • sales transaction on daily basis

  • sales forecast on weekly basis

  • stock position on daily basis

  • stock movements on daily basis

As the merchant is not interested in the products they are not dealing with, so the data marting is subset of the data dealing which the product group of interest. Following diagram shows data marting for different users.

Issues in determining the functional split:

  • The structure of the department may change.

  • The products might switch from one department to other.

  • The merchant could query the sales trend of other products to analyse what is happening to the sales.

These are issues that need to be taken into account while determining the functional split.

Note: we need to determine the business benefits and technical feasibility of using data mart.

Identify User Access Tool Requirements

For the user access tools that require the internal data structures we need data mart to support such tools. The data in such structures are outside the control of data warehouse but need to be populated and updated on regular basis.

There are some tools that populated directly from the source system but some can not. Therefore additional requirements outside the scope of the tool are needed to be identified for future.

Note: In order to ensure consistency of data across all access tools the data should not be directly populated from the data warehouse rather each tool must have its own data mart.

Identify Access Control Issues

There need to be privacy rules to ensure the data is accessed by the authorised users only. For example in data warehouse for retail baking institution ensure that all the accounts belong to the same legal entity. Privacy laws can force you to totally prevent access to information that is not owned by the specific bank.

Data mart allow us to build complete wall by physically separating data segments within the data warehouse. To avoid possible privacy problems the detailed data can be removed from the data warehouse.We can create data mart for each legal entity and load it via data warehouse, with detailed account data.

Designing Data Marts

The data marts should be designed as smaller version of starflake schema with in the data warehouse and should match to the database design of the data warehouse. This helps in maintaining control on database instances.

Designing Data Mart

The summaries are data marted in the same way as they would have been designed within the data warehouse. Summary tables helps to utilize all dimension data in the starflake schema.

Cost Of Data Marting

The following are the cost measures for Data marting:

  • Hardware and Software Cost

  • Network Access

  • Time Window Constraints

Hardware and Software Cost

Although the data marts are created on the same hardware even then they require some additional hardware and software.To handle the user queries there is need of additional processing power and disk storage. If the detailed data and the data mart exist within the data warehouse then we would face additional cost to store and manage replicated data.

Note: The data marting is more expensive than aggregations therefore it should be used as an additional strategy not as an alternative strategy.

Network Access

The data mart could be on different locations from the data warehouse so we should ensure that the LAN or WAN has the capacity to handle the data volumes being transferred within the data mart load process.

Time Window Constraints

The extent to which the data mart loading process will eat into the available time window will depend on the complexity of the transformations and the data volumes being shipped. Feasiblity of number of data mart depend on.

  • Network Capacity.

  • Time Window Available

  • Volume of data being transferred

  • Mechanisms being used to insert data into data mart