Data Warehousing - Security
The objective data warehouse is to allow large amount of data to be easily accessible by the users. Hence allowing user to extract the information about the business as a whole. But we know that there could be some security restrictions applied on the data which can prove an obstacle for accessing the information. If the analyst has the restricted view of data then it is impossible to capture a complete picture of the trends within the business.
The data from each analyst can be summarised and passed onto management where the different summarise can be created. As the aggregations of summaries cannot be same as that of aggregation as a whole so It is possible to miss some information trends in the data unless someone is analysing the data as a whole.
Adding the security will affect the performance of the data warehouse, therefore it is worth determining the security requirements early as possible. Adding the security after the data warehouse has gone live, is very difficult.
During the design phase of data warehouse we should keep in mind that what data sources may be added later and what would be the impact of adding those data sources. We should consider the following possibilities during the design phase.
Whether the new data sources will require new security and/or audit restrictions to be implemented?
Whether the new users added who have restricted access to data that is already generally available?
This situation arises when the future users and the data sources are not well known. In such a situation we need to use the knowledge of business and the objective of data warehouse to know likely requirements.
Factor to Consider for Security requirements
The following are the parts that are affected by the security hence it is worth consider these factors.
We need to classify the data first and then the users by what data they can access.In other word the users are classified according to the data, they can access.
The following are the two approaches that can be used to classify the data:
The data can be classified according to its sensitivity. The highly sensitive data is classified as highly restricted and less sensitive data is classified as less restrictive.
The data can also be classified according to the job function. This restriction allows only the specific users to view particular data. In this we restrict the users to view only that that in which they are interested and are responsible for.
There are some issues in the second approach. To understand let's have an example, suppose you are building the data warehouse for a bank. suppose further that data being stored in the data warehouse is the transaction data for all the accounts. The question here is who is allowed to see the transaction data. The solution lies in classifying the data according to the function.
The users can be classified as per the hierarchy of users in an organisation i.e. users can be classified by department, section, group, and so on.
The user can also be classified according to their role, with people grouped across departments based on their role.
The following are the approaches that can be used to classify the users.
Classification on basis of Department
Let's have an example of a data warehouse where the users are from sales and marketing department. we can design the security by topdown company view, with access centered around the different departments. But they could be some restrictions on users at different level. This structure is shown in the following diagram.
But if each department accesses the different data then we should design the security access for each department separately. This can be achieved by the departmental data marts. Since these data marts are separated from the data warehouse hence we can enforce the separate security restrictions on each data mart. This approach is shown in the following figure.
Classification on basis of Role
If the data is generally available to all the departments.The it is worth to follow the role access hierarchy. In other words if the data is generally accessed by all the departments the apply the security restrictions as per the role of the user. The role accesshierarchy is shown in the following figure.
The auditing is a subset of security. The auditing is a costly activity therefore it is worth understanding the audit requirements and reason for each audit requirement. The auditing can cause the heavy overheads on the system. To complete auditing in time we require the more hardware therefore it is recommended that where possible, auditing should be switch off. Audit requirements can be categorized into the following:
Note: For each of the above mentioned categories it is necessary to audit success, failure or both. From the perspective of security reasons the auditing of failures are very important. The auditing of failure are important because they can highlight the unauthorised or fraudulent access.
The Network security is as important as other securities. We can not ignore the network security requirement. We need to consider the following issues.
Is it necessary to encrypt data before transferring it to the data warehouse machine?
Are there restrictions on which network routes the data can take?
These restrictions need to be considered carefully. Following are the points to remember.
The process of encryption and decryption will increase the overheads.It would require more processing power and processing time.
The cost of encryption can be high if the system is already a loaded system because the encryption is borne by the source system.
There exist potential security implications while moving the data. Suppose we need to transfer some restricted data as a flat file to be loaded. When the data is loaded into the data warehouse the following questions are raised?
Where is the flat file stored?
Who has access to that disk space?
If we talk about the backup of these flat files the following questions are raised?
Do you backup encrypted or decrypted versions?
Do these backup needs to be made to special tapes that are stored separately?
Who has access to these tapes?
Some other form of data movement like query result sets also need to be considered. The question here are raised when creating the temporary table are as follows.
Where is that temporary table to be held?
How do you make such table visible?
We should avoid the accidental flouting of security restrictions. If a user with access to the restricted data can generate accessible temporary tables, data can be made visible to nonauthorized users. We can overcome it by having separate temporary area for users with access to restricted data.
The audit and security requirements need to be properly documented. This will be treated as part of justification. This document can contain all the information gathered on the following.
Data movement and storage requirements
All auditable actions
Impact of Security on Design
The security affects the application code and the development timescales. The Security affects the following.
The security affect the overall application development and it also affect the design of the important components of the data warehouse such as load manager, warehouse manager and the query manager. The load manager may require checking code to filter record and place them in different locations. The more transformation rule may also be required to hide certain data . Also there may be requirement of extra metadata to handle any extra objects.
To create and maintain the extra vies the warehouse manager may require extra code to enforce the security. There may be the requirement of the extra checks coded into the data warehouse to prevent it from being fooled into moving data into location where it should not be available. The query manager require the changes to handle any access restrictions. The query manager will need to be aware of all extra views and aggregations.
The database layout is also affected because when the security is added there is increase in number of views and tables. Adding security adds the size to the database and hence increase the complexity of the database design and management. it will also add complexity to the backup management and recovery plan.
The testing of the data warehouse is very complex and a lengthy process. Adding security to the data warehouse also affect the testing time complexity. It affects the testing in the following two ways.
It will increase the time required for integration and system testing.
There is added functionality to be tested which will cause increase in the size of the testing suite.