What are the tools and utilities of a data warehouse?
Data warehousing is a technique used to collect and manage data from various sources in order to give the business meaningful insight. A data warehouse is specifically designed to support management decisions.
In simple terms, a data warehouse is a database that is maintained separately from an organization’s operational databases. Data warehouse systems enable the integration of multiple application systems, and they support information processing by providing a solid platform of consolidated, historical data for analysis.
Data warehouses generalize and consolidate data in multidimensional space. Constructing a data warehouse involves data cleaning, data integration, and data transformation, and can be viewed as an essential preprocessing step for data mining. Data warehouse systems rely on back-end tools and utilities to populate and refresh their data. These tools and utilities perform the following functions −
Data extraction − Data extraction gathers data from multiple, heterogeneous, and external sources.
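As a minimal sketch of extraction, assume two hypothetical sources, a CSV export and a JSON feed, whose field names differ. The helpers `extract_csv`, `extract_json`, and `extract_all` are illustrative names, not part of any data warehouse product; they only read each source and map its rows onto one common record shape.

```python
import csv
import io
import json

# Hypothetical payloads standing in for two operational systems.
CSV_SOURCE = "order_id,amount\n1,19.99\n2,5.00\n"
JSON_SOURCE = '[{"id": 3, "total": "42.50"}]'


def extract_csv(text):
    """Read CSV rows and map them onto the common record shape."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"order_id": int(row["order_id"]), "amount": float(row["amount"]), "source": "csv"}
        for row in reader
    ]


def extract_json(text):
    """Read JSON objects whose field names differ from the CSV source."""
    return [
        {"order_id": int(obj["id"]), "amount": float(obj["total"]), "source": "json"}
        for obj in json.loads(text)
    ]


def extract_all():
    """Gather records from every (hypothetical) source into one list."""
    return extract_csv(CSV_SOURCE) + extract_json(JSON_SOURCE)


for record in extract_all():
    print(record)
```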
Data cleaning − Data cleaning fills in missing values, smooths noisy data, identifies and removes outliers, and resolves inconsistencies in the data.
Types of data cleaning
Missing Values − Missing values are filled in with appropriate values, for example a global constant or the mean of the attribute.
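A minimal sketch of one common approach fills each missing value with the mean of the observed values of the same attribute. `fill_missing_with_mean` is a hypothetical helper, and `None` is assumed to mark a missing value.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


incomes = [30000, None, 50000, 40000, None]
print(fill_missing_with_mean(incomes))
# [30000, 40000, 50000, 40000, 40000]
```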
Noisy data − Noise is a random error or variance in a measured variable. The following smoothing techniques can be used to handle noise −
Binning − Binning methods smooth a sorted data value by consulting its “neighborhood,” that is, the values around it. The sorted values are distributed into a number of buckets, or bins. Because binning methods consult the neighborhood of values, they perform local smoothing.
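A minimal sketch of smoothing by bin means with equal-frequency bins follows; the bin size and the sample prices are assumptions for illustration. The result is in sorted order, because binning operates on the sorted values.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-frequency bins of bin_size
    (the last bin may be smaller), and replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Smoothing by bin medians or bin boundaries would follow the same structure, changing only the value that replaces each member of a bin.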
Regression − Data can be smoothed by fitting them to a function, for example with regression. Linear regression finds the “best” line to fit two attributes (or variables) so that one attribute can be used to predict the other. Multiple linear regression is an extension of linear regression in which more than two attributes are involved and the data are fit to a multidimensional surface.
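The sketch below fits a simple two-attribute linear regression by ordinary least squares and replaces each observed y by its fitted value. The function names and sample readings are hypothetical; multiple linear regression would need a matrix solver and is not shown.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def smooth_by_regression(xs, ys):
    """Replace each y with the value predicted by the fitted line."""
    a, b = fit_line(xs, ys)
    return [a + b * x for x in xs]


xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.2, 7.8]  # noisy readings (illustrative)
print(smooth_by_regression(xs, ys))
```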
Clustering − Outliers can be detected by clustering: similar values are organized into groups, or clusters, and values that fall outside the set of clusters may be considered outliers.
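One simple way to illustrate clustering-based outlier detection on a single numeric attribute is to sort the values and start a new cluster wherever the gap between neighbours exceeds a threshold (a single-linkage style grouping). The `max_gap` threshold and the rule that clusters smaller than `min_cluster_size` are outliers are assumptions chosen for this sketch, not a standard definition.

```python
def cluster_1d(values, max_gap):
    """Group sorted values; a new cluster starts whenever the gap to the
    previous value exceeds max_gap."""
    if not values:
        return []
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] > max_gap:
            clusters.append([v])
        else:
            clusters[-1].append(v)
    return clusters


def find_outliers(values, max_gap, min_cluster_size):
    """Flag every value that lands in a cluster smaller than min_cluster_size."""
    return [
        v
        for cluster in cluster_1d(values, max_gap)
        if len(cluster) < min_cluster_size
        for v in cluster
    ]


readings = [10, 11, 12, 13, 50, 51, 52, 120]
print(find_outliers(readings, max_gap=5, min_cluster_size=2))
# [120]
```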
Data Transformation − In data transformation, the data are transformed or consolidated into forms appropriate for mining. Data transformation can involve the following −
Smoothing − Smoothing removes noise from the data. Such techniques include binning, regression, and clustering.
Aggregation − In aggregation, summary or aggregation operations are applied to the data; for example, daily sales data may be aggregated to compute monthly and annual totals.
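A minimal aggregation sketch, assuming daily sales arrive as hypothetical `(date, amount)` pairs, rolls them up into monthly totals keyed by `(year, month)`:

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily_sales):
    """Sum (date, amount) pairs into per-month totals keyed by (year, month)."""
    totals = defaultdict(float)
    for day, amount in daily_sales:
        totals[(day.year, day.month)] += amount
    return dict(totals)


sales = [
    (date(2024, 1, 5), 100.0),
    (date(2024, 1, 20), 50.0),
    (date(2024, 2, 1), 75.0),
]
print(monthly_totals(sales))
# {(2024, 1): 150.0, (2024, 2): 75.0}
```

Annual totals follow the same pattern with `day.year` alone as the key.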
Generalization − In generalization, low-level or “primitive” (raw) data are replaced by higher-level concepts through the use of concept hierarchies; for example, a street attribute can be generalized to city or country.
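A sketch of generalization through a concept hierarchy (street → city → country) follows. The mappings are hypothetical sample data; in practice such hierarchies usually come from dimension tables or metadata.

```python
# Hypothetical concept hierarchy: street -> city -> country.
STREET_TO_CITY = {
    "Baker Street": "London",
    "Oxford Street": "London",
    "Rue de Rivoli": "Paris",
}
CITY_TO_COUNTRY = {"London": "UK", "Paris": "France"}


def generalize(streets, level):
    """Replace each street with its ancestor at the requested level."""
    if level == "city":
        return [STREET_TO_CITY[s] for s in streets]
    if level == "country":
        return [CITY_TO_COUNTRY[STREET_TO_CITY[s]] for s in streets]
    raise ValueError(f"unknown level: {level}")


print(generalize(["Baker Street", "Rue de Rivoli"], "country"))
# ['UK', 'France']
```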
Load − Loading sorts, summarizes, consolidates, computes views, checks integrity, and builds indices and partitions.
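The sketch below uses Python's built-in `sqlite3` module as a stand-in target store, which is an assumption; production warehouses typically use their own bulk loaders. The hypothetical `load` helper sorts incoming records, relies on table constraints to check integrity, builds an index, and defines a summary view. Partitioning is not shown because SQLite has no native table partitioning.

```python
import sqlite3


def load(records):
    """Load records into an in-memory table with integrity constraints,
    an index on region, and a per-region summary view."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales ("
            " order_id INTEGER PRIMARY KEY,"
            " region TEXT NOT NULL,"
            " amount REAL NOT NULL CHECK (amount >= 0))"
        )
        ordered = sorted(records, key=lambda r: r["order_id"])  # sort before loading
        conn.executemany(
            "INSERT INTO sales (order_id, region, amount) VALUES (:order_id, :region, :amount)",
            ordered,
        )
        conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
        conn.execute(
            "CREATE VIEW sales_by_region AS "
            "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
        )
        conn.commit()
    except sqlite3.Error:
        conn.close()  # constraint violations surface as sqlite3.IntegrityError
        raise
    return conn


conn = load([
    {"order_id": 2, "region": "east", "amount": 5.0},
    {"order_id": 1, "region": "west", "amount": 10.0},
    {"order_id": 3, "region": "east", "amount": 7.5},
])
print(conn.execute("SELECT region, total FROM sales_by_region ORDER BY region").fetchall())
# [('east', 12.5), ('west', 10.0)]
```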
Refresh − Refreshing propagates updates from the data sources to the data warehouse.
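Refresh is often done incrementally. This minimal sketch assumes each hypothetical source row carries an `updated_at` timestamp and that the warehouse remembers the timestamp of its last refresh (a watermark); only rows changed since then are propagated. Deletions in the source are not handled here.

```python
def refresh(warehouse, source_rows, last_refresh):
    """Insert or overwrite warehouse entries for source rows modified after
    last_refresh, and return the new watermark."""
    watermark = last_refresh
    for row in source_rows:
        if row["updated_at"] > last_refresh:
            warehouse[row["id"]] = {"value": row["value"], "updated_at": row["updated_at"]}
            watermark = max(watermark, row["updated_at"])
    return watermark


warehouse = {1: {"value": "a", "updated_at": 1}, 3: {"value": "c", "updated_at": 2}}
source = [
    {"id": 1, "value": "a2", "updated_at": 5},  # changed since last refresh
    {"id": 2, "value": "b", "updated_at": 6},   # new row
    {"id": 3, "value": "c", "updated_at": 2},   # unchanged, skipped
]
new_watermark = refresh(warehouse, source, last_refresh=3)
print(new_watermark, warehouse)
```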
