What is Data Transformation?
In data transformation, the data are transformed or consolidated into forms suitable for mining. Data transformation can involve the following −
Smoothing − Smoothing removes noise from the data. Techniques include binning, regression, and clustering.
Aggregation − Summary or aggregation operations are applied to the data. For example, daily sales data may be aggregated to compute monthly and annual totals. This step is typically used in constructing a data cube for analysis of the data at multiple granularities.
Generalization − Low-level or “primitive” (raw) data are replaced by higher-level concepts through the use of concept hierarchies. For instance, categorical attributes, such as street, can be generalized to higher-level concepts, such as city or country. Similarly, values for numerical attributes, such as age, can be mapped to higher-level concepts, like youth, middle-aged, and senior.
Normalization − The attribute data are scaled to fall within a small specified range, such as −1.0 to 1.0, or 0.0 to 1.0.
Attribute construction − New attributes are constructed from the given set of attributes and added to facilitate the mining process.
Smoothing is a form of data cleaning, and the data cleaning process also lets users specify transformations to correct data inconsistencies. Aggregation and generalization serve as forms of data reduction. An attribute is normalized by scaling its values so that they fall within a small specified range, such as 0.0 to 1.0.
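The operations listed above can be illustrated with a minimal Python sketch. It is not a definitive implementation: the function names, the equal-size binning strategy, the ISO date strings, and the age cutoffs (30 and 60) are all assumptions chosen for illustration and do not come from the text.

```python
from collections import defaultdict


def smooth_by_bin_means(values, bin_size):
    """Smoothing by bin means: sort the values, split them into
    equal-size bins, and replace every value with its bin's mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


def aggregate_monthly(daily_sales):
    """Aggregation: roll (ISO date string, amount) pairs up to monthly totals."""
    totals = defaultdict(float)
    for date, amount in daily_sales:
        totals[date[:7]] += amount  # the "YYYY-MM" prefix identifies the month
    return dict(totals)


def generalize_age(age):
    """Generalization: map a numeric age to a higher-level concept.
    The cutoffs are hypothetical; a real concept hierarchy would define them."""
    if age < 30:
        return "youth"
    if age < 60:
        return "middle-aged"
    return "senior"


if __name__ == "__main__":
    print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], bin_size=3))
    print(aggregate_monthly([("2024-01-05", 100.0), ("2024-01-20", 50.0),
                             ("2024-02-01", 30.0)]))
    print([generalize_age(a) for a in (25, 45, 70)])
```

Each function maps to one bullet above: binning for smoothing, monthly roll-up for aggregation, and a one-level concept hierarchy for generalization.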
Normalization is especially helpful for classification algorithms involving neural networks, or distance measurements such as nearest-neighbor classification and clustering. When the neural network backpropagation algorithm is used for classification mining, normalizing the input values for each attribute measured in the training tuples helps speed up the learning phase.
For distance-based methods, normalization helps prevent attributes with initially large ranges (e.g., income) from outweighing attributes with initially smaller ranges (e.g., binary attributes). Common methods for data normalization include the following −
Min-max normalization − It performs a linear transformation on the original data. Suppose that minA and maxA are the minimum and maximum values of an attribute, A. Min-max normalization maps a value, v, of A to v′ in the range [new_minA, new_maxA] by computing
$$v' = \frac{v - \mathit{min}_A}{\mathit{max}_A - \mathit{min}_A}\,(\mathit{new\_max}_A - \mathit{new\_min}_A) + \mathit{new\_min}_A$$
Z-score normalization − In z-score normalization (or zero-mean normalization), the values for an attribute, A, are normalized based on the mean and standard deviation of A. A value, v, of A is normalized to v′ by computing
$$v' = \frac{v - \bar{A}}{\sigma_A}$$
where Ā and σA are the mean and standard deviation, respectively, of attribute A. This method of normalization is useful when the actual minimum and maximum of attribute A are unknown, or when there are outliers that dominate the min-max normalization.
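As a rough illustration of min-max and z-score normalization as described above, the following Python sketch uses only the standard library. The function names and the sample income values are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say whether the population or sample form is intended.

```python
import statistics


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min, max] onto [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        raise ValueError("min-max normalization is undefined when all values are equal")
    scale = (new_max - new_min) / (old_max - old_min)
    return [(v - old_min) * scale + new_min for v in values]


def z_score_normalize(values):
    """Center values on the mean and divide by the standard deviation.
    Assumption: population standard deviation is used."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("z-score normalization is undefined when the standard deviation is zero")
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    incomes = [12000, 54000, 73600, 98000]  # hypothetical sample data
    print(min_max_normalize(incomes))             # scaled to roughly [0.0, 1.0]
    print(min_max_normalize(incomes, -1.0, 1.0))  # scaled to roughly [-1.0, 1.0]
    print(z_score_normalize(incomes))             # mean near 0, std near 1
```

Both functions reject constant attributes explicitly, because either formula would otherwise divide by zero.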
Decimal scaling − Normalization by decimal scaling normalizes by moving the decimal point of values of attribute A. The number of decimal places moved depends on the maximum absolute value of A. A value, v, of A is normalized to v′ by computing
$$v' = \frac{v}{10^j}$$
where j is the smallest integer such that max(|v′|) < 1.
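Decimal scaling can be sketched in a few lines of Python. The function name and the sample values are hypothetical, and the sketch assumes that j is restricted to non-negative integers, so values that already lie within (−1, 1) are left unchanged.

```python
def decimal_scale(values):
    """Divide every value by 10**j, where j is the smallest
    non-negative integer that makes max(|v'|) < 1.
    Returns the scaled values together with j."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


if __name__ == "__main__":
    scaled, j = decimal_scale([-986, 917])  # hypothetical sample data
    print(j, scaled)
```

For values ranging from −986 to 917, the maximum absolute value is 986, so the loop stops at j = 3 and every value is divided by 1,000.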