What is Variable Transformation?

A variable transformation is a transformation that is applied to all the values of a variable. In other words, for each object, the transformation is applied to the value of the variable for that object. For instance, if only the magnitude of a variable is important, then the values of the variable can be transformed by taking the absolute value.

There are two types of variable transformations: simple functional transformations and normalization.

Simple Functions

A simple mathematical function is applied to each value independently. If x is a variable, then examples of such transformations include $x^k$, $\log x$, $e^x$, $\sqrt{x}$, $\frac{1}{x}$, $\sin x$, and $|x|$. In statistics, variable transformations, particularly $\sqrt{x}$, $\log x$, and $\frac{1}{x}$, are often used to transform data that does not have a Gaussian (normal) distribution into data that does. While this can be important, other reasons often take precedence in data mining.

Suppose the variable of interest is the number of data bytes in a session, and the number of bytes ranges from 1 to 1 billion. This is a huge range, and it may be advantageous to compress it by using a $\log_{10}$ transformation. In this case, sessions that transferred $10^8$ and $10^9$ bytes would be more similar to each other than sessions that transferred 10 and 1000 bytes (9 - 8 = 1 versus 3 - 1 = 2).
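The comparison in the byte-count example can be checked with a minimal Python sketch. The helper name `log10_distance` is illustrative only and does not come from any library. It assumes strictly positive inputs, since the logarithm is undefined at zero.

```python
import math

def log10_distance(a, b):
    """Absolute difference between two positive values on a log10 scale."""
    return abs(math.log10(a) - math.log10(b))

# Sessions that transferred 10^8 and 10^9 bytes
large_sessions = log10_distance(10**8, 10**9)

# Sessions that transferred 10 and 1000 bytes
small_sessions = log10_distance(10, 1000)

# After the transformation, the two large sessions are closer together
print(large_sessions, small_sessions)
```

On the original scale the two large sessions differ by 900 million bytes, yet on the log scale they are about 1 unit apart, compared with about 2 units for the small sessions.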

Variable transformations should be applied with caution since they change the nature of the data. Problems can arise if the nature of the transformation is not fully understood. For example, the transformation $\frac{1}{x}$ reduces the magnitude of values that are greater than 1 but increases the magnitude of values between 0 and 1. The values {1, 2, 3} become {1, 1/2, 1/3}, so for positive values the transformation also reverses the order.
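As a small illustration of why caution is needed, the sketch below applies 1/x to a few positive values and checks what happens to their ordering. The function name `reciprocal` and the sample values are chosen for illustration. Exact fractions are used so that the comparison is not affected by floating-point rounding.

```python
from fractions import Fraction

def reciprocal(values):
    """Apply 1/x to each value; assumes no value is zero."""
    return [1 / Fraction(v) for v in values]

# Positive values listed in ascending order
original = [Fraction(1, 3), Fraction(1, 2), Fraction(1), Fraction(2), Fraction(3)]
transformed = reciprocal(original)

# If the transformed list is in descending order, 1/x reversed the ordering
is_order_reversed = transformed == sorted(transformed, reverse=True)
print(transformed, is_order_reversed)
```

Values below 1 become larger than 1 and vice versa, so any later step that depends on the ordering of the values, such as ranking, sees the data flipped.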

Normalization or Standardization

Another common type of variable transformation is the standardization or normalization of a variable. The goal of standardization or normalization is to make an entire set of values have a particular property. A common example is that of "standardizing a variable" in statistics. If $\bar{x}$ is the mean (average) of the attribute values and $s_x$ is their standard deviation, then the transformation $x' = \frac{x - \bar{x}}{s_x}$ creates a new variable that has a mean of 0 and a standard deviation of 1.

If different variables are to be combined in some way, then such a transformation is often necessary to avoid having a variable with large values dominate the results of the calculation.
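The standardization described above can be sketched with Python's standard library. The function name `standardize` and the sample data are illustrative. The text does not say whether the sample or the population standard deviation is meant, so this sketch assumes the sample version (`statistics.stdev`). Use `statistics.pstdev` for the population version.

```python
import statistics

def standardize(values):
    """Subtract the mean and divide by the sample standard deviation.

    Assumes at least two values that are not all identical, otherwise
    the standard deviation is undefined or zero.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    return [(v - mean) / sd for v in values]

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(data)

# The new variable should have mean ~0 and standard deviation ~1
print(statistics.mean(z), statistics.stdev(z))
```

Applying the same function to each variable puts variables measured in different units on a comparable scale before they are combined.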

The mean and standard deviation are strongly affected by outliers, so the above transformation is often modified. First, the mean is replaced by the median, i.e., the middle value. Second, the standard deviation is replaced by the absolute standard deviation. Specifically, if x is a variable, then the absolute standard deviation of x is given by $\sigma_{A}=\displaystyle\sum\limits_{i=1}^m |x_{i}-\mu|$, where $x_i$ is the ith value of the variable, m is the number of objects, and $\mu$ is either the mean or the median.
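The outlier-resistant variant can be sketched in the same way. The function names are illustrative, and the absolute standard deviation follows the formula as written above: a plain sum of absolute deviations, with no division by m. The median is assumed as the center.

```python
import statistics

def absolute_standard_deviation(values, center):
    """Sum of absolute deviations from the given center (mean or median)."""
    return sum(abs(v - center) for v in values)

def robust_standardize(values):
    """Center on the median and scale by the absolute standard deviation.

    Assumes the values are not all identical, otherwise the scale is zero.
    """
    med = statistics.median(values)
    scale = absolute_standard_deviation(values, med)
    return [(v - med) / scale for v in values]

# 100.0 is an outlier that would pull the mean far from the other values
data = [1.0, 2.0, 3.0, 4.0, 100.0]
robust = robust_standardize(data)
print(statistics.median(data), robust)
```

Here the median stays at 3.0 despite the outlier, whereas the mean of the same data is 22.0, which is why the median is the preferred center when outliers are present.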

raja
Updated on 11-Feb-2022 11:50:41
