# Python - How and where to apply Feature Scaling?

Feature scaling is a data pre-processing step applied to the independent variables (features) of a dataset. It normalises the data to within a particular range.

## Why scaling?

Most of the time, a dataset contains features that vary widely in magnitude, unit and range. This is a problem because many machine learning algorithms use the Euclidean distance between two data points in their computations.

If left alone, these algorithms take in only the magnitude of each feature and ignore its units. The results then vary greatly depending on the units chosen: 5 kg and 5000 g describe the same weight but are treated as very different values.

Features with high magnitudes weigh in far more in the distance calculation than features with low magnitudes.

To suppress this effect, we need to bring all features to the same level of magnitude. This can be achieved by scaling.
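The effect of units on distance can be seen in a small sketch. The two samples and their values below are made up for illustration; only the weight unit changes between the two computations.

```python
import numpy as np

# Hypothetical samples: [weight, height]. Values are invented for illustration.
a_kg = np.array([5.0, 1.70])   # weight in kg, height in m
b_kg = np.array([6.0, 1.20])

# The same two samples with the weight expressed in grams.
a_g = np.array([5000.0, 1.70])
b_g = np.array([6000.0, 1.20])

dist_kg = np.linalg.norm(a_kg - b_kg)  # Euclidean distance
dist_g = np.linalg.norm(a_g - b_g)

print(dist_kg)  # about 1.118: both features contribute
print(dist_g)   # about 1000.0: the weight difference swamps the height
```

In grams, the height difference contributes almost nothing to the distance, although the underlying samples are unchanged.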

## How to scale features?

**Standardisation** − Standardisation replaces the values with their Z-scores. $$x^{\prime}=\frac{x\:-\:\bar{x}}{\sigma}$$
Here x̄ and σ are the mean and standard deviation of the feature before scaling. The scaled feature has mean μ = 0 and standard deviation 1. sklearn.preprocessing.scale implements standardisation in Python.
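As a minimal sketch, the Z-score formula can be applied column by column with NumPy. The helper name `standardise` and the sample matrix are assumptions made for this example, and the population standard deviation (`ddof=0`) is used.

```python
import numpy as np

def standardise(X):
    """Column-wise z-scores: (x - mean) / std, using the population std."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Hypothetical data: two features on very different scales.
X = np.array([[5000.0, 1.0],
              [6000.0, 2.0],
              [7000.0, 3.0]])

Z = standardise(X)
print(Z.mean(axis=0))  # approximately 0 for each column
print(Z.std(axis=0))   # approximately 1 for each column
```

Calling sklearn.preprocessing.scale on the same matrix is expected to give the same values, since it also divides by the population standard deviation. Note that this sketch divides by zero for a constant column.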
**Mean Normalisation** − $$x^{\prime}=\frac{x\:-\:\bar{x}}{\max(x)\:-\:\min(x)}$$
The resulting values lie between -1 and 1, with mean μ = 0.

**Standardisation** and **Mean Normalisation** can be used for algorithms that assume zero-centred data, such as **Principal Component Analysis (PCA)**.

**Min-Max Scaling** − $$x^{\prime}=\frac{x\:-\:\min(x)}{\max(x)\:-\:\min(x)}$$
This scaling brings the values into the range 0 to 1.
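The two range-based formulas, mean normalisation and min-max scaling, can be sketched side by side. The helper names and the sample values are assumptions for illustration, and both helpers expect a feature that is not constant.

```python
import numpy as np

def mean_normalise(x):
    """(x - mean) / (max - min): zero mean, values within [-1, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.max() - x.min())

def min_max_scale(x):
    """(x - min) / (max - min): values within [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # hypothetical feature

print(mean_normalise(x))  # values: -0.5, -0.25, 0, 0.25, 0.5
print(min_max_scale(x))   # values: 0, 0.25, 0.5, 0.75, 1
```

The two results differ only by a shift: mean normalisation centres the values on zero, while min-max scaling anchors the smallest value at zero.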

**Unit Vector** − $$x^{\prime}=\frac{x}{\lVert\:x\:\rVert}$$
Each feature vector is divided by its norm, so that the scaled vector has unit length.
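A minimal sketch of unit-vector scaling with NumPy follows. It assumes the Euclidean (L2) norm, which the formula does not state explicitly, and `to_unit_vector` is a hypothetical helper name.

```python
import numpy as np

def to_unit_vector(x):
    """Divide a vector by its Euclidean norm so the result has length 1."""
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x)
    if norm == 0:
        raise ValueError("cannot scale a zero vector to unit length")
    return x / norm

v = np.array([3.0, 4.0])   # hypothetical feature vector with norm 5
u = to_unit_vector(v)

print(u)                  # values: 0.6, 0.8
print(np.linalg.norm(u))  # approximately 1.0
```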

**Min-Max Scaling** produces values in the range [0, 1], and the **Unit Vector** technique does too when all components are non-negative (negative components end up in [-1, 0)). This is quite useful when dealing with features that have hard boundaries. For example, in image data the colors can only range from 0 to 255.

## When to scale?

As a rule of thumb, scale your features for any algorithm that computes distances or assumes normality.

Some examples of algorithms where feature scaling does or does not matter are −

k-nearest neighbors with a Euclidean distance measure is sensitive to magnitudes, so the features should be scaled for all of them to weigh in equally.
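The following sketch shows how the nearest neighbour of a query point can change once the features are scaled. The data (income and age) is invented, `nearest_index` is a hypothetical helper, and min-max scaling fitted on the training points is just one possible choice of scaler.

```python
import numpy as np

def nearest_index(train, query):
    """Index of the training row closest to the query (Euclidean distance)."""
    distances = np.linalg.norm(train - query, axis=1)
    return int(np.argmin(distances))

# Hypothetical training points: [income, age].
train = np.array([[50000.0, 25.0],
                  [50500.0, 60.0],
                  [52000.0, 40.0],
                  [48000.0, 35.0]])
query = np.array([50400.0, 27.0])

# Min-max scale with the training minimum and range, then apply to the query.
lo = train.min(axis=0)
span = train.max(axis=0) - lo
train_scaled = (train - lo) / span
query_scaled = (query - lo) / span

nearest_raw = nearest_index(train, query)
nearest_scaled = nearest_index(train_scaled, query_scaled)

print(nearest_raw)     # 1: income alone decides, the 33-year age gap is ignored
print(nearest_scaled)  # 0: after scaling, the similar age counts as well
```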

Scaling is critical when performing Principal Component Analysis (PCA). PCA looks for the directions with maximum variance, and the variance is high for high-magnitude features. This skews PCA towards high-magnitude features.
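This skew can be illustrated with a sketch that computes the first principal component from the covariance matrix, before and after standardising. The synthetic data, the seed and the helper `first_component` are assumptions made for this example.

```python
import numpy as np

def first_component(X):
    """First principal direction and its share of the total variance."""
    cov = np.cov(X, rowvar=False)            # np.cov centres the data itself
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvecs[:, -1], eigvals[-1] / eigvals.sum()

# Synthetic data: two independent features, one with a much larger spread.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1000, 200), rng.normal(0, 1, 200)])
Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardised copy

raw_dir, raw_share = first_component(X)
std_dir, std_share = first_component(Z)

print(np.abs(raw_dir), raw_share)  # direction almost entirely along feature 0
print(np.abs(std_dir), std_share)  # both features contribute after scaling
```

On the raw data the first component is essentially the large-magnitude feature and explains nearly all of the variance, which says more about the units than about the structure of the data.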

We can speed up gradient descent by scaling. This is because θ will descend quickly on small ranges and slowly on large ranges, and so will oscillate inefficiently down to the optimum when the variables are very uneven.
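A rough sketch of this effect compares how many batch gradient descent steps a least-squares fit needs on raw and on standardised features. The data is synthetic, `gd_steps` is a hypothetical helper, and the step size (the reciprocal of the largest Hessian eigenvalue) is an assumption chosen so that each run uses a safe fixed learning rate.

```python
import numpy as np

def gd_steps(X, y, tol=1e-8, max_steps=200_000):
    """Steps of batch gradient descent until the mean squared error < tol."""
    n = len(y)
    # Safe fixed step size for the quadratic loss (1/2n) * ||X @ theta - y||^2.
    lr = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()
    theta = np.zeros(X.shape[1])
    for step in range(max_steps):
        residual = X @ theta - y
        if np.mean(residual ** 2) < tol:
            return step
        theta -= lr * (X.T @ residual) / n
    return max_steps

# Synthetic data: the second feature has a range 20 times larger than the first.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 1, 200), rng.uniform(0, 20, 200)])
y = 3.0 * X[:, 0] + 0.5 * X[:, 1]

# Standardise the features and centre the target so no intercept is needed.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
y_centred = y - y.mean()

steps_raw = gd_steps(X, y)
steps_scaled = gd_steps(Z, y_centred)

print(steps_raw, steps_scaled)  # the scaled problem needs far fewer steps
```

With uneven ranges the step size is limited by the steepest direction, so progress along the shallow direction is slow. After scaling, the loss surface is close to round and a single step size suits both directions.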

Tree-based models are not distance-based and can handle features with varying ranges. Hence, scaling is not required when modelling with trees.
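The reason can be sketched with a single threshold split (a "decision stump"): the split depends only on the ordering of the values, and scaling preserves that ordering. The helpers and the data below are hypothetical, and this is not a full tree learner.

```python
import numpy as np

def side_errors(labels):
    """Misclassified samples if one side predicts its majority class (0 or 1)."""
    ones = int(labels.sum())
    return min(ones, len(labels) - ones)

def best_split_mask(x, y):
    """Boolean mask of samples sent left by the threshold with fewest errors."""
    values = np.unique(x)
    thresholds = (values[:-1] + values[1:]) / 2   # midpoints between values
    errors = [side_errors(y[x <= t]) + side_errors(y[x > t]) for t in thresholds]
    best_t = thresholds[int(np.argmin(errors))]
    return x <= best_t

# Hypothetical feature with a large range and binary labels.
x = np.array([1000.0, 2000.0, 3000.0, 7000.0, 8000.0, 9000.0])
y = np.array([0, 0, 0, 1, 1, 1])

x_scaled = (x - x.min()) / (x.max() - x.min())    # min-max scaled copy

mask_raw = best_split_mask(x, y)
mask_scaled = best_split_mask(x_scaled, y)

print(np.array_equal(mask_raw, mask_scaled))  # True: same partition either way
```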

Algorithms like Linear Discriminant Analysis (LDA) and Naive Bayes are by design equipped to handle this and weight the features accordingly. Feature scaling may not have much effect on these algorithms.
