How can data be scaled using the scikit-learn library in Python?

Feature scaling is an important step in the data pre-processing stage when building machine learning models. It normalizes the data so that it falls within a specific range.

At times, it can also speed up the calculations performed by the learning algorithm.

Why is it needed?

Data fed to a learning algorithm should be consistent and structured. All features of the input data should be on a single scale for the model to predict values effectively. In the real world, however, data is often unstructured and, most of the time, not on the same scale.

This is where normalization comes into the picture. It is one of the most important data-preparation processes: it changes the values in the columns of the input dataset so that they fall on the same scale.
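The idea behind min-max normalization can be sketched by hand: each value is shifted by the column minimum and divided by the column range, so the smallest value maps to 0 and the largest to 1. A minimal NumPy sketch (the sample values are illustrative):

```python
import numpy as np

# Min-max normalization of one column by hand: x' = (x - min) / (max - min)
column = np.array([34.78, -16.5, 0.5, 5.9])
scaled = (column - column.min()) / (column.max() - column.min())

# The column maximum (34.78) maps to 1.0, the minimum (-16.5) maps to 0.0,
# and every other value lands strictly between them.
print(scaled)
```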

Let us understand how the scikit-learn library can be used to perform feature scaling in Python.


import numpy as np
from sklearn import preprocessing

# Sample input data with features on very different scales
input_data = np.array([[34.78, 31.9, -65.5],
                       [-16.5, 2.45, -83.5],
                       [0.5, -87.98, 45.62],
                       [5.9, 2.38, -55.82]])

# Scale every column to the range [0, 1]
data_scaler_minmax = preprocessing.MinMaxScaler(feature_range=(0, 1))
data_scaled_minmax = data_scaler_minmax.fit_transform(input_data)
print("\nThe scaled data is \n", data_scaled_minmax)


The scaled data is 
 [[1.         1.         0.1394052 ]
 [0.         0.75433767 0.        ]
 [0.33151326 0.         1.        ]
 [0.43681747 0.75375375 0.21437423]]


  • The required packages are imported.

  • The input data is defined as a NumPy array.

  • The MinMaxScaler class from the preprocessing module is used to scale the data to fall within the range 0 to 1.

  • This way, every value in the array is scaled to a value between 0 and 1.

  • The scaled data is displayed on the console.
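In practice, the scaler is usually fitted on the training data once and then reused on unseen rows, so that both are transformed with the same learned column minima and maxima. A sketch of this pattern, using the same sample array as training data and a made-up new row:

```python
import numpy as np
from sklearn import preprocessing

# Training data: the scaler learns the per-column min and max from these rows only
train = np.array([[34.78, 31.9, -65.5],
                  [-16.5, 2.45, -83.5],
                  [0.5, -87.98, 45.62],
                  [5.9, 2.38, -55.82]])

# A hypothetical unseen row to be scaled with the learned parameters
new = np.array([[0.0, 0.0, 0.0]])

scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
scaler.fit(train)                  # learn per-column min and max from the training data
print(scaler.transform(new))       # scale the new row with those learned parameters

# inverse_transform maps scaled values back to the original units
print(scaler.inverse_transform(scaler.transform(new)))
```

Note that calling `fit_transform` on the new data instead would recompute the minimum and maximum from it, putting the two datasets on different scales.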