Feature scaling is an important step in the data pre-processing stage of building machine learning models. It normalizes the data so that the values fall within a specific range.
It can also speed up the computations performed during training, for example by helping gradient-based optimizers converge faster.
Data fed to the learning algorithm as input should be consistent and structured. All features of the input data should be on a single scale so that the algorithm can predict values effectively. But in the real world, data is unstructured and, most of the time, not on the same scale.
This is where normalization comes into the picture. It is one of the most important data-preparation steps. It changes the values in the columns of the input dataset so that they fall on the same scale.
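As a rough illustration of what putting columns on the same scale means, the following sketch applies min-max scaling by hand with NumPy. The sample array and the helper name `min_max_scale` are made up for this illustration and are not part of any library. The sketch assumes that no column is constant, since the division would otherwise be by zero.

```python
import numpy as np

def min_max_scale(data):
    """Rescale each column of a 2-D array to the range [0, 1].

    Hypothetical helper for illustration; assumes no column is constant.
    """
    col_min = data.min(axis=0)  # smallest value in each column
    col_max = data.max(axis=0)  # largest value in each column
    return (data - col_min) / (col_max - col_min)

# Made-up example: two features on very different scales
sample = np.array([[1.0, 1000.0],
                   [2.0, 3000.0],
                   [3.0, 5000.0]])

print(min_max_scale(sample))
```

After scaling, the smallest value in each column becomes 0 and the largest becomes 1, so both features cover the same range even though their original units differ.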
Let us see how the scikit-learn library can be used to perform feature scaling in Python.
import numpy as np
from sklearn import preprocessing

input_data = np.array(
    [[34.78, 31.9, -65.5],
     [-16.5, 2.45, -83.5],
     [0.5, -87.98, 45.62],
     [5.9, 2.38, -55.82]])

data_scaler_minmax = preprocessing.MinMaxScaler(feature_range=(0, 1))
data_scaled_minmax = data_scaler_minmax.fit_transform(input_data)
print("\nThe scaled data is \n", data_scaled_minmax)
The scaled data is
[[1.         1.         0.1394052 ]
 [0.         0.75433767 0.        ]
 [0.33151326 0.         1.        ]
 [0.43681747 0.75375375 0.21437423]]
The required packages are imported.
The input data is defined as a NumPy array.
The MinMaxScaler class from the 'preprocessing' module of scikit-learn is used to scale the data to the range 0 to 1.
Each column is scaled independently: its minimum value maps to 0, its maximum value maps to 1, and every other value falls in between.
The scaled data is displayed on the console.
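In practice, the scaler is often fitted once and then reused on data it has not seen. The sketch below shows one way this might look, using the `fit`, `transform`, and `inverse_transform` methods of `MinMaxScaler`. The row in `new_data` is a hypothetical value chosen for illustration and does not come from the example above.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Same input array as in the example above
train_data = np.array([[34.78, 31.9, -65.5],
                       [-16.5, 2.45, -83.5],
                       [0.5, -87.98, 45.62],
                       [5.9, 2.38, -55.82]])

scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(train_data)  # learns each column's minimum and maximum

# Hypothetical new row, chosen only for illustration
new_data = np.array([[10.0, 0.0, 100.0]])

# transform reuses the minimum and maximum learned from train_data
new_scaled = scaler.transform(new_data)
print(new_scaled)

# inverse_transform maps scaled values back to the original units
restored = scaler.inverse_transform(new_scaled)
print(restored)
```

Because the third value of the new row (100.0) is larger than the maximum seen during fitting (45.62), its scaled value is greater than 1. The 0 to 1 range is guaranteed only for the data the scaler was fitted on.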