How can data be scaled using scikit-learn library in Python?

Feature scaling is an important step in the data pre-processing stage when building machine learning algorithms. It helps normalize the data to fall within a specific range, which ensures all features contribute equally to the model's predictions.

Scaling can also speed up training: many optimization methods, such as gradient descent, converge faster when features have similar magnitudes.

Why is Feature Scaling Needed?

Data fed to learning algorithms should remain consistent and structured. All features of the input data should be on a similar scale to effectively predict values. However, in real-world scenarios, data is often unstructured and features have different scales.

For example, age might range from 0-100, while income could range from 0-100,000. Without scaling, the income feature would dominate the model simply due to its larger numeric range.

This is where normalization comes into the picture. It is one of the most important data-preparation steps: it transforms feature values so that they fall on a comparable scale.
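To make the age/income example concrete, here is a small sketch (the numbers are illustrative, not from the article) showing how the larger-range feature dominates a Euclidean distance before scaling:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two people as (age, income). The ages differ a lot in relative terms,
# the incomes only slightly.
a = np.array([[20.0, 50000.0]])
b = np.array([[60.0, 50500.0]])

# The unscaled distance is dominated by income:
# the income gap (500) dwarfs the age gap (40).
print(np.linalg.norm(a - b))

# After min-max scaling both features to [0, 1], age differences
# contribute on the same footing as income differences.
data = np.array([[20.0, 50000.0],
                 [60.0, 50500.0],
                 [40.0, 100000.0],
                 [30.0, 0.0]])
scaled = MinMaxScaler().fit_transform(data)
print(scaled[:, 0])  # ages mapped into [0, 1]
```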

MinMaxScaler Example

Let's see how scikit-learn's MinMaxScaler can be used to scale features between 0 and 1:

import numpy as np
from sklearn import preprocessing

input_data = np.array([
    [34.78, 31.9, -65.5],
    [-16.5, 2.45, -83.5],
    [0.5, -87.98, 45.62],
    [5.9, 2.38, -55.82]
])

data_scaler_minmax = preprocessing.MinMaxScaler(feature_range=(0,1))
data_scaled_minmax = data_scaler_minmax.fit_transform(input_data)
print("The scaled data is:")
print(data_scaled_minmax)
Output:

The scaled data is:
[[1.         1.         0.1394052 ]
 [0.         0.75433767 0.        ]
 [0.33151326 0.         1.        ]
 [0.43681747 0.75375375 0.21437423]]
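A fitted MinMaxScaler stores the per-column minimum and range, so it can be reused on new rows, and the transformation can be undone with inverse_transform(). A short sketch with a hypothetical new sample:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

input_data = np.array([
    [34.78, 31.9, -65.5],
    [-16.5, 2.45, -83.5],
    [0.5, -87.98, 45.62],
    [5.9, 2.38, -55.82]
])

scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(input_data)  # learn the per-column min and max

# Transform a new sample using the parameters learned above.
new_sample = np.array([[10.0, 0.0, 0.0]])
scaled = scaler.transform(new_sample)
print(scaled)

# inverse_transform recovers the original values.
print(scaler.inverse_transform(scaled))
```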

StandardScaler Example

Another common scaling method is StandardScaler, which standardizes features by removing the mean and scaling to unit variance:

import numpy as np
from sklearn import preprocessing

input_data = np.array([
    [34.78, 31.9, -65.5],
    [-16.5, 2.45, -83.5],
    [0.5, -87.98, 45.62],
    [5.9, 2.38, -55.82]
])

data_scaler_standard = preprocessing.StandardScaler()
data_scaled_standard = data_scaler_standard.fit_transform(input_data)
print("The standardized data is:")
print(data_scaled_standard)
Output:

The standardized data is:
[[ 1.54893072  0.99280759 -0.51085776]
 [-1.22734217  0.33889239 -0.86865696]
 [-0.30697089 -1.66903807  1.697956  ]
 [-0.01461766  0.33733809 -0.31844129]]
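A quick sanity check on the result: each standardized column should have mean ≈ 0 and (population) standard deviation ≈ 1. Sketch:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

input_data = np.array([
    [34.78, 31.9, -65.5],
    [-16.5, 2.45, -83.5],
    [0.5, -87.98, 45.62],
    [5.9, 2.38, -55.82]
])

scaled = StandardScaler().fit_transform(input_data)

# Column means are (numerically) zero and standard deviations are one.
print(scaled.mean(axis=0))  # ~[0, 0, 0]
print(scaled.std(axis=0))   # ~[1, 1, 1]
```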

Comparison of Scaling Methods

Method           Range            Formula                          Best For
MinMaxScaler     0 to 1           (X - X_min) / (X_max - X_min)    When you know the bounds
StandardScaler   Mean=0, Std=1    (X - mean) / std                 When data is normally distributed
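The two formulas in the table can be checked directly against scikit-learn. Note that StandardScaler divides by the population standard deviation (ddof=0):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([
    [34.78, 31.9, -65.5],
    [-16.5, 2.45, -83.5],
    [0.5, -87.98, 45.62],
    [5.9, 2.38, -55.82]
])

# MinMaxScaler: (X - X_min) / (X_max - X_min), computed per column.
manual_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
assert np.allclose(manual_minmax, MinMaxScaler().fit_transform(X))

# StandardScaler: (X - mean) / std, per column (population std, ddof=0).
manual_standard = (X - X.mean(axis=0)) / X.std(axis=0)
assert np.allclose(manual_standard, StandardScaler().fit_transform(X))

print("Both manual formulas match scikit-learn.")
```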

How It Works

  • The MinMaxScaler transforms features by scaling each feature to the range [0,1] using the minimum and maximum values.

  • The StandardScaler transforms features by removing the mean and scaling to unit variance (standard normal distribution).

  • Both methods use fit_transform() to compute the scaling parameters and apply the transformation in one step.

  • The scaled data maintains the relative relationships between data points while ensuring all features contribute equally to the model.
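One practical caveat to the fit_transform() point above: it should only be called on the training data. Test data must be transformed with the parameters learned from training, otherwise information leaks from the test set into the model. A minimal sketch (the train/test split here is illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[34.78, 31.9], [-16.5, 2.45], [0.5, -87.98]])
X_test = np.array([[5.9, 2.38]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the learned mean/std

# The test row is scaled with the training mean and std, not its own.
print(X_test_scaled)
```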

Conclusion

Feature scaling is essential for machine learning algorithms to perform optimally. Use MinMaxScaler when you need bounded ranges, and StandardScaler when your data follows a normal distribution. Both methods ensure features contribute equally to model predictions.

Updated on: 2026-03-25T13:21:13+05:30
