Python - How and where to apply Feature Scaling?

Feature scaling is a crucial data preprocessing step applied to independent variables or features. It brings features onto a comparable scale, ensuring all features contribute equally to machine learning algorithms.

Why Feature Scaling is Important

Most datasets contain features with vastly different magnitudes, units, and ranges. For example, age (20-80) versus income (20,000-100,000). Machine learning algorithms that use Euclidean distance treat these differences literally:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Example: Age vs Income (unscaled)
data = np.array([[25, 50000], [30, 75000], [35, 100000]])
print("Original data:")
print("Age | Income")
for row in data:
    print(f"{row[0]:3d} | {row[1]:6d}")

# Calculate distances between points
dist1 = np.sqrt((30-25)**2 + (75000-50000)**2)
dist2 = np.sqrt((35-30)**2 + (100000-75000)**2)
print(f"\nDistance dominated by income: {dist1:.0f}, {dist2:.0f}")
Original data:
Age | Income
 25 |  50000
 30 |  75000
 35 | 100000

Distance dominated by income: 25000, 25000

The income feature dominates distance calculations, making age virtually irrelevant. Feature scaling solves this problem.
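To see the fix in action, here is a minimal sketch (using StandardScaler, covered in the next section) that recomputes the same pairwise distances after scaling. Once both features have comparable spread, age and income contribute equally:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[25, 50000], [30, 75000], [35, 100000]])
scaled = StandardScaler().fit_transform(data)

# Same pairwise distances as before, now on the scaled features
d12 = np.linalg.norm(scaled[1] - scaled[0])
d23 = np.linalg.norm(scaled[2] - scaled[1])
print(f"Scaled distances: {d12:.2f}, {d23:.2f}")  # → Scaled distances: 1.73, 1.73
```

Both distances are now identical, and a 5-year age gap carries exactly as much weight as a 25,000 income gap.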

Feature Scaling Techniques

Standardization (Z-score Normalization)

Transforms features to have mean = 0 and standard deviation = 1 using the formula: x' = (x - μ) / σ

from sklearn.preprocessing import StandardScaler
import numpy as np

data = np.array([[25, 50000], [30, 75000], [35, 100000]])
scaler = StandardScaler()
standardized = scaler.fit_transform(data)

print("Standardized data:")
print("Age     | Income")
for row in standardized:
    print(f"{row[0]:7.2f} | {row[1]:7.2f}")
Standardized data:
Age     | Income
  -1.22 |   -1.22
   0.00 |    0.00
   1.22 |    1.22

Min-Max Scaling

Scales features to a fixed range [0,1] using: x' = (x - min(x)) / (max(x) - min(x))

from sklearn.preprocessing import MinMaxScaler

data = np.array([[25, 50000], [30, 75000], [35, 100000]])
scaler = MinMaxScaler()
minmax_scaled = scaler.fit_transform(data)

print("Min-Max scaled data:")
print("Age  | Income")
for row in minmax_scaled:
    print(f"{row[0]:.2f} | {row[1]:.2f}")
Min-Max scaled data:
Age  | Income
0.00 | 0.00
0.50 | 0.50
1.00 | 1.00

Robust Scaling

Uses median and interquartile range, making it robust to outliers: x' = (x - median) / IQR

from sklearn.preprocessing import RobustScaler

data = np.array([[25, 50000], [30, 75000], [35, 100000], [100, 200000]])  # Added outlier
scaler = RobustScaler()
robust_scaled = scaler.fit_transform(data)

print("Robust scaled data (with outlier):")
print("Age   | Income")
for row in robust_scaled:
    print(f"{row[0]:5.2f} | {row[1]:6.2f}")
Robust scaled data (with outlier):
Age   | Income
-0.33 |  -0.67
-0.11 |  -0.22
 0.11 |   0.22
 3.00 |   2.00

When to Apply Feature Scaling

Scale when algorithms use distance or assume normality:

Algorithm Type      | Scaling Required? | Reason
K-Nearest Neighbors | Yes               | Uses Euclidean distance
SVM                 | Yes               | Distance-based optimization
Neural Networks     | Yes               | Gradient descent optimization
PCA                 | Yes               | Variance-based dimensionality reduction
Decision Trees      | No                | Split-based, not distance-based
Random Forest       | No                | Tree-based ensemble
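The "No" rows can be sanity-checked directly: tree splits compare a feature only against a threshold, so any monotonic rescaling leaves the partitions (and the accuracy) unchanged. A small sketch using scikit-learn's make_classification:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=2, n_redundant=0,
                           n_informative=2, random_state=42)

# Fit one tree on raw features, one on standardized features
acc_raw = DecisionTreeClassifier(random_state=0).fit(X, y).score(X, y)
X_scaled = StandardScaler().fit_transform(X)
acc_scaled = DecisionTreeClassifier(random_state=0).fit(X_scaled, y).score(X_scaled, y)

# Scaling is affine and increasing, so the tree finds equivalent splits
print(acc_raw == acc_scaled)
```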

Practical Example

from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification
import numpy as np

# Create sample data with different scales
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0, 
                          n_informative=2, random_state=42)
X[:, 1] = X[:, 1] * 1000  # Scale second feature

# Without scaling
knn_unscaled = KNeighborsClassifier(n_neighbors=3)
knn_unscaled.fit(X, y)
accuracy_unscaled = knn_unscaled.score(X, y)

# With scaling
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
knn_scaled = KNeighborsClassifier(n_neighbors=3)
knn_scaled.fit(X_scaled, y)
accuracy_scaled = knn_scaled.score(X_scaled, y)

print(f"Accuracy without scaling: {accuracy_unscaled:.3f}")
print(f"Accuracy with scaling: {accuracy_scaled:.3f}")
Accuracy without scaling: 0.840
Accuracy with scaling: 0.930
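One caveat about the example above: it fits and scores on the same data. With a real train/test split, the scaler must be fit on the training set only, otherwise test statistics leak into preprocessing. A sketch of the usual pattern with scikit-learn's Pipeline:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=2, n_redundant=0,
                           n_informative=2, random_state=42)
X[:, 1] = X[:, 1] * 1000  # Same artificial scale mismatch as above

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The pipeline fits StandardScaler on X_train only, then applies the
# learned parameters to X_test when scoring — no test-set leakage
pipe = Pipeline([("scale", StandardScaler()),
                 ("knn", KNeighborsClassifier(n_neighbors=3))])
pipe.fit(X_train, y_train)
print(f"Test accuracy: {pipe.score(X_test, y_test):.3f}")
```

The same pipeline object can also be passed to cross_val_score, which refits the scaler inside each fold.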

Conclusion

Feature scaling is essential for distance-based algorithms and gradient descent optimization. Use StandardScaler for normally distributed data, MinMaxScaler for bounded ranges, and RobustScaler when outliers are present. Always scale features when using KNN, SVM, neural networks, or PCA.

Updated on: 2026-03-25T09:15:27+05:30
