Python - How and where to apply Feature Scaling?

Feature scaling is a crucial data preprocessing step applied to independent variables or features. It brings features onto a comparable scale, ensuring all features contribute equally to machine learning algorithms.

Why Feature Scaling is Important

Most datasets contain features with vastly different magnitudes, units, and ranges. For example, age (20-80) versus income (20,000-100,000). Machine learning algorithms that use Euclidean distance treat these differences literally:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Example: Age vs Income (unscaled)
data = np.array([[25, 50000], [30, 75000], [35, 100000]])
print("Original data:")
print("Age | Income")
for row in data:
    print(f"{row[0]:3d} | {row[1]:6d}")

# Calculate distances between points
dist1 = np.sqrt((30-25)**2 + (75000-50000)**2)
dist2 = np.sqrt((35-30)**2 + (100000-75000)**2)
print(f"\nDistance dominated by income: {dist1:.0f}, {dist2:.0f}")
Original data:
Age | Income
 25 |  50000
 30 |  75000
 35 | 100000

Distance dominated by income: 25000, 25000

The income feature dominates distance calculations, making age virtually irrelevant. Feature scaling solves this problem.
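To see the fix in action, here is a minimal sketch (using StandardScaler, covered in the next section) that recomputes the same pairwise distances after scaling. Once both features have comparable spread, age and income contribute equally:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[25, 50000], [30, 75000], [35, 100000]])
scaled = StandardScaler().fit_transform(data)

# Same pairwise distances as before, now on the scaled features
d12 = np.linalg.norm(scaled[1] - scaled[0])
d23 = np.linalg.norm(scaled[2] - scaled[1])
print(f"Scaled distances: {d12:.2f}, {d23:.2f}")  # → Scaled distances: 1.73, 1.73
```

Both distances are now identical, and a 5-year age gap carries exactly as much weight as a 25,000 income gap.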

Feature Scaling Techniques

Standardization (Z-score Normalization)

Transforms features to have mean = 0 and standard deviation = 1 using the formula: x' = (x - μ) / σ

from sklearn.preprocessing import StandardScaler
import numpy as np

data = np.array([[25, 50000], [30, 75000], [35, 100000]])
scaler = StandardScaler()
standardized = scaler.fit_transform(data)

print("Standardized data:")
print("Age     | Income")
for row in standardized:
    print(f"{row[0]:7.2f} | {row[1]:7.2f}")
Standardized data:
Age     | Income
  -1.22 |   -1.22
   0.00 |    0.00
   1.22 |    1.22

Min-Max Scaling

Scales features to a fixed range [0,1] using: x' = (x - min(x)) / (max(x) - min(x))

from sklearn.preprocessing import MinMaxScaler

data = np.array([[25, 50000], [30, 75000], [35, 100000]])
scaler = MinMaxScaler()
minmax_scaled = scaler.fit_transform(data)

print("Min-Max scaled data:")
print("Age  | Income")
for row in minmax_scaled:
    print(f"{row[0]:.2f} | {row[1]:.2f}")
Min-Max scaled data:
Age  | Income
0.00 | 0.00
0.50 | 0.50
1.00 | 1.00

Robust Scaling

Uses median and interquartile range, making it robust to outliers: x' = (x - median) / IQR

from sklearn.preprocessing import RobustScaler

data = np.array([[25, 50000], [30, 75000], [35, 100000], [100, 200000]])  # Added outlier
scaler = RobustScaler()
robust_scaled = scaler.fit_transform(data)

print("Robust scaled data (with outlier):")
print("Age   | Income")
for row in robust_scaled:
    print(f"{row[0]:5.2f} | {row[1]:6.2f}")
Robust scaled data (with outlier):
Age   | Income
-0.33 |  -0.67
-0.11 |  -0.22
 0.11 |   0.22
 3.00 |   2.00

When to Apply Feature Scaling

Scale when algorithms use distance or assume normality:

Algorithm Type      | Scaling Required? | Reason
K-Nearest Neighbors | Yes               | Uses Euclidean distance
SVM                 | Yes               | Distance-based optimization
Neural Networks     | Yes               | Gradient descent optimization
PCA                 | Yes               | Variance-based dimensionality reduction
Decision Trees      | No                | Split-based, not distance-based
Random Forest       | No                | Tree-based ensemble
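The "No" rows can be sanity-checked directly: tree splits compare a feature only against a threshold, so any monotonic rescaling leaves the partitions (and the accuracy) unchanged. A small sketch using scikit-learn's make_classification:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=2, n_redundant=0,
                           n_informative=2, random_state=42)

# Fit one tree on raw features, one on standardized features
acc_raw = DecisionTreeClassifier(random_state=0).fit(X, y).score(X, y)
X_scaled = StandardScaler().fit_transform(X)
acc_scaled = DecisionTreeClassifier(random_state=0).fit(X_scaled, y).score(X_scaled, y)

# Scaling is affine and increasing, so the tree finds equivalent splits
print(acc_raw == acc_scaled)
```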

Practical Example

from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification
import numpy as np

# Create sample data with different scales
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0, 
                          n_informative=2, random_state=42)
X[:, 1] = X[:, 1] * 1000  # Scale second feature

# Without scaling
knn_unscaled = KNeighborsClassifier(n_neighbors=3)
knn_unscaled.fit(X, y)
accuracy_unscaled = knn_unscaled.score(X, y)

# With scaling
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
knn_scaled = KNeighborsClassifier(n_neighbors=3)
knn_scaled.fit(X_scaled, y)
accuracy_scaled = knn_scaled.score(X_scaled, y)

print(f"Accuracy without scaling: {accuracy_unscaled:.3f}")
print(f"Accuracy with scaling: {accuracy_scaled:.3f}")
Accuracy without scaling: 0.840
Accuracy with scaling: 0.930
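One caveat about the example above: it fits and scores on the same data. With a real train/test split, the scaler must be fit on the training set only, otherwise test statistics leak into preprocessing. A sketch of the usual pattern with scikit-learn's Pipeline:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=2, n_redundant=0,
                           n_informative=2, random_state=42)
X[:, 1] = X[:, 1] * 1000  # Same artificial scale mismatch as above

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The pipeline fits StandardScaler on X_train only, then applies the
# learned parameters to X_test when scoring — no test-set leakage
pipe = Pipeline([("scale", StandardScaler()),
                 ("knn", KNeighborsClassifier(n_neighbors=3))])
pipe.fit(X_train, y_train)
print(f"Test accuracy: {pipe.score(X_test, y_test):.3f}")
```

The same pipeline object can also be passed to cross_val_score, which refits the scaler inside each fold.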

Conclusion

Feature scaling is essential for distance-based algorithms and gradient descent optimization. Use StandardScaler for normally distributed data, MinMaxScaler for bounded ranges, and RobustScaler when outliers are present. Always scale features when using KNN, SVM, neural networks, or PCA.

Updated on: 2026-03-25T09:15:27+05:30
