What is Standardization in Machine Learning

Standardization is a crucial preprocessing technique in machine learning that ensures all features are on the same scale. This process transforms data to have a mean of 0 and a standard deviation of 1, making features comparable and improving model performance.

What is Standardization?

Standardization, also known as Z-score normalization, is a feature scaling technique that transforms data by subtracting the mean and dividing by the standard deviation. This process ensures that all features contribute equally to machine learning algorithms that are sensitive to feature scale.

Mathematical Formula

The standardization formula is:

Z = (X - μ) / σ

Where:

  • Z is the standardized value
  • X is the original data point
  • μ is the mean of the dataset
  • σ is the standard deviation of the dataset

Manual Calculation Example

Let's standardize the dataset [3, 5, 7, 8, 9, 4] manually:

import numpy as np

# Original data
data = [3, 5, 7, 8, 9, 4]
print("Original data:", data)

# Calculate mean and standard deviation
mean = np.mean(data)
std = np.std(data)
print(f"Mean: {mean}")
print(f"Standard deviation: {std:.2f}")

# Manual standardization
standardized = [(x - mean) / std for x in data]
print("Standardized data:", [round(z, 2) for z in standardized])

# Verify: mean should be ~0, std should be ~1
print(f"New mean: {np.mean(standardized):.2f}")
print(f"New std: {np.std(standardized):.2f}")
Output:

Original data: [3, 5, 7, 8, 9, 4]
Mean: 6.0
Standard deviation: 2.16
Standardized data: [-1.39, -0.46, 0.46, 0.93, 1.39, -0.93]
New mean: 0.00
New std: 1.00

Using StandardScaler from scikit-learn

For real-world applications, use scikit-learn's StandardScaler:

from sklearn.preprocessing import StandardScaler
import numpy as np

# Create sample data matrix
X = np.array([[85, 72, 80], 
              [64, 35, 26], 
              [67, 48, 29], 
              [100, 11, 102], 
              [130, 14, 151]])

print("Original data:")
print(X)

# Create and fit StandardScaler
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X)

print("\nStandardized data:")
print(X_standardized.round(2))

# Verify standardization
print(f"\nMeans: {X_standardized.mean(axis=0).round(2)}")
print(f"Standard deviations: {X_standardized.std(axis=0).round(2)}")
Output:

Original data:
[[ 85  72  80]
 [ 64  35  26]
 [ 67  48  29]
 [100  11 102]
 [130  14 151]]

Standardized data:
[[-0.17  1.59  0.05]
 [-1.04 -0.04 -1.1 ]
 [-0.92  0.53 -1.04]
 [ 0.45 -1.11  0.52]
 [ 1.69 -0.97  1.56]]

Means: [ 0. -0.  0.]
Standard deviations: [1. 1. 1.]
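A key detail in practice is that StandardScaler learns the mean and standard deviation from the data passed to fit, so new samples should be transformed with the already-fitted scaler rather than refit. The sketch below reuses the matrix from the example above and adds a hypothetical new sample to illustrate transform and inverse_transform:

```python
from sklearn.preprocessing import StandardScaler
import numpy as np

# Fit the scaler on the training data only
X_train = np.array([[85, 72, 80],
                    [64, 35, 26],
                    [67, 48, 29],
                    [100, 11, 102],
                    [130, 14, 151]])
scaler = StandardScaler().fit(X_train)

# Transform a new sample using the training statistics (no refitting)
X_new = np.array([[90, 40, 60]])
X_new_std = scaler.transform(X_new)
print("Standardized new sample:", X_new_std.round(2))

# inverse_transform recovers the original scale
print("Recovered:", scaler.inverse_transform(X_new_std).round(2))
```

Fitting on training data and transforming test data with the same scaler prevents information from the test set leaking into preprocessing.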

When to Use Standardization

Standardization is essential for algorithms that are sensitive to feature scale:

  • Support Vector Machines (SVM): rely on distance calculations between points
  • K-Means Clustering: uses Euclidean distance to assign points to clusters
  • Principal Component Analysis (PCA): sensitive to the variance of each feature
  • Neural Networks: standardized inputs improve convergence speed
  • Logistic Regression: needed when regularization penalizes coefficient size
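To see why scale sensitivity matters, the sketch below trains an SVM with and without standardization on a synthetic dataset (one feature is deliberately inflated). The data, the pipeline, and the accuracy comparison are illustrative assumptions, not part of the original example; results will vary with the data, though scaling typically helps distance-based models:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic data; inflate one feature so the scales differ wildly
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X[:, 0] *= 1000

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Without scaling, the inflated feature dominates the SVM's kernel distances
raw_acc = SVC().fit(X_train, y_train).score(X_test, y_test)

# A pipeline standardizes inside each fit, so train and test are scaled consistently
scaled_acc = make_pipeline(StandardScaler(), SVC()).fit(X_train, y_train).score(X_test, y_test)

print(f"Accuracy without scaling: {raw_acc:.2f}")
print(f"Accuracy with scaling:    {scaled_acc:.2f}")
```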

Standardization vs Normalization

Aspect             | Standardization | Normalization
-------------------|-----------------|------------------------
Formula            | (X - μ) / σ     | (X - min) / (max - min)
Result Range       | No fixed range  | [0, 1]
Mean               | 0               | Depends on data
Standard Deviation | 1               | Depends on data
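The difference in the table is easy to verify in code. A minimal sketch, applying both StandardScaler and scikit-learn's MinMaxScaler (the usual implementation of min-max normalization) to the dataset from the manual example:

```python
from sklearn.preprocessing import StandardScaler, MinMaxScaler
import numpy as np

# Same dataset as the manual example, as a single-feature column
data = np.array([[3], [5], [7], [8], [9], [4]], dtype=float)

standardized = StandardScaler().fit_transform(data)
normalized = MinMaxScaler().fit_transform(data)

print("Standardized:", standardized.ravel().round(2))  # mean 0, std 1, no fixed range
print("Normalized:  ", normalized.ravel().round(2))    # squeezed into [0, 1]
```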

Conclusion

Standardization ensures all features contribute equally to machine learning models by transforming them to have zero mean and unit variance. Use StandardScaler for consistent preprocessing and improved model performance.

Updated on: 2026-03-27T09:12:36+05:30
