How to eliminate mean values from a feature vector using the scikit-learn library in Python?

Data preprocessing is an essential step in machine learning. It involves cleaning data, removing noise, and standardizing features. One common step is to eliminate the mean value of each feature so that the data is centered around zero. Many algorithms, particularly those based on gradient descent or distance measures, tend to behave better on centered data.

The scikit-learn library provides the preprocessing.scale() function to remove mean values and standardize features. This process is called standardization or z-score normalization.

Syntax

sklearn.preprocessing.scale(X, *, axis=0, with_mean=True, with_std=True, copy=True)

Parameters

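X - Input array or matrix to center and scale
axis - Axis along which the means and standard deviations are computed (0 standardizes each column, i.e. each feature; 1 standardizes each row, i.e. each sample)
with_mean - Boolean; if True, center the data by removing the mean
with_std - Boolean; if True, scale the data to unit variance
copy - Boolean, defaults to True; if False, the function tries to avoid a copy and scale the data in place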
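
To make the effect of the axis parameter concrete, here is a minimal sketch. The small matrix X is made up for illustration and is not part of the article's data set.

```python
import numpy as np
from sklearn import preprocessing

# Small illustrative matrix (values invented for this sketch)
X = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 6.0, 8.0],
])

# axis=0 (default): each column (feature) is standardized independently
by_column = preprocessing.scale(X, axis=0)

# axis=1: each row (sample) is standardized independently
by_row = preprocessing.scale(X, axis=1)

print("Column means after axis=0:", by_column.mean(axis=0))
print("Row means after axis=1:", by_row.mean(axis=1))
```

With axis=0 the column means end up at (approximately) zero, while with axis=1 the row means do. For feature vectors stored one sample per row, the default axis=0 is usually the one you want.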

Example

Let's eliminate mean values from a feature vector using scikit-learn:

import numpy as np
from sklearn import preprocessing

# Create sample input data
input_data = np.array([
    [34.78, 31.9, -65.5],
    [-16.5, 2.45, -83.5],
    [0.5, -87.98, 45.62],
    [5.9, 2.38, -55.82]
])

print("Original data:")
print(input_data)
print("\nMean values:", input_data.mean(axis=0))
print("Standard deviation:", input_data.std(axis=0))

# Scale the data (remove mean and standardize)
data_scaled = preprocessing.scale(input_data)

print("\nAfter scaling:")
print("Mean values:", data_scaled.mean(axis=0))
print("Standard deviation:", data_scaled.std(axis=0))

Output

Original data:
[[ 34.78  31.9  -65.5 ]
 [-16.5    2.45 -83.5 ]
 [  0.5  -87.98  45.62]
 [  5.9    2.38 -55.82]]

Mean values: [ 6.17  -12.8125 -39.8   ]
Standard deviation: [18.4708067  45.03642047 50.30754615]

After scaling:
Mean values: [-2.60208521e-18 -8.32667268e-17 -1.11022302e-16]
Standard deviation: [1. 1. 1.]

How It Works

The preprocessing.scale() function performs two operations:

Mean removal - Subtracts the mean from each feature
Standardization - Divides each centered feature by its standard deviation to get unit variance

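For each feature, the mathematical formula is: z = (x - mean) / std, where mean and std are that feature's mean and population standard deviation (the same value that NumPy's std() returns by default).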
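
As a sanity check, the formula can be applied by hand with NumPy and compared with the library result. This is only a sketch using the article's sample data. The variable names manual and library are chosen here for illustration.

```python
import numpy as np
from sklearn import preprocessing

input_data = np.array([
    [34.78, 31.9, -65.5],
    [-16.5, 2.45, -83.5],
    [0.5, -87.98, 45.62],
    [5.9, 2.38, -55.82]
])

# Per-feature (column-wise) statistics
mean = input_data.mean(axis=0)
std = input_data.std(axis=0)  # NumPy default ddof=0, the population standard deviation

# Apply (x - mean) / std by hand; broadcasting handles each column
manual = (input_data - mean) / std

# Same operation through scikit-learn
library = preprocessing.scale(input_data)

print("Manual and library results match:", np.allclose(manual, library))
```

The comparison uses np.allclose() rather than exact equality because the two computations can differ by tiny floating-point rounding errors.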

Only Removing Mean (Without Standardization)

To remove only the mean without standardizing, pass with_std=False:

import numpy as np
from sklearn import preprocessing

input_data = np.array([
    [34.78, 31.9, -65.5],
    [-16.5, 2.45, -83.5],
    [0.5, -87.98, 45.62],
    [5.9, 2.38, -55.82]
])

# Remove mean only (keep original standard deviation)
data_centered = preprocessing.scale(input_data, with_std=False)

print("Mean after centering:", data_centered.mean(axis=0))
print("Std after centering:", data_centered.std(axis=0))

Output

Mean after centering: [-1.38777878e-17  0.00000000e+00  2.77555756e-17]
Std after centering: [18.4708067  45.03642047 50.30754615]
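
preprocessing.scale() computes the mean from whatever array it receives. If the same centering has to be applied later to other data, one possible approach is scikit-learn's StandardScaler class with with_std=False, which stores the learned means. The following is a sketch under that assumption. The new_rows sample is hypothetical and not part of the article's data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([
    [34.78, 31.9, -65.5],
    [-16.5, 2.45, -83.5],
    [0.5, -87.98, 45.62],
    [5.9, 2.38, -55.82]
])

# Hypothetical new sample that arrives after the means were computed
new_rows = np.array([[10.0, 0.0, -40.0]])

# with_std=False: only subtract the mean, do not rescale
centerer = StandardScaler(with_std=False)
centerer.fit(train)

print("Learned means:", centerer.mean_)

# The new sample is centered with the means learned from train
centered_new = centerer.transform(new_rows)
print("Centered new sample:", centered_new)
```

Here the new sample is shifted by the means of the original data rather than by its own mean, which keeps both sets of data on the same reference point.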

Conclusion

Use preprocessing.scale() to eliminate mean values and standardize features in scikit-learn. Set with_std=False to remove only the mean while preserving the original variance of each feature.

AmitDiwan

Updated on: 2026-03-25T13:20:50+05:30
