How to remove the mean from a feature vector using the scikit-learn library in Python?


Pre-processing data refers to cleaning the data: removing invalid values and noise, replacing missing or incorrect values with relevant ones, and so on.

Data pre-processing also involves bringing all the data (whether it is collected from several sources or from a single one) into a common format or a uniform dataset, depending on the type of data. The steps are applied in sequence, so the output of one step becomes the input to the next.
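The idea of replacing values and then chaining steps so that each step feeds the next maps naturally onto scikit-learn's Pipeline class. The sketch below is illustrative and not part of the original example: the small array with one missing value (np.nan) is hypothetical toy data, SimpleImputer fills the gap with the column mean, and StandardScaler then removes the mean and scales each column.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 3 samples, 2 features, one missing value
raw = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, 30.0],
])

# Each step's output becomes the next step's input
pipeline = Pipeline([
    ("fill_missing", SimpleImputer(strategy="mean")),  # replace NaN with the column mean
    ("standardize", StandardScaler()),                 # remove the mean, scale to unit variance
])

cleaned = pipeline.fit_transform(raw)
print(cleaned)
print("Column means:", cleaned.mean(axis=0))
```

Calling fit_transform on the pipeline runs the steps in order, so the scaler only ever sees data that has already been imputed.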

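Removing the mean from each feature (centering the data around zero) is a common pre-processing step, because many machine learning estimators may behave poorly when features are not centered and on comparable scales. The example below uses the preprocessing.scale function from scikit-learn, which removes the mean of each feature (column) and scales it to unit variance.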
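Removing the mean on its own (centering) simply subtracts each column's average from that column. As a minimal sketch reusing the example's input data: NumPy broadcasting does the subtraction directly, and preprocessing.scale produces the same result when its with_std parameter is set to False, so only the mean is removed and the spread of each column is left unchanged.

```python
import numpy as np
from sklearn import preprocessing

input_data = np.array([
    [34.78, 31.9, -65.5],
    [-16.5, 2.45, -83.5],
    [0.5, -87.98, 45.62],
    [5.9, 2.38, -55.82]])

# Manual centering: subtract each column's mean (broadcast across the rows)
centered_manual = input_data - input_data.mean(axis=0)

# scikit-learn: remove the mean only, keep the original standard deviation
centered_sklearn = preprocessing.scale(input_data, with_std=False)

print(np.allclose(centered_manual, centered_sklearn))  # True
print(centered_sklearn.mean(axis=0))  # approximately zero for every column
print(centered_sklearn.std(axis=0))   # same as input_data.std(axis=0)
```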

Example

import numpy as np
from sklearn import preprocessing

# Four samples (rows) with three features (columns) each
input_data = np.array([
    [34.78, 31.9, -65.5],
    [-16.5, 2.45, -83.5],
    [0.5, -87.98, 45.62],
    [5.9, 2.38, -55.82]])

# Per-feature statistics of the raw data (axis=0 works column by column)
print("Mean value is : ", input_data.mean(axis=0))
print("Standard deviation value is : ", input_data.std(axis=0))

# Subtract each column's mean and divide by its standard deviation
data_scaled = preprocessing.scale(input_data)
print("Mean after scaling : ", data_scaled.mean(axis=0))
print("Standard deviation after scaling : ", data_scaled.std(axis=0))

Output

Mean value is : [ 6.17 -12.8125 -39.8 ]
Standard deviation value is : [18.4708067 45.03642047 50.30754615]
Mean after scaling : [-2.60208521e-18 -8.32667268e-17 -1.11022302e-16]
Standard deviation after scaling : [1. 1. 1.]
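preprocessing.scale computes the statistics and applies them in a single call, so it does not keep them for later use. When the same mean and standard deviation must also be removed from data that arrives afterwards (a test set, for example), the StandardScaler class stores them in its mean_ and scale_ attributes, which correspond to the mean and standard deviation printed above. A sketch, assuming the same input_data; the new_sample row is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

input_data = np.array([
    [34.78, 31.9, -65.5],
    [-16.5, 2.45, -83.5],
    [0.5, -87.98, 45.62],
    [5.9, 2.38, -55.82]])

# Learn the per-column mean and standard deviation from the training data
scaler = StandardScaler().fit(input_data)
print("Stored mean:", scaler.mean_)
print("Stored standard deviation:", scaler.scale_)

# Hypothetical new sample, transformed with the training statistics
new_sample = np.array([[10.0, -5.0, -40.0]])
print(scaler.transform(new_sample))
```

Fitting once and transforming later keeps new data on exactly the same scale as the data the statistics were learned from.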

Explanation

  • The required packages are imported.

  • The input data is created as a NumPy array with four samples (rows) and three features (columns).

  • The mean and standard deviation of each feature are calculated with axis=0, which works column by column.

  • They are displayed on the console.

  • The preprocessing.scale function subtracts each column's mean and divides by its standard deviation; the result is stored in ‘data_scaled’.

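  • The mean and standard deviation of the scaled data are displayed. The means are zero up to floating-point rounding error (values on the order of 1e-16 or smaller), and the standard deviations are 1.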
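What preprocessing.scale does to each column can also be written out by hand: subtract the column mean, then divide by the column standard deviation. The standard deviation here is the population version (ddof=0), which is also NumPy's default for std. A short check, reusing the example's data, that the manual formula matches the library result:

```python
import numpy as np
from sklearn import preprocessing

input_data = np.array([
    [34.78, 31.9, -65.5],
    [-16.5, 2.45, -83.5],
    [0.5, -87.98, 45.62],
    [5.9, 2.38, -55.82]])

# z = (x - mean) / std, computed column by column via broadcasting
manual = (input_data - input_data.mean(axis=0)) / input_data.std(axis=0)
library = preprocessing.scale(input_data)

print(np.allclose(manual, library))  # True
```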

Updated on: 11-Dec-2020
