Data pre-processing refers to cleaning data: removing invalid entries and noise, replacing missing or incorrect values with suitable ones, and so on.
Data pre-processing also involves gathering all the data, whether collected from several sources or a single one, into a common format or into uniform datasets (depending on the type of data). It is usually carried out as a sequence of steps, in which the output of one step becomes the input to the next.
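The idea of chaining steps can be sketched with plain NumPy. This is only an illustration: the helper names fill_missing and remove_mean and the small array raw are made up for this example and do not come from any library.

```python
import numpy as np

def fill_missing(data):
    """Step 1 (hypothetical helper): replace NaN entries with their column mean."""
    column_means = np.nanmean(data, axis=0)  # column means, ignoring NaN
    return np.where(np.isnan(data), column_means, data)

def remove_mean(data):
    """Step 2 (hypothetical helper): subtract each column's mean."""
    return data - data.mean(axis=0)

# Made-up data with two missing values
raw = np.array([[1.0, 10.0],
                [np.nan, 20.0],
                [3.0, np.nan]])

cleaned = fill_missing(raw)     # output of step 1 ...
centred = remove_mean(cleaned)  # ... is the input to step 2

print(cleaned)
print(centred.mean(axis=0))
```

Here the missing values are filled first, because the mean of a column that still contains NaN would itself be NaN.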
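The mean of each feature (column) may have to be removed from the input data so that the features are centred on zero, which some algorithms expect. Let us see how this can be achieved using the scikit-learn library. In the example below, each column is also scaled to unit standard deviation.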
    import numpy as np
    from sklearn import preprocessing

    input_data = np.array([
        [34.78, 31.9, -65.5],
        [-16.5, 2.45, -83.5],
        [0.5, -87.98, 45.62],
        [5.9, 2.38, -55.82]])

    print("Mean value is : ", input_data.mean(axis=0))
    print("Standard deviation value is : ", input_data.std(axis=0))

    data_scaled = preprocessing.scale(input_data)
    print("Mean value has been removed ", data_scaled.mean(axis=0))
    print("Standard deviation has been removed ", data_scaled.std(axis=0))
The above code produces the following output:
    Mean value is : [ 6.17 -12.8125 -39.8 ]
    Standard deviation value is : [18.4708067 45.03642047 50.30754615]
    Mean value has been removed [-2.60208521e-18 -8.32667268e-17 -1.11022302e-16]
    Standard deviation has been removed [1. 1. 1.]
The required packages are imported.
The input data is defined as a two-dimensional NumPy array with four rows and three columns.
The mean and the standard deviation values are calculated.
They are displayed on the console.
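The ‘scale’ function from the ‘preprocessing’ module is applied to the input data, and the result is stored in ‘data_scaled’. The function subtracts each column's mean and divides by that column's standard deviation.
The mean and standard deviation of the scaled data are displayed on the console. The means are zero up to floating-point rounding error, and the standard deviations are 1. The standard deviation is therefore not removed but rescaled to 1.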
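To make clear what the scaling step does, the same result can be reproduced by hand. The sketch below assumes the default settings of preprocessing.scale (centring and scaling both enabled, column-wise) and uses the population standard deviation, which is what NumPy's std computes by default. The variable names manual_scaled and library_scaled are chosen for this illustration only.

```python
import numpy as np
from sklearn import preprocessing

input_data = np.array([
    [34.78, 31.9, -65.5],
    [-16.5, 2.45, -83.5],
    [0.5, -87.98, 45.62],
    [5.9, 2.38, -55.82]])

# Manual standardization: subtract the column mean,
# then divide by the column standard deviation (ddof=0).
column_mean = input_data.mean(axis=0)
column_std = input_data.std(axis=0)
manual_scaled = (input_data - column_mean) / column_std

# The library call used in the example above
library_scaled = preprocessing.scale(input_data)

# Both approaches should agree up to floating-point rounding
print(np.allclose(manual_scaled, library_scaled))
```

Writing the formula out this way shows that each column is handled independently: a column with a large spread, such as the third one, is shrunk more than a column with a small spread.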