How to eliminate mean values from a feature vector using the scikit-learn library in Python?
Data pre-processing covers cleaning the data: removing invalid entries and noise, replacing values with relevant ones, and so on.
It also involves bringing all the data, whether collected from several sources or a single one, into a common format or into uniform datasets (depending on the type of data). Pre-processing usually runs as a sequence of steps in which the output of one step becomes the input to the next.
Many learning algorithms behave better when every feature is centered around zero, so the mean of each feature often has to be removed from the input data. Let us see how this can be done with the scikit-learn library.
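Before turning to scikit-learn, it may help to see what removing the mean amounts to in plain NumPy. The following is a minimal sketch on a small made-up array (the values and the names `X`, `col_mean`, `X_centered` and `X_standardized` are illustrative assumptions, not part of the original example):

```python
import numpy as np

# Hypothetical toy data: 3 samples (rows), 2 features (columns).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

# Per-feature mean, computed down the rows (axis=0).
col_mean = X.mean(axis=0)

# Broadcasting subtracts each column's mean from every row.
X_centered = X - col_mean

# Dividing by the per-feature standard deviation additionally
# gives every feature a standard deviation of 1.
X_standardized = X_centered / X.std(axis=0)

print("Column means:", col_mean)
print("Means after centering:", X_centered.mean(axis=0))
print("Standard deviations after standardizing:", X_standardized.std(axis=0))
```

Subtracting the column mean shifts each feature so that it averages to zero; the division is a separate, optional step that rescales the spread.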
Example
import numpy as np
from sklearn import preprocessing

# Four samples (rows), three features (columns)
input_data = np.array([
    [34.78, 31.9, -65.5],
    [-16.5, 2.45, -83.5],
    [0.5, -87.98, 45.62],
    [5.9, 2.38, -55.82]])

# Per-feature statistics before scaling
print("Mean value is : ", input_data.mean(axis=0))
print("Standard deviation value is : ", input_data.std(axis=0))

# Subtract each feature's mean and divide by its standard deviation
data_scaled = preprocessing.scale(input_data)
print("Mean value after scaling : ", data_scaled.mean(axis=0))
print("Standard deviation after scaling : ", data_scaled.std(axis=0))
Output
Mean value is : [ 6.17 -12.8125 -39.8 ]
Standard deviation value is : [18.4708067 45.03642047 50.30754615]
Mean value after scaling : [-2.60208521e-18 -8.32667268e-17 -1.11022302e-16]
Standard deviation after scaling : [1. 1. 1.]
Explanation
The required packages are imported.
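The input data is defined as a NumPy array with four samples (rows) and three features (columns).
The mean and the standard deviation of each feature are calculated along axis 0.
They are displayed on the console.
The 'preprocessing.scale' function subtracts each feature's mean and divides by its standard deviation; the result is stored in the 'data_scaled' variable.
The mean and standard deviation of the scaled data are displayed on the console. The means are zero up to floating-point rounding error (values of the order of 1e-17), and the standard deviations are scaled to 1 rather than removed.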
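If only the mean should be eliminated and the spread of each feature left as it is, the `with_std=False` argument of `preprocessing.scale` can be used. The sketch below reuses the input data from the example and also shows, as one possible approach, how `StandardScaler` can remember the learned means so that they can be applied to later data. The `new_sample` values are made up for illustration.

```python
import numpy as np
from sklearn import preprocessing

input_data = np.array([[34.78, 31.9, -65.5],
                       [-16.5, 2.45, -83.5],
                       [0.5, -87.98, 45.62],
                       [5.9, 2.38, -55.82]])

# with_std=False subtracts the per-feature mean but leaves the spread untouched.
data_centered = preprocessing.scale(input_data, with_std=False)
print("Mean after centering:", data_centered.mean(axis=0))
print("Standard deviation after centering:", data_centered.std(axis=0))

# StandardScaler stores the means learned in fit() so they can be reused later.
scaler = preprocessing.StandardScaler(with_std=False)
scaler.fit(input_data)

# Hypothetical unseen sample, centered with the means learned above.
new_sample = np.array([[10.0, 0.0, -40.0]])
new_centered = scaler.transform(new_sample)
print("Learned means:", scaler.mean_)
print("Centered new sample:", new_centered)
```

The function form is convenient for a one-off transformation, while the scaler object is the usual choice when the same means must be applied to data that arrives after fitting.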