
How can data be scaled using the scikit-learn library in Python?
Feature scaling is an important data pre-processing step when building machine learning models. It rescales the features so that their values fall within a specific range.
It can also speed up training, because many algorithms, such as those based on gradient descent, converge faster when the features are on comparable scales.
Why is it needed?
Data fed to a learning algorithm should be consistent and structured. Many algorithms predict more effectively when all input features are on a comparable scale, because features with large numeric ranges can otherwise dominate the others. In the real world, however, data is often unstructured, and the features are usually not on the same scale.
This is where normalization comes into the picture. It is one of the most important data-preparation steps: it changes the values in the columns of the input dataset so that they fall on the same scale.
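As a rough illustration of what putting columns on the same scale means, the sketch below rescales each column of a small array to the range 0 to 1 with plain NumPy, using the usual min-max formula (x - min) / (max - min). The helper name `min_max_scale` and the sample values are made up for this illustration; they are not part of scikit-learn.

```python
import numpy as np

def min_max_scale(data, low=0.0, high=1.0):
    """Rescale each column of `data` linearly to [low, high] (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    col_range = col_max - col_min
    # Guard against constant columns to avoid division by zero
    col_range[col_range == 0] = 1.0
    return low + (data - col_min) / col_range * (high - low)

# Two features on very different scales (made-up values)
sample = np.array([[1.0, 1000.0],
                   [2.0, 3000.0],
                   [3.0, 5000.0]])

scaled = min_max_scale(sample)
print(scaled)
```

After scaling, both columns span the same range even though their raw values differ by three orders of magnitude.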
Let us understand how the scikit-learn library can be used to perform feature scaling in Python.
Example
import numpy as np
from sklearn import preprocessing

# Sample data: 4 rows (samples) and 3 columns (features)
input_data = np.array(
    [[34.78, 31.9, -65.5],
     [-16.5, 2.45, -83.5],
     [0.5, -87.98, 45.62],
     [5.9, 2.38, -55.82]])

# Rescale each column independently to the range [0, 1]
data_scaler_minmax = preprocessing.MinMaxScaler(feature_range=(0, 1))
data_scaled_minmax = data_scaler_minmax.fit_transform(input_data)

print("\nThe scaled data is \n", data_scaled_minmax)
Output
The scaled data is
 [[1.         1.         0.1394052 ]
 [0.         0.75433767 0.        ]
 [0.33151326 0.         1.        ]
 [0.43681747 0.75375375 0.21437423]]
Explanation
The required packages are imported.
The input data is defined as a NumPy array with four rows and three columns.
The MinMaxScaler class from the ‘preprocessing’ module is used to scale the data to the range 0 to 1.
Each column is scaled independently: its smallest value becomes 0, its largest value becomes 1, and every other value falls in between.
This scaled data is displayed on the console.
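In practice, a scaler is often fitted once and then reused on data it has not seen. The sketch below is one possible way to do that with the same input array: it inspects the per-column minimums and maximums learned by `fit`, applies them to a new row, and maps scaled values back with `inverse_transform`. The `new_row` values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[34.78, 31.9, -65.5],
                  [-16.5, 2.45, -83.5],
                  [0.5, -87.98, 45.62],
                  [5.9, 2.38, -55.82]])

scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(train)  # learns the minimum and maximum of each column

# The learned statistics are stored on the fitted scaler
print("Column minimums:", scaler.data_min_)
print("Column maximums:", scaler.data_max_)

# A new, made-up row is scaled with the minimums and maximums learned above.
# Values outside the fitted range land outside [0, 1] by default.
new_row = np.array([[40.0, 0.0, -83.5]])
new_scaled = scaler.transform(new_row)
print(new_scaled)

# inverse_transform maps scaled values back to the original units
restored = scaler.inverse_transform(scaler.transform(train))
print(np.allclose(restored, train))
```

Fitting on one dataset and calling only `transform` on later data keeps every dataset on the same scale, at the cost that unseen values may fall slightly outside the target range.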
