
How to binarize the data using Python Scikit-learn?
Binarization is a preprocessing technique used to convert data into binary values (0 or 1). Scikit-learn provides the function sklearn.preprocessing.binarize() and the equivalent transformer class sklearn.preprocessing.Binarizer for this purpose.
Both take a threshold parameter: feature values less than or equal to the threshold are replaced by 0, and values greater than the threshold are replaced by 1.
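The threshold rule can be reproduced with plain NumPy, which makes the boundary case explicit: a value exactly equal to the threshold becomes 0. The sketch below uses a small made-up array and compares np.where() against preprocessing.binarize(). It only illustrates the rule and is not meant to show how scikit-learn implements it internally.

```python
import numpy as np
from sklearn import preprocessing

# Small illustrative array; 0.5 sits exactly on the threshold
X = np.array([[0.4, 0.5, 0.6],
              [-1.0, 0.0, 2.0]])
threshold = 0.5

# Values strictly greater than the threshold become 1, everything else becomes 0
manual = np.where(X > threshold, 1.0, 0.0)
sk = preprocessing.binarize(X, threshold=threshold)

print(manual)
print(np.array_equal(manual, sk))  # True
```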
In this tutorial, we will learn to binarize data and sparse matrices using Scikit-learn (Sklearn) in Python.
Example
Let’s see an example in which we binarize a small 2-D dataset with the Binarizer class −
```python
# Importing the necessary packages
from sklearn import preprocessing

X = [[0.4, -1.8, 2.9],
     [2.5, 0.9, 0.3],
     [0., 1., -1.5],
     [0.1, 2.9, 5.9]]

# Values <= 0.5 become 0, values > 0.5 become 1
Binarized_Data = preprocessing.Binarizer(threshold=0.5).fit_transform(X)
print("\nThe Binarized data is:\n", Binarized_Data)
```
Output
It will produce the following output −
```
The Binarized data is:
 [[0. 0. 1.]
 [1. 1. 0.]
 [0. 1. 0.]
 [0. 1. 1.]]
```
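The example uses the Binarizer class, while the function preprocessing.binarize() does the same job in a single call. The sketch below reuses the array from the example to check that both forms agree. It also shows that Binarizer's default threshold is 0.0, so the threshold has to be passed explicitly to get the output above.

```python
import numpy as np
from sklearn import preprocessing

X = [[0.4, -1.8, 2.9], [2.5, 0.9, 0.3], [0., 1., -1.5], [0.1, 2.9, 5.9]]

# Estimator form: fit() learns nothing from the data for Binarizer
from_class = preprocessing.Binarizer(threshold=0.5).fit_transform(X)

# Function form: the same threshold rule applied directly
from_function = preprocessing.binarize(X, threshold=0.5)

print(np.array_equal(from_class, from_function))  # True

# With the default threshold of 0.0, every strictly positive value becomes 1
default_result = preprocessing.Binarizer().fit_transform(X)
print(default_result)
```

The class form is convenient when binarization is one step in a larger scikit-learn workflow; the function form is handy for one-off conversions.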
How to Binarize Sparse Matrices?
A sparse matrix consists mostly of zero values, unlike a dense matrix, which contains mostly non-zero values. Sparse matrices are special because, to save memory, the zeros are not stored.
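To see what "the zeros are not stored" means in practice, here is a small sketch using SciPy's csr_matrix. This is a different sparse format from the COO matrix used in the examples below, and the array values are made up for illustration. Only the non-zero values end up in the matrix's data array.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[0, 0, 3, 0],
                  [0, 0, 0, 0],
                  [5, 0, 0, 7]])
sparse = csr_matrix(dense)

print(dense.size)   # 12 cells in the dense array
print(sparse.nnz)   # 3 stored (non-zero) values
print(sparse.data)  # [3 5 7]
```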
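We can use the Scikit-learn preprocessing.binarize() function to binarize sparse matrices, but the threshold cannot be less than zero. With a negative threshold, every implicitly stored zero would have to become 1, which would turn the sparse matrix into a dense one.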
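A minimal sketch of this restriction, using a small made-up array: on a dense array a negative threshold is accepted and every zero becomes 1, while the same call on a sparse matrix raises a ValueError.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn import preprocessing

dense = np.array([[0.0, 0.0, 2.0],
                  [0.0, 0.0, 0.0]])

# Dense input: every 0 is greater than -1, so every entry becomes 1
dense_result = preprocessing.binarize(dense, threshold=-1)
print(dense_result)

# Sparse input: the same call would force every implicit zero to be stored as 1,
# so scikit-learn refuses it
error_message = None
try:
    preprocessing.binarize(csr_matrix(dense), threshold=-1)
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)
```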
Example 1
Let’s see an example to understand it −
```python
# Import necessary libraries
import numpy as np
from scipy.sparse import coo_matrix
from sklearn import preprocessing

# Create a 1 x 50 sparse matrix of random 0/1 values (about 25% ones)
sparse_matrix = coo_matrix(np.random.binomial(1, .25, 50))

# A negative threshold is not allowed for sparse input
sparse_binarized = preprocessing.binarize(sparse_matrix, threshold=-1)
```
Output
It raises an error because the threshold is less than 0 −
ValueError: Cannot binarize a sparse matrix with threshold < 0
Example 2
Let’s run the same example with a threshold greater than zero −
```python
# Import necessary libraries
import numpy as np
from scipy.sparse import coo_matrix
from sklearn import preprocessing

# Create a 1 x 50 sparse matrix of random 0/1 values (about 25% ones)
sparse_matrix = coo_matrix(np.random.binomial(1, .25, 50))

# Stored values greater than 0.8 become 1; the result is still a sparse matrix
sparse_binarized = preprocessing.binarize(sparse_matrix, threshold=0.8)
print(sparse_binarized)
```
Output
It will produce output similar to the following (the matrix is random, so the positions of the 1s differ on each run) −
```
(0, 5) 1
(0, 6) 1
(0, 9) 1
(0, 15) 1
(0, 25) 1
(0, 27) 1
(0, 29) 1
(0, 30) 1
(0, 31) 1
```
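Because Example 2 uses a random matrix, a fixed matrix makes the behaviour easier to check. The sketch below (array values made up for illustration) binarizes the same data once as a sparse matrix and once as a dense array, and confirms that the sparse result stays sparse and matches the dense one.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse
from sklearn import preprocessing

dense = np.array([[0.0, 0.3, 0.0, 0.9],
                  [1.5, 0.0, 0.8, 0.0]])
sparse = csr_matrix(dense)

sparse_result = preprocessing.binarize(sparse, threshold=0.5)
dense_result = preprocessing.binarize(dense, threshold=0.5)

print(issparse(sparse_result))  # True: the output is still a sparse matrix
print(sparse_result.toarray())
print(np.array_equal(sparse_result.toarray(), dense_result))  # True
```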
