How to binarize the data using Python Scikit-learn?


Binarization is a preprocessing technique that converts numerical feature values into binary values (0 or 1) by comparing each value against a threshold. In scikit-learn, this is done with the function sklearn.preprocessing.binarize() or the equivalent transformer class sklearn.preprocessing.Binarizer.
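As a quick illustration, the sketch below applies both the function and the class to the same small array (the values are made up for this sketch) and checks that they produce the same result. Binarizer is stateless, so calling fit() only validates the input.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, binarize

# Small illustrative data set (hypothetical values, not from the article)
X = np.array([[1.5, -0.2, 0.0],
              [0.3,  2.0, 0.7]])

# Functional API: returns a new binarized array (X itself is not modified)
X_func = binarize(X, threshold=0.5)

# Estimator API: Binarizer is stateless, so fit() only validates the input
X_est = Binarizer(threshold=0.5).fit_transform(X)

print(X_func)
print(np.array_equal(X_func, X_est))  # True
```

The class form is convenient when binarization has to be one step in a larger scikit-learn workflow; the function form is shorter for one-off use.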

The binarize() function takes a threshold parameter: feature values less than or equal to the threshold are replaced by 0, and values greater than it are replaced by 1. The Binarizer class accepts the same parameter.
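A minimal sketch of the threshold rule, using made-up values placed below, exactly at, and above the threshold. The value equal to the threshold maps to 0, and the result matches a plain NumPy comparison with `>` (the NumPy line is only an equivalent formulation for illustration).

```python
import numpy as np
from sklearn.preprocessing import binarize

# Values chosen to sit below, exactly at, and above the threshold
X = np.array([[0.2, 0.5, 0.8]])

result = binarize(X, threshold=0.5)
print(result)  # [[0. 0. 1.]]

# The same rule written directly in NumPy: strictly greater than -> 1, else 0
manual = (X > 0.5).astype(X.dtype)
print(np.array_equal(result, manual))  # True
```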

In this tutorial, we will learn to binarize data and sparse matrices using Scikit-learn (Sklearn) in Python.

Example

Let’s see an example in which we binarize a small 2-D data set (a list of lists) with Binarizer and a threshold of 0.5 −

```python
# Importing the necessary package
from sklearn import preprocessing

X = [[0.4, -1.8, 2.9],
     [2.5, 0.9, 0.3],
     [0., 1., -1.5],
     [0.1, 2.9, 5.9]]

# Values greater than 0.5 become 1; values less than or equal to 0.5 become 0
Binarized_Data = preprocessing.Binarizer(threshold=0.5).fit_transform(X)
print("\nThe Binarized data is:\n", Binarized_Data)
```

Output

It will produce the following output −

```
The Binarized data is:
 [[0. 0. 1.]
 [1. 1. 0.]
 [0. 1. 0.]
 [0. 1. 1.]]
```

How to Binarize Sparse Matrices?

A sparse matrix is one in which most of the values are zero, as opposed to a so-called dense matrix, in which most values are non-zero. Sparse matrices are special because, to save memory, only the non-zero entries (and their positions) are stored; the zeros are implicit.
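To see what "only the non-zero entries are stored" means in practice, here is a small sketch using SciPy's CSR format (chosen for illustration; the examples below use COO). The nnz attribute counts the stored entries, while shape still reports the full logical size.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A 3x4 dense array in which most entries are zero (made-up values)
dense = np.array([[0, 0, 3, 0],
                  [0, 0, 0, 0],
                  [1, 0, 0, 2]])

sparse_mat = csr_matrix(dense)

print(sparse_mat.shape)  # (3, 4): the logical size is unchanged
print(sparse_mat.nnz)    # 3: only the non-zero entries are stored
print(sparse_mat.data)   # [3 1 2]: the stored values, row by row
```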

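We can use the Scikit-learn preprocessing.binarize() function to binarize sparse matrices, with one condition: the threshold cannot be less than zero. Because the zeros are never stored, binarize() only works on the stored non-zero values; with a negative threshold every implicit zero would have to become 1, and the result would no longer be sparse.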
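The sketch below, on a made-up 2 x 3 array, shows why the restriction exists: on a dense array a negative threshold turns every zero into 1, while the same call on a sparse copy raises a ValueError instead.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.preprocessing import binarize

dense = np.array([[0.0, 0.0, 3.0],
                  [0.0, -2.0, 0.0]])

# On a dense array a negative threshold is allowed: every 0 is > -1, so
# all the zeros become 1 and the result is no longer sparse at all.
dense_result = binarize(dense, threshold=-1)
print(dense_result)
# [[1. 1. 1.]
#  [1. 0. 1.]]

# On a sparse matrix the same call is rejected, because the implicit
# (unstored) zeros would all have to be turned into stored 1s.
raised = False
try:
    binarize(csr_matrix(dense), threshold=-1)
except ValueError as err:
    raised = True
    print(err)
```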

Example 1

Let’s see an example to understand it −

```python
# Import necessary libraries
import numpy as np
from scipy.sparse import coo_matrix
from sklearn import preprocessing

# Create a 1 x 50 sparse matrix of random 0/1 values (about 25% ones)
sparse_matrix = coo_matrix(np.random.binomial(1, .25, 50))

# A negative threshold is rejected for sparse input
sparse_binarized = preprocessing.binarize(sparse_matrix, threshold=-1)
```

Output

It raises an error because the threshold is less than 0 −

```
ValueError: Cannot binarize a sparse matrix with threshold < 0
```

Example 2

Let’s run the same example with a threshold greater than zero −

```python
# Import necessary libraries
import numpy as np
from scipy.sparse import coo_matrix
from sklearn import preprocessing

# Create a 1 x 50 sparse matrix of random 0/1 values (about 25% ones)
sparse_matrix = coo_matrix(np.random.binomial(1, .25, 50))

# Stored values greater than 0.8 become 1; the rest become 0
sparse_binarized = preprocessing.binarize(sparse_matrix, threshold=0.8)
print(sparse_binarized)
```

Output

It will produce output similar to the following (the positions of the 1s change on every run because the matrix is random) −

```
(0, 5) 1
(0, 6) 1
(0, 9) 1
(0, 15) 1
(0, 25) 1
(0, 27) 1
(0, 29) 1
(0, 30) 1
(0, 31) 1
```
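Because the example draws random values, its output changes from run to run. The sketch below is a variant of it, assuming you want a repeatable result: it swaps in a seeded NumPy Generator (an addition not in the original), then checks that the output is still a sparse matrix and that, for 0/1 input with a threshold of 0.8, binarization leaves every stored 1 unchanged.

```python
import numpy as np
from scipy.sparse import coo_matrix, issparse
from sklearn.preprocessing import binarize

# Seeded generator so the run is repeatable (an addition; the example
# above uses the unseeded np.random.binomial, so its output varies).
rng = np.random.default_rng(0)
sparse_matrix = coo_matrix(rng.binomial(1, 0.25, 50))

sparse_binarized = binarize(sparse_matrix, threshold=0.8)

# The result stays sparse; binarize converts COO input to CSR internally
print(issparse(sparse_binarized), sparse_binarized.format)  # True csr

# With 0/1 input and a threshold of 0.8, binarization keeps exactly the 1s
print(np.array_equal(sparse_binarized.toarray(), sparse_matrix.toarray()))
```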

Updated on: 04-Oct-2022
