# How to binarize the data using Python Scikit-learn?

Binarization is a preprocessing technique that converts numeric feature values into binary values (0 or 1). Scikit-learn provides the sklearn.preprocessing.binarize() function, along with the equivalent Binarizer transformer class, for this purpose.

The binarize function takes a threshold parameter (0.0 by default). Feature values less than or equal to the threshold are replaced by 0, and values above it are replaced by 1.
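The threshold rule can be reproduced with a plain NumPy comparison, which makes the boundary behaviour easy to check. The following is a minimal sketch; the sample values and variable names are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import binarize

# Illustrative values: one below, one equal to, and one above the threshold
X = np.array([[0.2, 0.5, 0.7]])
threshold = 0.5

# Result from scikit-learn
binarized = binarize(X, threshold=threshold)

# Equivalent NumPy expression: strictly greater than the threshold -> 1
manual = (X > threshold).astype(X.dtype)

print(binarized)  # [[0. 0. 1.]]
print(manual)     # [[0. 0. 1.]]
```

Note that the value equal to the threshold (0.5) maps to 0, because the comparison is strict.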

In this tutorial, we will learn to binarize data and sparse matrices using Scikit-learn (Sklearn) in Python.

## Example

Let’s see an example in which we convert a NumPy array of numbers into binary values −

# Import the necessary packages
import numpy as np
from sklearn import preprocessing

# Input data as a NumPy array
X = np.array([[0.4, -1.8, 2.9], [2.5, 0.9, 0.3], [0.0, 1.0, -1.5], [0.1, 2.9, 5.9]])

# Values above 0.5 become 1; all other values become 0
binarized_data = preprocessing.Binarizer(threshold=0.5).transform(X)
print("\nThe Binarized data is:\n", binarized_data)


## Output

It will produce the following output −

The Binarized data is:
[[0. 0. 1.]
[1. 1. 0.]
[0. 1. 0.]
[0. 1. 1.]]
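The same result can also be obtained with the binarize() function mentioned at the start of this tutorial. The sketch below reuses the input from the example and simply checks that the function and the Binarizer class agree; it is an illustration rather than the only way to structure the code.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, binarize

X = np.array([[0.4, -1.8, 2.9],
              [2.5, 0.9, 0.3],
              [0.0, 1.0, -1.5],
              [0.1, 2.9, 5.9]])

# Transformer form: convenient as a step inside a scikit-learn Pipeline
from_class = Binarizer(threshold=0.5).fit_transform(X)

# Function form: a one-off call with no estimator object
from_function = binarize(X, threshold=0.5)

print(np.array_equal(from_class, from_function))  # True
```

The class form is usually the better fit when binarization is one step of a larger preprocessing pipeline, while the function form is handy for quick, standalone conversions.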


## How to Binarize Sparse Matrices?

A sparse matrix consists mostly of zero values, in contrast to a so-called dense matrix, which consists mostly of non-zero values. Sparse matrices are special because, to save memory, the zeros are not stored.
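To see what "not stored" means in practice, the short sketch below builds a small matrix in SciPy's CSR format and compares the number of stored values with the total number of elements. The array contents are illustrative only.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A mostly-zero dense array (values are illustrative)
dense = np.array([[0, 0, 3, 0, 0],
                  [0, 0, 0, 0, 0],
                  [1, 0, 0, 0, 2]])

sparse = csr_matrix(dense)

print(dense.size)   # 15 elements in the dense array
print(sparse.nnz)   # 3 values actually stored
print(sparse.data)  # [3 1 2]
```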

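We can use the Scikit-learn preprocessing.binarize() function to binarize sparse matrices, on the condition that the threshold value is not less than zero. With a negative threshold, every unstored zero would have to become 1, and the result would no longer be sparse.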

### Example 1

Let’s see an example to understand it −

# Import necessary libraries
import numpy as np
from scipy.sparse import coo_matrix
from sklearn import preprocessing

# Create a 1 x 50 sparse matrix of random 0/1 values
sparse_matrix = coo_matrix(np.random.binomial(1, .25, 50))

# A negative threshold is not allowed for sparse input
sparse_binarized = preprocessing.binarize(sparse_matrix, threshold=-1)


### Output

It will raise an error stating that the threshold cannot be less than 0 −

ValueError: Cannot binarize a sparse matrix with threshold < 0
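The sketch below makes the restriction concrete on a small matrix: the sparse call is rejected, while a dense copy is accepted and every entry, zeros included, becomes 1. Converting to dense with toarray() is only one possible workaround, assumed here for illustration, and it gives up the memory savings of the sparse format.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.preprocessing import binarize

# Small illustrative sparse matrix with a single stored value
sparse_matrix = csr_matrix(np.array([[0.0, 0.0, 1.0, 0.0]]))

# The sparse input is rejected for a negative threshold
try:
    binarize(sparse_matrix, threshold=-1)
    raised = False
except ValueError:
    raised = True

# On a dense copy the call succeeds, and every entry maps to 1
dense_binarized = binarize(sparse_matrix.toarray(), threshold=-1)

print(raised)           # True
print(dense_binarized)  # [[1. 1. 1. 1.]]
```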


### Example 2

Let’s run the same example with a threshold value greater than zero −

# Import necessary libraries
import numpy as np
from scipy.sparse import coo_matrix
from sklearn import preprocessing

# Create a 1 x 50 sparse matrix of random 0/1 values
sparse_matrix = coo_matrix(np.random.binomial(1, .25, 50))

# Stored values above 0.8 become 1; all others become 0
sparse_binarized = preprocessing.binarize(sparse_matrix, threshold=0.8)
print(sparse_binarized)


### Output

Because the input is random, the exact positions vary from run to run, but the output will look similar to the following −

(0, 5) 1
(0, 6) 1
(0, 9) 1
(0, 15) 1
(0, 25) 1
(0, 27) 1
(0, 29) 1
(0, 30) 1
(0, 31) 1
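For a repeatable run, the random input can be generated from a seeded generator. The sketch below is one way to do that and to confirm that the result is still a sparse matrix; the seed value 0 is an arbitrary choice, and no specific positions are claimed for the output.

```python
import numpy as np
from scipy.sparse import coo_matrix, issparse
from sklearn.preprocessing import binarize

# Seeded generator so repeated runs produce the same matrix (seed is arbitrary)
rng = np.random.default_rng(0)
sparse_matrix = coo_matrix(rng.binomial(1, 0.25, 50))

sparse_binarized = binarize(sparse_matrix, threshold=0.8)

print(issparse(sparse_binarized))  # True: the result is still sparse
print(sparse_binarized.shape)      # (1, 50)

# The input only contains 0s and 1s, so every stored 1 survives the threshold
print(sparse_binarized.nnz == sparse_matrix.nnz)  # True
```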