How to binarize the data using Python Scikit-learn?
Binarization is a preprocessing technique that converts feature values into binary values (0 or 1). Scikit-learn provides the function sklearn.preprocessing.binarize() and the equivalent transformer class sklearn.preprocessing.Binarizer for this purpose.
Both take a threshold parameter: feature values less than or equal to the threshold are replaced by 0, and values above it are replaced by 1. The default threshold is 0.0.
In this tutorial, we will learn to binarize data and sparse matrices using Scikit-learn (Sklearn) in Python.
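The threshold rule described above can be sketched in plain NumPy, without Scikit-learn. This is only an illustration of the rule, not the library's implementation; the sample values and the variable names are assumptions made for this sketch.

```python
import numpy as np

# Hypothetical sample values, chosen only to illustrate the rule
values = np.array([0.2, 0.5, 0.7, -1.0])
threshold = 0.5

# Values <= threshold become 0, values > threshold become 1
binary = np.where(values > threshold, 1, 0)
print(binary)  # [0 0 1 0]
```

Note that 0.5 maps to 0 because the comparison is strict: only values above the threshold become 1.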
Example
Let’s see an example in which we binarize a NumPy array using the Binarizer class −
# Importing the necessary packages
import numpy as np
from sklearn import preprocessing

# Sample feature matrix
X = np.array([[0.4, -1.8, 2.9],
              [2.5, 0.9, 0.3],
              [0.0, 1.0, -1.5],
              [0.1, 2.9, 5.9]])

# Values <= 0.5 become 0, values > 0.5 become 1
Binarized_Data = preprocessing.Binarizer(threshold=0.5).transform(X)
print("\nThe Binarized data is:\n", Binarized_Data)
Output
It will produce the following output −
The Binarized data is:
 [[0. 0. 1.]
 [1. 1. 0.]
 [0. 1. 0.]
 [0. 1. 1.]]
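The same result can be obtained with the binarize() function instead of the Binarizer class. The following is a minimal sketch comparing the two on the same data; the variable names by_function and by_class are chosen here for illustration.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, binarize

X = np.array([[0.4, -1.8, 2.9],
              [2.5, 0.9, 0.3],
              [0.0, 1.0, -1.5],
              [0.1, 2.9, 5.9]])

# Function form: a single call that returns a new array (copy=True by default)
by_function = binarize(X, threshold=0.5)

# Class form: applies the same rule and can be used as a pipeline step
by_class = Binarizer(threshold=0.5).fit_transform(X)

# Both forms should produce identical arrays
print(np.array_equal(by_function, by_class))  # True
```

The function form is convenient for a one-off conversion, while the class form fits naturally into a Scikit-learn Pipeline.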
How to Binarize Sparse Matrices?
A sparse matrix consists mostly of zero values, in contrast to a so-called dense matrix, in which most values are non-zero. Sparse matrices are special because, to save memory, the zeros are not stored.
We can use the Scikit-learn preprocessing.binarize() function to binarize sparse matrices, on the condition that the threshold value is not less than zero. With a negative threshold, every unstored zero would have to become 1, which would make the result dense.
Example 1
Let’s see an example to understand it −
# Import necessary libraries
import numpy as np
from scipy.sparse import coo_matrix
from sklearn import preprocessing

# Create a sparse matrix from 50 random 0/1 draws
sparse_matrix = coo_matrix(np.random.binomial(1, .25, 50))

# A negative threshold is rejected for sparse input
sparse_binarized = preprocessing.binarize(sparse_matrix, threshold=-1)
Output
It raises an error stating that the threshold cannot be less than 0 for a sparse matrix −
ValueError: Cannot binarize a sparse matrix with threshold < 0
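If a negative threshold is really needed, one possible workaround is to convert the sparse matrix to a dense array first, at the cost of the memory savings. The sketch below assumes a small hand-written matrix (hypothetical data) so that the result is easy to follow.

```python
import numpy as np
from scipy.sparse import coo_matrix
from sklearn.preprocessing import binarize

# Small hand-written matrix (hypothetical data) containing zeros and non-zeros
sparse_matrix = coo_matrix(np.array([[0, 1, 0, 2]]))

# Sparse input with a negative threshold raises ValueError
try:
    binarize(sparse_matrix, threshold=-1)
except ValueError as err:
    print("Sparse input rejected:", err)

# Dense input accepts a negative threshold; every value > -1 becomes 1
dense_binarized = binarize(sparse_matrix.toarray(), threshold=-1)
print(dense_binarized)
```

Every entry of the dense result is 1, including the former zeros, which is why the sparse version refuses this threshold.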
Example 2
Let’s see the same example with a threshold value greater than zero −
# Import necessary libraries
import numpy as np
from scipy.sparse import coo_matrix
from sklearn import preprocessing

# Create a sparse matrix from 50 random 0/1 draws
sparse_matrix = coo_matrix(np.random.binomial(1, .25, 50))

# Binarize with a non-negative threshold
sparse_binarized = preprocessing.binarize(sparse_matrix, threshold=0.8)
print(sparse_binarized)
Output
Because the matrix is generated randomly, the exact positions vary between runs. It will produce output similar to the following −
  (0, 5)	1
  (0, 6)	1
  (0, 9)	1
  (0, 15)	1
  (0, 25)	1
  (0, 27)	1
  (0, 29)	1
  (0, 30)	1
  (0, 31)	1
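To make a run repeatable and to check the sparse result, one option is to seed the random generator and compare against the dense route. This is a sketch under stated assumptions: the seed value 0 and the variable names are arbitrary choices, not part of the original example.

```python
import numpy as np
from scipy.sparse import coo_matrix
from sklearn.preprocessing import binarize

# Seeded generator (the seed 0 is an arbitrary choice) for repeatable draws
rng = np.random.default_rng(0)
dense = rng.binomial(1, 0.25, 50).reshape(1, -1)

# Binarize the same data through the sparse and the dense route
sparse_binarized = binarize(coo_matrix(dense), threshold=0.8)
dense_binarized = binarize(dense, threshold=0.8)

# Both routes should agree element by element
print(np.array_equal(sparse_binarized.toarray(), dense_binarized))

# The result stays sparse: only entries above the threshold are stored
print(sparse_binarized.nnz == int(dense.sum()))
```

Both checks are expected to print True, since the input contains only 0s and 1s and the threshold of 0.8 keeps exactly the 1s.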