Python - Convert Pandas DataFrame to binary data
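Use the pandas get_dummies() function to convert categorical DataFrame columns to binary (one-hot) columns. It creates one column per unique category, where 1 indicates the presence of that category in a row and 0 indicates its absence. In pandas 2.0 and later the indicator columns are boolean (True/False) by default; pass dtype=int to get 1 and 0.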
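To make the idea concrete, here is a minimal sketch that builds the same kind of indicator columns by hand, comparing a column against each of its unique values. The helper name manual_dummies and the variable names are illustrative assumptions, not part of pandas; in practice get_dummies() does this work for you.

```python
import pandas as pd

def manual_dummies(series: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: build one 0/1 indicator column per unique value."""
    categories = sorted(series.dropna().unique())
    # For each category, compare element-wise and cast True/False to 1/0
    return pd.DataFrame(
        {category: (series == category).astype(int) for category in categories},
        index=series.index,
    )

result = pd.Series(['Pass', 'Fail', 'Fail', 'Pass', 'Pass'], name="Result")
manual = manual_dummies(result)
print(manual)
```

Each row has exactly one 1 across the indicator columns, because every value belongs to exactly one category.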
Creating a Sample DataFrame
Let's start by creating a DataFrame with categorical data:
import pandas as pd

# Create DataFrame with categorical data
dataFrame = pd.DataFrame(
    {
        "Student": ['Jack', 'Robin', 'Ted', 'Scarlett', 'Kat'],
        "Result": ['Pass', 'Fail', 'Fail', 'Pass', 'Pass']
    }
)

print("Original DataFrame:")
print(dataFrame)
Original DataFrame:
    Student Result
0      Jack   Pass
1     Robin   Fail
2       Ted   Fail
3  Scarlett   Pass
4       Kat   Pass
Converting Single Column to Binary
Use get_dummies() to convert the "Result" column to binary form:
import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Student": ['Jack', 'Robin', 'Ted', 'Scarlett', 'Kat'],
        "Result": ['Pass', 'Fail', 'Fail', 'Pass', 'Pass']
    }
)

# Convert single column to binary
# dtype=int yields 1/0 (pandas 2.0+ returns True/False by default)
dfBinary = pd.get_dummies(dataFrame["Result"], dtype=int)
print("Binary representation of Result column:")
print(dfBinary)
Binary representation of Result column:
   Fail  Pass
0     0     1
1     1     0
2     1     0
3     0     1
4     0     1
Converting Entire DataFrame to Binary
You can also pass the entire DataFrame, which creates binary columns for every categorical (string) column:
import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Student": ['Jack', 'Robin', 'Ted', 'Scarlett', 'Kat'],
        "Result": ['Pass', 'Fail', 'Fail', 'Pass', 'Pass']
    }
)

# Convert entire DataFrame to binary
# dtype=int yields 1/0 (pandas 2.0+ returns True/False by default)
dfFullBinary = pd.get_dummies(dataFrame, dtype=int)
print("Full DataFrame in binary form:")
print(dfFullBinary)
Full DataFrame in binary form:
   Student_Jack  Student_Kat  Student_Robin  Student_Scarlett  Student_Ted  Result_Fail  Result_Pass
0             1            0              0                 0            0            0            1
1             0            0              1                 0            0            1            0
2             0            0              0                 0            1            1            0
3             0            0              0                 1            0            0            1
4             0            1              0                 0            0            0            1
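Encoding the Student column produces one column per name, which is seldom what you want. As a sketch of an alternative, the columns parameter of get_dummies() restricts the conversion to the listed columns and leaves the others unchanged. The variable name dfResultOnly is illustrative, and dtype=int is an assumption added here so the result shows 1/0 on pandas versions whose default is boolean.

```python
import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Student": ['Jack', 'Robin', 'Ted', 'Scarlett', 'Kat'],
        "Result": ['Pass', 'Fail', 'Fail', 'Pass', 'Pass']
    }
)

# Encode only "Result"; "Student" is passed through unchanged
dfResultOnly = pd.get_dummies(dataFrame, columns=["Result"], dtype=int)
print(dfResultOnly)
```

The result keeps the Student column as-is and adds Result_Fail and Result_Pass after it.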
Using Prefix for Column Names
Use the prefix parameter to prepend a label to the generated column names, which makes clear which source column each indicator came from:
import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Student": ['Jack', 'Robin', 'Ted', 'Scarlett', 'Kat'],
        "Result": ['Pass', 'Fail', 'Fail', 'Pass', 'Pass']
    }
)

# Convert with custom prefix
# dtype=int yields 1/0 (pandas 2.0+ returns True/False by default)
dfWithPrefix = pd.get_dummies(dataFrame["Result"], prefix="Grade", dtype=int)
print("Binary data with custom prefix:")
print(dfWithPrefix)
Binary data with custom prefix:
   Grade_Fail  Grade_Pass
0           0           1
1           1           0
2           1           0
3           0           1
4           0           1
Conclusion
The get_dummies() function converts categorical data to binary form by creating a separate indicator column for each category. This is a common preprocessing step for machine learning algorithms that require numerical input data.
