Python - Convert Pandas DataFrame to binary data

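Use the pandas get_dummies() function to convert the categorical columns of a DataFrame to binary (one-hot encoded) data. It creates one indicator column for each unique category, where 1 indicates the presence of that category and 0 indicates absence. In pandas 2.0 and later the indicator columns are boolean (True/False) by default, so pass dtype=int to get 1 and 0.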

Creating a Sample DataFrame

Let's start by creating a DataFrame with categorical data:

import pandas as pd

# Create DataFrame with categorical data
dataFrame = pd.DataFrame(
    {
        "Student": ['Jack', 'Robin', 'Ted', 'Scarlett', 'Kat'],
        "Result": ['Pass', 'Fail', 'Fail', 'Pass', 'Pass']
    }
)

print("Original DataFrame:")
print(dataFrame)

Output:

Original DataFrame:
    Student Result
0      Jack   Pass
1     Robin   Fail
2       Ted   Fail
3  Scarlett   Pass
4       Kat   Pass

Converting Single Column to Binary

Use get_dummies() to convert the "Result" column to binary form:

import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Student": ['Jack', 'Robin', 'Ted', 'Scarlett', 'Kat'],
        "Result": ['Pass', 'Fail', 'Fail', 'Pass', 'Pass']
    }
)

# Convert single column to binary
# dtype=int gives 0/1; pandas 2.0+ returns True/False by default
dfBinary = pd.get_dummies(dataFrame["Result"], dtype=int)
print("Binary representation of Result column:")
print(dfBinary)

Output:

Binary representation of Result column:
   Fail  Pass
0     0     1
1     1     0
2     1     0
3     0     1
4     0     1
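
The indicator columns are returned as a separate DataFrame, so the student names are no longer attached. A minimal sketch of joining them back to the original rows, assuming pd.concat with axis=1 is an acceptable way to align on the shared index (the name dfJoined is illustrative):

```python
import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Student": ['Jack', 'Robin', 'Ted', 'Scarlett', 'Kat'],
        "Result": ['Pass', 'Fail', 'Fail', 'Pass', 'Pass']
    }
)

# dtype=int requests 0/1 values instead of the boolean default
dfBinary = pd.get_dummies(dataFrame["Result"], dtype=int)

# Place the indicator columns next to the original columns (aligned by index)
dfJoined = pd.concat([dataFrame, dfBinary], axis=1)
print(dfJoined)
```

Each row of dfJoined keeps its Student and Result values alongside the new Fail and Pass columns.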

Converting Entire DataFrame to Binary

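You can also pass the entire DataFrame to get_dummies(). It encodes every column with object, string, or category dtype and names each new column <column>_<value>; numeric columns pass through unchanged: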

import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Student": ['Jack', 'Robin', 'Ted', 'Scarlett', 'Kat'],
        "Result": ['Pass', 'Fail', 'Fail', 'Pass', 'Pass']
    }
)

# Convert entire DataFrame to binary
# dtype=int gives 0/1; pandas 2.0+ returns True/False by default
dfFullBinary = pd.get_dummies(dataFrame, dtype=int)
print("Full DataFrame in binary form:")
print(dfFullBinary)

Output:

Full DataFrame in binary form:
   Student_Jack  Student_Kat  Student_Robin  Student_Scarlett  Student_Ted  Result_Fail  Result_Pass
0             1            0              0                 0            0            0            1
1             0            0              1                 0            0            1            0
2             0            0              0                 0            1            1            0
3             0            0              0                 1            0            0            1
4             0            1              0                 0            0            0            1
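
Encoding an identifier column such as "Student" produces one column per row value, which is often not what is wanted. As a hedged sketch, the columns parameter of get_dummies() can restrict the encoding to selected columns and leave the rest untouched (the name dfResultOnly is illustrative):

```python
import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Student": ['Jack', 'Robin', 'Ted', 'Scarlett', 'Kat'],
        "Result": ['Pass', 'Fail', 'Fail', 'Pass', 'Pass']
    }
)

# Encode only "Result"; "Student" is kept as an ordinary column
dfResultOnly = pd.get_dummies(dataFrame, columns=["Result"], dtype=int)
print(dfResultOnly)
```

The result keeps the Student column and replaces Result with Result_Fail and Result_Pass.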

Using Prefix for Column Names

Pass the prefix parameter to prepend a label to the generated column names, which keeps them identifiable once they are combined with other columns:

import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Student": ['Jack', 'Robin', 'Ted', 'Scarlett', 'Kat'],
        "Result": ['Pass', 'Fail', 'Fail', 'Pass', 'Pass']
    }
)

# Convert with custom prefix
# dtype=int gives 0/1; pandas 2.0+ returns True/False by default
dfWithPrefix = pd.get_dummies(dataFrame["Result"], prefix="Grade", dtype=int)
print("Binary data with custom prefix:")
print(dfWithPrefix)

Output:

Binary data with custom prefix:
   Grade_Fail  Grade_Pass
0           0           1
1           1           0
2           1           0
3           0           1
4           0           1
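
When a whole DataFrame is encoded, the prefix can also be given per column. The sketch below assumes a dict mapping column names to prefixes and uses prefix_sep to change the separator; the prefixes "Name" and "Grade" and the ":" separator are illustrative choices, not requirements:

```python
import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Student": ['Jack', 'Robin', 'Ted', 'Scarlett', 'Kat'],
        "Result": ['Pass', 'Fail', 'Fail', 'Pass', 'Pass']
    }
)

# One prefix per encoded column, joined to the category with ":" instead of "_"
dfCustom = pd.get_dummies(
    dataFrame,
    prefix={"Student": "Name", "Result": "Grade"},
    prefix_sep=":",
    dtype=int,
)
print(list(dfCustom.columns))
```

This yields column names such as Name:Jack and Grade:Pass.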

Conclusion

The get_dummies() function converts categorical data to binary form by creating a separate indicator column for each category. This is a common preprocessing step for machine learning algorithms that accept only numerical input.
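
For a two-category column such as "Result", the Fail and Pass indicators carry the same information, since one is always the opposite of the other. As a sketch under that assumption, the drop_first parameter of get_dummies() removes the first category so a single column remains (the name dfCompact is illustrative):

```python
import pandas as pd

dataFrame = pd.DataFrame(
    {
        "Student": ['Jack', 'Robin', 'Ted', 'Scarlett', 'Kat'],
        "Result": ['Pass', 'Fail', 'Fail', 'Pass', 'Pass']
    }
)

# drop_first=True drops the first category ("Fail"), leaving only "Result_Pass"
dfCompact = pd.get_dummies(
    dataFrame["Result"], prefix="Result", drop_first=True, dtype=int
)
print(dfCompact)
```

Whether to drop a category depends on the model being used; some models work fine with the full set of indicator columns.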

Updated on: 2026-03-26T02:41:28+05:30
