Python - Typecasting Pandas into set

To convert a Pandas DataFrame column (a Series) into a Python set, pass it to the built-in set() function. The resulting set holds each distinct value exactly once, which is useful for removing duplicates and for set operations such as union, intersection, and difference.

Creating a DataFrame

Let us first create a DataFrame with employee data:

import pandas as pd

# Create DataFrame
dataFrame = pd.DataFrame(
   {
      "EmpName": ['John', 'Ted', 'Jacob', 'Scarlett', 'Ami', 'Ted', 'Scarlett'],
      "Zone": ['North', 'South', 'South', 'East', 'West', 'East', 'North']
   }
)

print("DataFrame:")
print(dataFrame)

Output:

DataFrame:
    EmpName   Zone
0      John  North
1       Ted  South
2     Jacob  South
3  Scarlett   East
4       Ami   West
5       Ted   East
6  Scarlett  North

Converting Pandas Series to Set

Convert individual columns to sets to remove duplicates:

import pandas as pd

dataFrame = pd.DataFrame(
   {
      "EmpName": ['John', 'Ted', 'Jacob', 'Scarlett', 'Ami', 'Ted', 'Scarlett'],
      "Zone": ['North', 'South', 'South', 'East', 'West', 'East', 'North']
   }
)

# Convert columns to sets
emp_set = set(dataFrame.EmpName)
zone_set = set(dataFrame.Zone)

print("Employee names as set:", emp_set)
print("Zones as set:", zone_set)

Output (sets are unordered, so the order of the elements may differ when you run it):

Employee names as set: {'John', 'Ted', 'Jacob', 'Scarlett', 'Ami'}
Zones as set: {'North', 'South', 'East', 'West'}
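
Attribute access such as dataFrame.EmpName only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute. The following is a minimal sketch of the bracket form, which works for any column name; the column name "Emp Name" (with a space) is an assumption made purely for illustration. It also shows Series.unique(), which can be used instead of set() when the order of first appearance should be kept:

```python
import pandas as pd

# Hypothetical column name containing a space, so attribute access is not possible
df = pd.DataFrame(
    {
        "Emp Name": ['John', 'Ted', 'Jacob', 'Scarlett', 'Ami', 'Ted', 'Scarlett'],
    }
)

# Bracket notation works for any column name
emp_set = set(df["Emp Name"])

# unique() drops duplicates but keeps the order of first appearance
ordered_unique = list(df["Emp Name"].unique())

print("Number of distinct names:", len(emp_set))
print("Distinct names in original order:", ordered_unique)
```

The set is the better choice when you need membership tests or set operations; unique() is the better choice when the original ordering matters.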

Set Operations

Perform a set union to combine the unique values from both columns, and a set intersection to find the values they have in common:

import pandas as pd

dataFrame = pd.DataFrame(
   {
      "EmpName": ['John', 'Ted', 'Jacob', 'Scarlett', 'Ami', 'Ted', 'Scarlett'],
      "Zone": ['North', 'South', 'South', 'East', 'West', 'East', 'North']
   }
)

# Set union - combine all unique values
combined_set = set(dataFrame.EmpName) | set(dataFrame.Zone)
print("Union of both columns:", combined_set)

# Set intersection - common values (if any)
common_values = set(dataFrame.EmpName) & set(dataFrame.Zone)
print("Common values:", common_values)

Output (the order of the elements in the union may differ when you run it):

Union of both columns: {'John', 'Ted', 'Jacob', 'Scarlett', 'Ami', 'North', 'South', 'East', 'West'}
Common values: set()
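
Set difference works the same way as union and intersection. The following is a minimal sketch, assuming we want to compare the employees listed under two zones of the same DataFrame; the variable names north and east are illustrative. The results are passed through sorted() only so that the printed order is predictable:

```python
import pandas as pd

dataFrame = pd.DataFrame(
    {
        "EmpName": ['John', 'Ted', 'Jacob', 'Scarlett', 'Ami', 'Ted', 'Scarlett'],
        "Zone": ['North', 'South', 'South', 'East', 'West', 'East', 'North']
    }
)

# Build one set of employee names per zone using boolean filtering
north = set(dataFrame.loc[dataFrame["Zone"] == "North", "EmpName"])
east = set(dataFrame.loc[dataFrame["Zone"] == "East", "EmpName"])

# Difference: names in North that are not in East
only_north = north - east

# Intersection: names that appear in both zones
in_both = north & east

# Symmetric difference: names that appear in exactly one of the two zones
in_one_only = north ^ east

print("Only in North:", sorted(only_north))        # ['John']
print("In both zones:", sorted(in_both))           # ['Scarlett']
print("In exactly one zone:", sorted(in_one_only)) # ['John', 'Ted']
```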

Use Cases

Converting Pandas columns to sets is useful for:

  • Removing duplicates from column values
  • Finding unique values across multiple columns
  • Set operations like union, intersection, and difference
  • Data validation and comparison tasks
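
As an illustration of the validation use case, the sketch below checks the Zone column against a list of permitted values. The allowed_zones set, including the extra 'Central' entry, is a hypothetical reference list invented for this example:

```python
import pandas as pd

dataFrame = pd.DataFrame(
    {
        "EmpName": ['John', 'Ted', 'Jacob', 'Scarlett', 'Ami', 'Ted', 'Scarlett'],
        "Zone": ['North', 'South', 'South', 'East', 'West', 'East', 'North']
    }
)

# Hypothetical list of permitted zones (an assumption for this example)
allowed_zones = {'North', 'South', 'East', 'West', 'Central'}

observed_zones = set(dataFrame["Zone"])

# True when every observed zone is a permitted zone
all_valid = observed_zones.issubset(allowed_zones)

# Values present in the data but not permitted
unexpected = observed_zones - allowed_zones

# Permitted zones that never appear in the data
unused = allowed_zones - observed_zones

print("All zones valid:", all_valid)        # True
print("Unexpected zones:", sorted(unexpected))  # []
print("Unused zones:", sorted(unused))          # ['Central']
```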

Conclusion

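Use set(dataFrame.column) to convert a Pandas column into a set for duplicate removal and set operations. The union operator | combines the unique values from multiple columns, and the intersection operator & returns the values they share. Because sets are unordered, do not rely on the order in which the elements are printed.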

Updated on: 2026-03-26T02:00:48+05:30
