Python Pandas - Setting Categories



Setting categories in categorical data means modifying its structure by performing one or more of the following operations −

  • Add new categories.

  • Reorder existing categories.

  • Rename categories.

  • Remove categories.

Categorical data, which represents a fixed set of discrete values, is common in data analysis. Managing these categories carefully is important for data consistency and performance. In Pandas, the set_categories() method can add, remove, reorder, or rename categories in a single call on a categorical Series, a categorical DataFrame column, or a CategoricalIndex object.

In this tutorial, we will learn how to set categories on Pandas categorical data using the set_categories() method, with detailed examples.

The set_categories() Method

The Pandas set_categories() method is available through the Series.cat accessor (an alias for CategoricalAccessor), which is designed specifically for categorical data, and directly on CategoricalIndex objects. It allows adding, removing, renaming, or reordering categories. Because it replaces the whole category list in one call, it can combine several of these operations at once.
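As a minimal sketch of the two entry points, the snippet below calls set_categories() on a categorical DataFrame column through the .cat accessor and directly on a CategoricalIndex. The column name "size" and the labels are made up for illustration.

```python
import pandas as pd

# A DataFrame with one categorical column (illustrative data)
df = pd.DataFrame({"size": pd.Categorical(["S", "M", "S"])})

# For a column, go through the .cat accessor and assign the result back
df["size"] = df["size"].cat.set_categories(["S", "M", "L"])
print(df["size"].cat.categories.tolist())  # ['S', 'M', 'L']

# A CategoricalIndex exposes set_categories() directly
idx = pd.CategoricalIndex(["a", "b", "a"])
idx = idx.set_categories(["a", "b", "c"])
print(idx.categories.tolist())  # ['a', 'b', 'c']
```

In both cases the method returns a new object, so the result has to be assigned back to take effect.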

Syntax

Following is the syntax of this method −

Series.cat.set_categories(new_categories, ordered=None, rename=False)

The new_categories parameter accepts a list-like object that specifies the complete new set of categories, in the desired order. The ordered parameter defines whether the categories should be treated as ordered; when it is omitted, the existing ordered setting is left unchanged. The rename parameter takes a boolean (False by default) and determines whether new_categories should be treated as new names for the existing categories, matched by position, instead of as a new set to match the existing values against.
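The following sketch illustrates how the ordered argument interacts with an existing Series. The labels are illustrative. It shows that passing ordered=True turns an unordered categorical into an ordered one, and that a later call which leaves ordered out keeps that setting.

```python
import pandas as pd

# An unordered categorical Series (illustrative data)
s = pd.Series(["low", "high", "low"], dtype="category")
print(s.cat.ordered)  # False

# Passing ordered=True makes the categories ordered
ordered_s = s.cat.set_categories(["low", "high"], ordered=True)
print(ordered_s.cat.ordered)  # True

# Leaving ordered out keeps the existing ordered setting
kept = ordered_s.cat.set_categories(["low", "medium", "high"])
print(kept.cat.ordered)  # True
print(kept.cat.categories.tolist())  # ['low', 'medium', 'high']
```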

Setting New Categories

Setting new categories involves defining a new set of categories, which may include unused categories. If any value in the original data is not listed in the new set of categories, it is replaced with NaN. This method can simultaneously add and remove categories based on the provided list.

Example

This example demonstrates setting new categories on a categorical Series using the Pandas set_categories() method.

import pandas as pd

# Creating a categorical Series
s = pd.Series(["cat", "dog", "cat"], dtype="category")

# Display the Input Series
print("Original Series:")
print(s)

# Setting new categories
s = s.cat.set_categories(["cat", "dog", "mouse"])
print("\nSeries after setting new categories:")
print(s)

When we run the above program, it produces the following result −

Original Series:
0    cat
1    dog
2    cat
dtype: category
Categories (2, object): ['cat', 'dog']

Series after setting new categories:
0    cat
1    dog
2    cat
dtype: category
Categories (3, object): ['cat', 'dog', 'mouse']
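The example above only adds a category. As a complementary sketch, the snippet below adds one category and drops another in the same call, so that a value whose category is missing from the new list becomes NaN. The data is illustrative.

```python
import pandas as pd

s = pd.Series(["cat", "dog", "cat"], dtype="category")

# "mouse" is added, "dog" is left out of the new list
result = s.cat.set_categories(["cat", "mouse"])

print(result.cat.categories.tolist())  # ['cat', 'mouse']
print(result.isna().tolist())          # [False, True, False]
```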

Reordering Categories

Reordering categories can simplify data interpretation and sorting operations. To reorder, pass the existing categories in the desired order to the new_categories parameter of the set_categories() method. Add ordered=True if the categories should also be treated as ordered, so that sorting and comparisons follow that order.

Example

This example demonstrates how to reorder categories of a Pandas categorical series object using the ordered=True parameter of the set_categories() method.

import pandas as pd

# Create a categorical Series
data = pd.Categorical(['low', 'medium', 'high'], categories=['low', 'medium', 'high'], ordered=True)
s = pd.Series(data)

print("Original Series:")
print(s)

# Reorder categories
s = s.cat.set_categories(['high', 'medium', 'low'], ordered=True)
print("\nSeries after reordering categories:")

print(s)

When we run the above program, it produces the following result −

Original Series:
0       low
1    medium
2      high
dtype: category
Categories (3, object): ['low' < 'medium' < 'high']

Series after reordering categories:
0       low
1    medium
2      high
dtype: category
Categories (3, object): ['high' < 'medium' < 'low']
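The printed values look the same before and after reordering because only the category order changes. A short sketch of where the difference shows up, using illustrative data, is sorting and min()/max(), which follow the category order for an ordered categorical:

```python
import pandas as pd

data = pd.Categorical(
    ["medium", "low", "high"],
    categories=["low", "medium", "high"],
    ordered=True,
)
s = pd.Series(data)
print(s.sort_values().tolist())  # ['low', 'medium', 'high']

# Reverse the category order
reordered = s.cat.set_categories(["high", "medium", "low"], ordered=True)
print(reordered.sort_values().tolist())  # ['high', 'medium', 'low']
print(reordered.min())  # high
```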

Renaming Categories

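With rename=True, the set_categories() method treats the new list as replacement names for the existing categories, matched by position. As long as the new list has the same length as the old one, the values are relabeled without adding or removing categories. This is useful for standardizing inconsistent labels without modifying the original structure.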

Example

This example shows how to rename the categories of existing categorical data by providing a list of new names together with the rename=True parameter.

import pandas as pd

# Creating a categorical Series
s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")

# Display the Input Series
print("Original Series:")
print(s)

# Renaming the categories by position
s = s.cat.set_categories(["Animal-1", "Animal-2", "Animal-3"], rename=True)
print("\nSeries After Renaming Categories:")
print(s)

While executing the above code we get the following output −

Original Series:
0      cat
1      dog
2    mouse
3      cat
dtype: category
Categories (3, object): ['cat', 'dog', 'mouse']

Series After Renaming Categories:
0    Animal-1
1    Animal-2
2    Animal-3
3    Animal-1
dtype: category
Categories (3, object): ['Animal-1', 'Animal-2', 'Animal-3']
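To make the role of rename=True clearer, the sketch below passes the same list of new names with and without it. Without rename=True, the new names are treated as a new set of categories, none of the existing values match, and every value becomes NaN.

```python
import pandas as pd

s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")
new_names = ["Animal-1", "Animal-2", "Animal-3"]

# rename=True: old categories are relabeled by position
renamed = s.cat.set_categories(new_names, rename=True)
print(renamed.tolist())  # ['Animal-1', 'Animal-2', 'Animal-3', 'Animal-1']

# rename=False (the default): values are matched against the new list
replaced = s.cat.set_categories(new_names)
print(replaced.isna().all())  # True
```

Because the renaming is positional, the order of the new list must line up with the order of the existing categories, which can be checked with s.cat.categories.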

Removing Categories

You can remove specific categories by defining a new set of categories that excludes them. Any value that belonged to a removed category is replaced by NaN.

Example

This example demonstrates removing categories using the set_categories() method.

import pandas as pd

# Creating a categorical Series
s = pd.Series(["small", "medium", "large"], dtype="category")

print("Original Series:")
print(s)

# Removing a category
s = s.cat.set_categories(["small", "large"])
print("\nSeries After Removing a Category:")

print(s)

When we run the above program, it produces the following result −

Original Series:
0     small
1    medium
2     large
dtype: category
Categories (3, object): ['large', 'medium', 'small']

Series After Removing a Category:
0    small
1      NaN
2    large
dtype: category
Categories (2, object): ['small', 'large']
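Since removing a category silently turns its values into NaN, it can help to check beforehand how many values would be affected. The following is one possible sketch of such a check, not a built-in feature of set_categories(). The variable names and data are illustrative.

```python
import pandas as pd

s = pd.Series(["small", "medium", "large", "medium"], dtype="category")
new_categories = ["small", "large"]

# Values whose category is missing from the new list
lost = s[~s.isin(new_categories)]
print(len(lost))  # 2

# After the call, exactly those values are NaN
trimmed = s.cat.set_categories(new_categories)
print(int(trimmed.isna().sum()))  # 2
```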