Pandas set_categories() Method



The set_categories() method in Pandas performs several operations on a categorical object in a single call. It lets you redefine the categories by adding new ones, removing unwanted ones, renaming existing ones, or changing their order, all at once.

This method is part of the Pandas Series.cat accessor (an alias for CategoricalAccessor), which is designed specifically for categorical data. It works on both categorical Series and CategoricalIndex objects.
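
As a rough illustration of "several operations in one call", the sketch below compares a chain of single-purpose accessor methods with one set_categories() call. The Series contents and the variable names step_by_step and one_call are made up for this illustration.

```python
import pandas as pd

# Illustrative data; the default categories here are the sorted unique values
s = pd.Series(["cat", "dog", "bird"], dtype="category")

# Step by step: drop "cat", add two categories, then fix the order
step_by_step = (
    s.cat.remove_categories(["cat"])
     .cat.add_categories(["fish", "rabbit"])
     .cat.reorder_categories(["fish", "dog", "bird", "rabbit"])
)

# The same target category list, set in a single call
one_call = s.cat.set_categories(["fish", "dog", "bird", "rabbit"])

print(list(step_by_step.cat.categories))
print(list(one_call.cat.categories))
print(one_call)
```

Both approaches should end with the same category list, and the value "cat" becomes NaN in both because its category no longer exists.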

Syntax

The syntax of the Pandas set_categories() method is shown below. The call differs slightly between the two object types: on a categorical Series the method is reached through the .cat accessor, while on a CategoricalIndex it is called directly.

Syntax for a Pandas Categorical Series −

Series.cat.set_categories(*args, **kwargs)

Syntax for a CategoricalIndex −

CategoricalIndex.set_categories(*args, **kwargs)

Parameters

The Python Pandas set_categories() method accepts the below parameters −

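  • new_categories: A list-like object representing the new categories.

  • ordered: A boolean specifying whether the categories should be treated as ordered. If it is not given, the existing ordered setting is left unchanged.

  • rename: A boolean, False by default. If set to True, new_categories is treated as a renaming of the existing categories, matched by position. If False, any value whose category is not in new_categories becomes NaN.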
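
The effect of the ordered parameter can be checked with a small sketch like the one below. The size labels and the variable names kept and unordered are assumptions chosen for the illustration.

```python
import pandas as pd

# An ordered categorical Series (illustrative data)
sizes = pd.Series(
    pd.Categorical(["S", "L", "M"], categories=["S", "M", "L"], ordered=True)
)

# ordered is not passed, so the existing ordered setting is kept
kept = sizes.cat.set_categories(["S", "M", "L", "XL"])

# ordered=False explicitly turns the result into an unordered categorical
unordered = sizes.cat.set_categories(["S", "M", "L", "XL"], ordered=False)

print(kept.cat.ordered)
print(unordered.cat.ordered)
```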

Return Value

The Pandas set_categories() method returns a Categorical Series or CategoricalIndex object with updated categories.
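
Because the method returns a new object, the result has to be assigned to a variable for the change to take effect. A minimal sketch, using made-up data and the hypothetical name result:

```python
import pandas as pd

s = pd.Series(["a", "b", "a"], dtype="category")

# The call returns a new Series; s itself is not modified
result = s.cat.set_categories(["a", "b", "c"])

print(list(s.cat.categories))       # categories of the original Series
print(list(result.cat.categories))  # categories of the returned Series
```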

Example: Basic Example

This example demonstrates the basic functionality of the Series.cat.set_categories() method by renaming categories using the rename=True parameter.

import pandas as pd

# Create a categorical Series
data = pd.Categorical(['apple', 'banana', 'cherry'], categories=['apple', 'banana', 'cherry'])
s = pd.Series(data)

print("Original Series:")
print(s)

# Rename categories
s = s.cat.set_categories(['fruit_1', 'fruit_2', 'fruit_3'], rename=True)
print("\nSeries after renaming categories:")

print(s)

When we run the above program, it produces the following result −

Original Series:
0     apple
1    banana
2    cherry
dtype: category
Categories (3, object): ['apple', 'banana', 'cherry']

Series after renaming categories:
0    fruit_1
1    fruit_2
2    fruit_3
dtype: category
Categories (3, object): ['fruit_1', 'fruit_2', 'fruit_3']
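
To see why rename=True matters here, the following sketch passes the same new labels with and without it. The names renamed and replaced are illustrative only.

```python
import pandas as pd

s = pd.Series(pd.Categorical(["apple", "banana", "cherry"]))
new_labels = ["fruit_1", "fruit_2", "fruit_3"]

# rename=True: old categories are relabelled by position, values are kept
renamed = s.cat.set_categories(new_labels, rename=True)

# rename=False (the default): none of the old values match the new
# categories, so every value becomes NaN
replaced = s.cat.set_categories(new_labels)

print(renamed.tolist())
print(replaced.isna().all())
```

Without rename=True, the new list is treated as a replacement set of categories rather than new names for the old ones.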

Example: Reordering Categories in a Categorical Series

The following example demonstrates using the Pandas set_categories() method for reordering the categories of a Series object.

import pandas as pd

# Create a categorical Series
data = pd.Categorical(['low', 'medium', 'high'], categories=['low', 'medium', 'high'], ordered=True)
s = pd.Series(data)

print("Original Series:")
print(s)

# Reorder categories
s = s.cat.set_categories(['high', 'medium', 'low'], ordered=True)
print("\nSeries after reordering categories:")

print(s)

Executing the above code gives the following output −

Original Series:
0       low
1    medium
2      high
dtype: category
Categories (3, object): ['low' < 'medium' < 'high']

Series after reordering categories:
0       low
1    medium
2      high
dtype: category
Categories (3, object): ['high' < 'medium' < 'low']
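
The values look unchanged after reordering, but order-dependent operations behave differently. The sketch below, which uses the assumed names levels and reversed_levels, compares sorting and max() before and after the reorder.

```python
import pandas as pd

levels = pd.Series(
    pd.Categorical(["low", "medium", "high"],
                   categories=["low", "medium", "high"], ordered=True)
)

# Reverse the category order; the stored values stay the same
reversed_levels = levels.cat.set_categories(["high", "medium", "low"], ordered=True)

# Sorting and max() follow the category order, not alphabetical order
print(levels.sort_values().tolist())
print(reversed_levels.sort_values().tolist())
print(levels.max(), reversed_levels.max())
```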

Example: Adding and Removing Categories Simultaneously

The following example demonstrates how to update categories by removing "cat" and adding "fish" and "rabbit" simultaneously using the set_categories() method. Because "cat" is no longer a category, the value that held it becomes NaN.

import pandas as pd

# Creating a categorical Series
s = pd.Series(pd.Categorical(['cat', 'dog', 'bird'], categories=['cat', 'dog', 'bird']))

print("Original Series:")
print(s)

# Updating categories
s = s.cat.set_categories(['fish', 'dog', 'bird', 'rabbit'])
print("\nSeries after adding and removing categories:")
print(s)

Following is the output of the above code −

Original Series:
0     cat
1     dog
2    bird
dtype: category
Categories (3, object): ['cat', 'dog', 'bird']

Series after adding and removing categories:
0     NaN
1     dog
2    bird
dtype: category
Categories (4, object): ['fish', 'dog', 'bird', 'rabbit']
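
Since removed categories silently turn values into NaN, it can be useful to check what will be lost before making the change. This is one possible way to do that; the names will_be_lost, updated, and counts are made up for the sketch.

```python
import pandas as pd

s = pd.Series(pd.Categorical(["cat", "dog", "bird"], categories=["cat", "dog", "bird"]))
new_categories = ["fish", "dog", "bird", "rabbit"]

# Values whose category is missing from the new list will become NaN
will_be_lost = s[~s.isin(new_categories)]
print(will_be_lost.tolist())

updated = s.cat.set_categories(new_categories)
print(updated.isna().sum())

# value_counts() on categorical data also reports the unused categories
counts = updated.value_counts()
print(counts)
```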

Example: Using set_categories() with CategoricalIndex

This example uses the set_categories() method on a CategoricalIndex object. It behaves the same as on a categorical Series; the only difference is that it is called directly on the index rather than through the .cat accessor.

import pandas as pd

# Create a CategoricalIndex
cat_index = pd.CategoricalIndex(['A', 'B', 'C'], categories=['A', 'B', 'C'], ordered=True)

print("Original CategoricalIndex:")
print(cat_index)

# Rename categories
cat_index = cat_index.set_categories(['X', 'Y', 'Z'], rename=True)
print("\nCategoricalIndex after renaming categories:")
print(cat_index)

# Add a new category and reorder
cat_index = cat_index.set_categories(['Z', 'Y', 'X', 'W'], ordered=True)
print("\nCategoricalIndex after adding and reordering categories:")
print(cat_index)

Following is the output of the above code −

Original CategoricalIndex:
CategoricalIndex(['A', 'B', 'C'], categories=['A', 'B', 'C'], ordered=True, dtype='category')

CategoricalIndex after renaming categories:
CategoricalIndex(['X', 'Y', 'Z'], categories=['X', 'Y', 'Z'], ordered=True, dtype='category')

CategoricalIndex after adding and reordering categories:
CategoricalIndex(['X', 'Y', 'Z'], categories=['Z', 'Y', 'X', 'W'], ordered=True, dtype='category')
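
A CategoricalIndex is usually attached to a Series or DataFrame, so a short sketch of that case may help. The data and the name scores are assumptions for the illustration. It reorders the index categories and then sorts by the index.

```python
import pandas as pd

idx = pd.CategoricalIndex(["B", "A", "C"], categories=["A", "B", "C"], ordered=True)
scores = pd.Series([20, 10, 30], index=idx)

# Replace the index with one whose categories are in reverse order
scores.index = scores.index.set_categories(["C", "B", "A"], ordered=True)

# sort_index() follows the category order of the index
sorted_scores = scores.sort_index()
print(sorted_scores)
```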