Pandas remove_categories() Method



The remove_categories() method in Pandas is part of the Series.cat accessor, which is designed specifically for categorical data. It removes one or more specified categories from a Pandas categorical Series or a CategoricalIndex object.

When categories are removed, every value in the data that belongs to a removed category is replaced with NaN. This makes the method useful when you need to clean or transform categorical data.
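The short sketch below illustrates this behavior under illustrative data (the colors and the explicitly declared, unused "purple" category are assumptions chosen for the demo). Every occurrence of a removed category becomes NaN, which pandas stores internally as the category code -1. Removing a category that no value uses changes only the category list and introduces no NaN values.

```python
import pandas as pd

# Categories are declared explicitly so that "purple" exists as a category
# even though no value in the data uses it
s = pd.Series(
    ["red", "green", "red", "blue"],
    dtype=pd.CategoricalDtype(["blue", "green", "purple", "red"]),
)

# Removing "red": every occurrence of "red" becomes NaN
no_red = s.cat.remove_categories("red")
print(no_red.isna().tolist())          # [True, False, True, False]
print(no_red.cat.codes.tolist())       # [-1, 1, -1, 0]  (-1 marks NaN)

# Removing "purple": no value uses it, so no NaN values are introduced
no_purple = s.cat.remove_categories("purple")
print(no_purple.isna().tolist())       # [False, False, False, False]
print(list(no_purple.cat.categories))  # ['blue', 'green', 'red']
```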

Syntax

Below is the syntax of the Python Pandas remove_categories() method. It differs slightly depending on whether it is called on a categorical Series or on a CategoricalIndex object.

Syntax for a Pandas Categorical Series −

Series.cat.remove_categories(removals, *args, **kwargs)

Syntax for a CategoricalIndex −

CategoricalIndex.remove_categories(removals, *args, **kwargs)

When calling the remove_categories() method on a CategoricalIndex, you do not need to use the .cat accessor. A CategoricalIndex is inherently categorical, so the method can be called on it directly.
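As a minimal side-by-side sketch (the "low"/"mid"/"high" values are illustrative), the same removal can be applied to a categorical Series through .cat and to a CategoricalIndex directly. Both calls leave the same remaining categories, and the CategoricalIndex call returns a CategoricalIndex.

```python
import pandas as pd

values = ["low", "mid", "high", "mid"]

# A categorical Series needs the .cat accessor ...
s = pd.Series(values, dtype="category")
s_result = s.cat.remove_categories("mid")

# ... while a CategoricalIndex exposes the method directly
idx = pd.CategoricalIndex(values)
idx_result = idx.remove_categories("mid")

print(list(s_result.cat.categories))  # ['high', 'low']
print(list(idx_result.categories))    # ['high', 'low']
print(type(idx_result).__name__)      # CategoricalIndex
```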

Parameters

The Python Pandas remove_categories() method accepts the below parameters −

  • removals − Specifies the categories to be removed. This parameter accepts a single category or a list-like collection of categories to remove. These categories must exist in the current categories; otherwise, a ValueError is raised.

  • *args, **kwargs − Additional positional and keyword arguments, which are passed through to the underlying Categorical.remove_categories() method.
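The removals parameter accepts either form described above. In the sketch below (illustrative data), passing a single label as a scalar gives the same result as wrapping it in a list, and a tuple works as a list-like for removing several categories at once.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# A single category can be passed as a scalar ...
by_scalar = s.cat.remove_categories("a")
# ... or wrapped in a list-like such as a list or a tuple
by_list = s.cat.remove_categories(["a"])
by_tuple = s.cat.remove_categories(("a", "b"))

print(by_scalar.equals(by_list))      # True
print(list(by_tuple.cat.categories))  # ['c']
print(by_tuple.isna().tolist())       # [True, True, False, True]
```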

Return Value

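The Pandas remove_categories() method returns a new object of the same kind as the caller (a Series when called through Series.cat, a CategoricalIndex when called on a CategoricalIndex) with the specified categories removed. Values that belonged to the removed categories are replaced with NaN.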
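Because a new object is returned, the original is left unchanged, which is why the examples in this article reassign the result (s = s.cat.remove_categories(...)). The sketch below, using illustrative data, checks this and also shows that calling the method on a plain pd.Categorical returns a Categorical.

```python
import pandas as pd

s = pd.Series(["x", "y", "z"], dtype="category")
result = s.cat.remove_categories("x")

# The method returns a new Series; the original keeps all its categories
print(type(result).__name__)        # Series
print(list(s.cat.categories))       # ['x', 'y', 'z']
print(list(result.cat.categories))  # ['y', 'z']

# Calling it on a plain pandas Categorical returns a Categorical
cat = pd.Categorical(["x", "y", "z"])
print(type(cat.remove_categories("x")).__name__)  # Categorical
```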

Exception

This method raises a ValueError if any of the specified removals is not one of the existing categories.

Example: Removing Single Category

Here is a basic example that demonstrates how to remove a single category from a categorical Series using the Pandas Series.cat.remove_categories() method.

import pandas as pd

# Creating a categorical Series
s = pd.Series(["apple", "banana", "cherry"], dtype="category")
print("Original Series:")
print(s)

# Removing an existing category
s = s.cat.remove_categories("apple")
print("\nSeries after removing a category:")
print(s)

When we run the above program, it produces the following result −

Original Series:
0     apple
1    banana
2    cherry
dtype: category
Categories (3, object): ['apple', 'banana', 'cherry']

Series after removing a category:
0       NaN
1    banana
2    cherry
dtype: category
Categories (2, object): ['banana', 'cherry']
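The removed value stays in the Series as NaN. If the affected rows should be discarded entirely, one option, sketched below, is to follow up with the standard Series.dropna() method; this is a separate step and not something remove_categories() does itself.

```python
import pandas as pd

s = pd.Series(["apple", "banana", "cherry"], dtype="category")
s = s.cat.remove_categories("apple")

# Rows whose category was removed are NaN; drop them if they should not be kept
kept = s.dropna()
print(kept.index.tolist())        # [1, 2]
print(list(kept.cat.categories))  # ['banana', 'cherry']
```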

Example: Removing Multiple Categories

This example demonstrates removing multiple categories from a categorical Series using the Pandas Series.cat.remove_categories() method.

import pandas as pd

# Creating a categorical Series
s = pd.Series(["red", "blue", "yellow", "orange", "green"], dtype="category")
print("Original Series:")
print(s)

# Removing multiple categories
s = s.cat.remove_categories(["yellow", "red"])
print("\nSeries after removing multiple categories:")
print(s)

While executing the above code we get the following output −

Original Series:
0       red
1      blue
2    yellow
3    orange
4     green
dtype: category
Categories (5, object): ['blue', 'green', 'orange', 'red', 'yellow']

Series after removing multiple categories:
0       NaN
1      blue
2       NaN
3    orange
4     green
dtype: category
Categories (3, object): ['blue', 'green', 'orange']
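The list of removals does not have to be written by hand. As one possible cleaning pattern, the sketch below computes it from value_counts() and removes every "rare" category in a single call; the data and the min_count threshold of 2 are illustrative assumptions.

```python
import pandas as pd

s = pd.Series(
    ["red", "blue", "red", "green", "blue", "red", "yellow"], dtype="category"
)

# Hypothetical threshold: categories seen fewer than min_count times are "rare"
min_count = 2
counts = s.value_counts()
rare = counts[counts < min_count].index.tolist()

# Remove every rare category in one call; their values become NaN
cleaned = s.cat.remove_categories(rare)
print(sorted(rare))                  # ['green', 'yellow']
print(list(cleaned.cat.categories))  # ['blue', 'red']
print(cleaned.isna().sum())          # 2
```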

Example: Removing Specified Categories from a CategoricalIndex

The following example demonstrates using the CategoricalIndex.remove_categories() method for removing specified categories from the CategoricalIndex object.

import pandas as pd

# Creating a CategoricalIndex
catIndex = pd.CategoricalIndex(
    ["p", "q", "r", "s", "p", "q", "r", "s"],
    ordered=True,
    categories=["p", "q", "r", "s"]
)
print("Original CategoricalIndex:")
print(catIndex)

# Removing existing categories
new_index = catIndex.remove_categories(["r", "p"])
print("\nCategoricalIndex after Removing specified categories:")
print(new_index)

Following is the output of the above code −

Original CategoricalIndex:
CategoricalIndex(['p', 'q', 'r', 's', 'p', 'q', 'r', 's'], categories=['p', 'q', 'r', 's'], ordered=True, dtype='category')

CategoricalIndex after Removing specified categories:
CategoricalIndex([nan, 'q', nan, 's', nan, 'q', nan, 's'], categories=['q', 's'], ordered=True, dtype='category')
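Since the index in this example is ordered, it can be useful to confirm that the ordering survives the removal. The sketch below rebuilds the same index and checks that ordered is still True, that the remaining categories keep their relative order, and that an ordered comparison still works, with the NaN positions comparing as False.

```python
import pandas as pd

cat_index = pd.CategoricalIndex(
    ["p", "q", "r", "s", "p", "q", "r", "s"],
    ordered=True,
    categories=["p", "q", "r", "s"],
)
new_index = cat_index.remove_categories(["r", "p"])

# The ordering flag and the order of the remaining categories are kept
print(new_index.ordered)           # True
print(list(new_index.categories))  # ['q', 's']

# Ordered comparisons still work; NaN positions compare as False
print((new_index > "q").tolist())
# [False, False, False, True, False, False, False, True]
```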

Example: Error Occurrence while Removing Categories

This example demonstrates the behavior of the remove_categories() method when you try to remove categories that are not present in the original categorical object.

import pandas as pd

# Creating a categorical Series
s = pd.Series(["apple", "banana", "cherry"], dtype="category")
print("Original Series:")
print(s)

try:
    # Try to remove a non-existent category
    s = s.cat.remove_categories(['a'])
except ValueError as e:
    print("\nError:", e)

Following is the output of the above code −

Original Series:
0     apple
1    banana
2    cherry
dtype: category
Categories (3, object): ['apple', 'banana', 'cherry']

Error: removals must all be in old categories: {'a'}
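If the labels to remove come from user input or another dataset, one way to avoid this error is to keep only those labels that are actual categories before calling the method. The helper below, remove_existing_categories, is a hypothetical name for illustration and not part of pandas. Note that it silently ignores unknown labels, which can also hide typos.

```python
import pandas as pd


def remove_existing_categories(series, removals):
    """Hypothetical helper: remove only the labels that are existing categories.

    Unknown labels are ignored instead of raising ValueError.
    """
    existing = set(series.cat.categories)
    to_remove = [c for c in removals if c in existing]
    if not to_remove:
        # Nothing to remove; return an unchanged copy
        return series.copy()
    return series.cat.remove_categories(to_remove)


s = pd.Series(["apple", "banana", "cherry"], dtype="category")
result = remove_existing_categories(s, ["a", "apple"])
print(list(result.cat.categories))  # ['banana', 'cherry']
```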