Python Pandas - Removing Categories



Removing categories lets you drop specified categories from existing categorical data. Pandas represents categorical data with the Categorical type and provides specialized methods for handling it through the Series.cat accessor. One such method is remove_categories(), which removes the specified categories from an existing categorical object.

In this tutorial, we will learn about removing specified categories from Pandas categorical data using the remove_categories() method, along with various examples.

The remove_categories() Method

The Pandas remove_categories() method removes one or more categories from an existing Pandas categorical object. It returns a new object in which the remaining categories keep their order, and any values that belonged to a removed category are set to NaN.

Syntax

Following is the syntax of this method −

Series.cat.remove_categories(removals, *args, **kwargs)

The removals parameter is mandatory. It accepts a single category or a list of categories to remove from the existing categorical object. Values that belonged to a removed category are set to NaN.
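The behaviour described above can be checked with a minimal sketch. The sample data is illustrative only, and the sketch assumes a recent Pandas version in which remove_categories() returns a new Series rather than modifying the original.

```python
import pandas as pd

# Illustrative data, not taken from any real dataset
s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")

# remove_categories() returns a new Series; s itself is left unchanged
result = s.cat.remove_categories("dog")

print(list(s.cat.categories))       # categories of the original Series
print(list(result.cat.categories))  # categories after removing "dog"
print(result.isna().sum())          # count of values that became NaN
```

Because the original Series is not modified, the result has to be assigned back to a variable (or to the DataFrame column) for the change to persist.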

Removing a Single Category

You can remove a single category from an existing Pandas categorical object by providing the category to the remove_categories() method.

Example

This example demonstrates how to remove a single category from a categorical Series using the Pandas remove_categories() method. Here, the category "dog" is removed from the series. Every value equal to "dog" is replaced with NaN, and the list of categories is updated accordingly.

import pandas as pd

# Creating a categorical Series
s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")

# Display the Input Series
print("Original Series:")
print(s)

# Removing a category
s = s.cat.remove_categories("dog")
print("\nSeries after removing a category:")
print(s)

When we run the above program, it produces the following result −

Original Series:
0      cat
1      dog
2    mouse
3      cat
dtype: category
Categories (3, object): ['cat', 'dog', 'mouse']

Series after removing a category:
0      cat
1      NaN
2    mouse
3      cat
dtype: category
Categories (2, object): ['cat', 'mouse']

Removing Multiple Categories

You can also remove multiple categories by passing a list of categories to the remove_categories() method.

Example

This example shows how to remove multiple categories from existing categorical data by passing a list of categories to the removals parameter. Here, both "dog" and "mouse" are removed from the series, and the corresponding values are replaced with NaN.

import pandas as pd

# Creating a categorical Series
s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")

# Display the Input Series
print("Original Series:")
print(s)

# Removing multiple categories
s = s.cat.remove_categories(["dog", "mouse"])
print("\nSeries after removing multiple categories:")
print(s)

While executing the above code, we get the following output −

Original Series:
0      cat
1      dog
2    mouse
3      cat
dtype: category
Categories (3, object): ['cat', 'dog', 'mouse']

Series after removing multiple categories:
0    cat
1    NaN
2    NaN
3    cat
dtype: category
Categories (1, object): ['cat']

Removing Categories from a DataFrame Column

The Pandas cat.remove_categories() method can also be applied to specific columns in a DataFrame. It works only on columns of the category dtype. It shrinks the set of categories for that column, and any values in the column that belonged to a removed category become NaN.

Example

This example demonstrates how to remove categories from a specific column of a Pandas DataFrame.

import pandas as pd

# Creating a DataFrame with a categorical column
df = pd.DataFrame({
    "Animal": ["Cat", "Dog", "Mouse"],
    "Category": pd.Series(["A", "B", "A"], dtype="category")
})

# Display the Input DataFrame
print("Original DataFrame:")
print(df)

# Removing categories from the 'Category' column
df["Category"] = df["Category"].cat.remove_categories(["A"])

# Display the updated DataFrame
print("\nDataFrame after removing categories:")
print(df)

# Checking the updated categories
print("\nUpdated categories in 'Category' column:")
print(df["Category"].cat.categories)

When we run the above program, it produces the following result −

Original DataFrame:
  Animal Category
0    Cat        A
1    Dog        B
2  Mouse        A

DataFrame after removing categories:
  Animal Category
0    Cat      NaN
1    Dog        B
2  Mouse      NaN

Updated categories in 'Category' column:
Index(['B'], dtype='object')
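Since removed categories leave NaN values behind, a common follow-up is to drop the affected rows. The sketch below is one possible way to do that with DataFrame.dropna(); it reuses the sample DataFrame from this section and is not the only approach.

```python
import pandas as pd

# Same illustrative DataFrame as in the example above
df = pd.DataFrame({
    "Animal": ["Cat", "Dog", "Mouse"],
    "Category": pd.Series(["A", "B", "A"], dtype="category"),
})

# Remove category "A"; rows that held "A" now contain NaN
df["Category"] = df["Category"].cat.remove_categories(["A"])

# Keep only the rows whose 'Category' value is still present
cleaned = df.dropna(subset=["Category"])
print(cleaned)
```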

Handling Errors While Removing Categories

If you attempt to remove a category that does not exist in the original categorical object, the method raises a ValueError. As a result, a misspelled or missing category name is reported immediately instead of being silently ignored.

Example

The following example demonstrates handling exceptions when you try to remove a non-existent category using the remove_categories() method.

import pandas as pd

# Creating a categorical Series
s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")

# Display the Input Series
print("Original Series:")
print(s)

try:
    # Attempting to remove a non-existent category
    s = s.cat.remove_categories(["elephant"])
except ValueError as e:
    print("\nError:", e)

Following is the output of the above code −

Original Series:
0      cat
1      dog
2    mouse
3      cat
dtype: category
Categories (3, object): ['cat', 'dog', 'mouse']

Error: removals must all be in old categories: {'elephant'}
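As an alternative to catching the exception, the requested names can be filtered against the existing categories before the call. This is a minimal sketch; the names requested and existing are hypothetical variables introduced only for illustration.

```python
import pandas as pd

s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")

# Hypothetical list of removals; "elephant" is not a category of s
requested = ["dog", "elephant"]

# Keep only the names that are actually present in the categories
existing = [c for c in requested if c in s.cat.categories]

# No ValueError is raised because every entry in 'existing' is a valid category
result = s.cat.remove_categories(existing)
print(list(result.cat.categories))
```

Note that this approach silently skips unknown names, so it fits best when missing categories are expected rather than a sign of a mistake.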