Pandas remove_unused_categories() Method



The remove_unused_categories() method in Pandas is a useful tool for cleaning up categorical data. It removes categories that no longer appear among the values of a categorical Series or CategoricalIndex, so that the list of categories matches the data actually present.

This method is part of the Pandas Series.cat accessor (an alias for CategoricalAccessor), which is designed specifically for categorical data. It is a straightforward way to clean up or transform categorical data.

Syntax

Below is the syntax of the Python Pandas remove_unused_categories() method. The way it is called differs slightly between a categorical Series and a CategoricalIndex.

Syntax for a Pandas Categorical Series −

Series.cat.remove_unused_categories(*args, **kwargs)

Syntax for a CategoricalIndex −

CategoricalIndex.remove_unused_categories(*args, **kwargs)

When calling the remove_unused_categories() method on a categorical Series, you need to go through the .cat accessor. On a CategoricalIndex you can call the method directly, because the index is inherently categorical.
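The difference between the two calling styles can be seen in a minimal sketch. The variable names and sample values here are illustrative only and do not come from the examples further down:

```python
import pandas as pd

# A categorical Series: the method is reached through the .cat accessor
s = pd.Series(["x", "y"], dtype=pd.CategoricalDtype(["x", "y", "z"]))
s_clean = s.cat.remove_unused_categories()

# A CategoricalIndex: the method is called directly on the index
idx = pd.CategoricalIndex(["x", "y"], categories=["x", "y", "z"])
idx_clean = idx.remove_unused_categories()

print(list(s_clean.cat.categories))
print(list(idx_clean.categories))
```

In both cases the unused category "z" should no longer be listed.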

Parameters

The Python Pandas remove_unused_categories() method does not take any mandatory parameters. It operates on the object it is called on and removes the categories that are not used.

Return Value

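The Pandas remove_unused_categories() method returns a new object with the unused categories dropped: a Series of categorical dtype when called through Series.cat, and a CategoricalIndex when called on a CategoricalIndex. If every category is already in use, the result has the same categories as the original.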
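As a small illustration of the return value, the sketch below (with made-up sample data) checks the type of the result and confirms that the original Series keeps its categories unless the result is assigned back:

```python
import pandas as pd

s = pd.Series(pd.Categorical(["low", "high"], categories=["low", "mid", "high"]))

# The call returns a new object; s itself is left untouched
result = s.cat.remove_unused_categories()

print(type(result).__name__)
print(list(s.cat.categories))       # still includes "mid"
print(list(result.cat.categories))  # "mid" has been dropped
```

This is why the examples in this article assign the result back to a variable or column.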

Example: Basic Example

This example demonstrates the basic functionality of removing unused categories from a categorical Series using the Series.cat.remove_unused_categories() method.

import pandas as pd

# Creating a categorical Series with an unused category
s = pd.Series(pd.Categorical(["apple", "banana", "cherry"],
                             categories=["apple", "banana", "cherry", "grape"]))

print("Original Series:")
print(s)

# Removing unused category
s = s.cat.remove_unused_categories()
print("\nSeries after removing a category:")
print(s)

When we run the above program, it produces the following result −

Original Series:
0     apple
1    banana
2    cherry
dtype: category
Categories (4, object): ['apple', 'banana', 'cherry', 'grape']

Series after removing a category:
0     apple
1    banana
2    cherry
dtype: category
Categories (3, object): ['apple', 'banana', 'cherry']
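Unused categories do not have to be declared up front as in the example above. They also tend to appear after filtering, because selecting rows keeps the full category list. The following sketch, using hypothetical sample data, shows that situation:

```python
import pandas as pd

fruits = pd.Series(["apple", "banana", "cherry", "banana"], dtype="category")

# Boolean filtering keeps all three categories, although only one value remains
only_banana = fruits[fruits == "banana"]
print(list(only_banana.cat.categories))

# Dropping the unused ones leaves only the category still present in the data
trimmed = only_banana.cat.remove_unused_categories()
print(list(trimmed.cat.categories))
```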

Example: Grouping and Dropping Unused Categories

When working with grouped data, the Series.cat.remove_unused_categories() method helps you clean up empty categories, which can otherwise appear as empty groups in the result. The following example demonstrates using the Pandas remove_unused_categories() method on a DataFrame column before grouping.

import pandas as pd

# Create a DataFrame with categorical data
cats = pd.Categorical(["a", "b", "b", "b", "c", "c", "c"], categories=["a", "b", "c", "d"])
df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]})
print("Original DataFrame:\n", df)

# Drop the unused category "d" before grouping
df['cats'] = df['cats'].cat.remove_unused_categories()

# Group by categories and calculate the mean
# (observed=False is passed explicitly to avoid a FutureWarning in recent pandas versions)
print("\nGrouped DataFrame with removed unused categories:\n", df.groupby('cats', observed=False).mean())

On executing the above code, we get the following output −

Original DataFrame:
   cats  values
0    a       1
1    b       2
2    b       2
3    b       2
4    c       3
5    c       4
6    c       5

Grouped DataFrame with removed unused categories:
       values
cats        
a        1.0
b        2.0
c        4.0
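To see why removing unused categories matters for grouping, the sketch below reuses the same data and compares the group labels with and without the cleanup step. It assumes observed=False is passed to groupby(), which asks pandas to report every category whether or not it occurs in the data:

```python
import pandas as pd

cats = pd.Categorical(["a", "b", "b", "b", "c", "c", "c"], categories=["a", "b", "c", "d"])
df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]})

# Without cleanup: the unused category "d" shows up as an empty group (mean is NaN)
with_unused = df.groupby("cats", observed=False)["values"].mean()

# With cleanup: only categories that occur in the data form groups
trimmed = df.assign(cats=df["cats"].cat.remove_unused_categories())
without_unused = trimmed.groupby("cats", observed=False)["values"].mean()

print(list(with_unused.index))
print(list(without_unused.index))
```

Passing observed=True to groupby() is another way to hide empty groups, but it leaves the category list on the column unchanged.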

Example: Removing Unused Categories from a CategoricalIndex

The following example demonstrates using the CategoricalIndex.remove_unused_categories() method for removing unused categories from the CategoricalIndex object.

import pandas as pd

# Creating a CategoricalIndex
catIndex = pd.CategoricalIndex(
    ["p", "q", "r", "p", "q", "r"],
    ordered=True,
    categories=["p", "q", "r", "s"]
)
print("Original CategoricalIndex:")
print(catIndex)

# Removing unused categories
new_index = catIndex.remove_unused_categories()
print("\nCategoricalIndex after Removing unused categories:")
print(new_index)

Following is the output of the above code −

Original CategoricalIndex:
CategoricalIndex(['p', 'q', 'r', 'p', 'q', 'r'], categories=['p', 'q', 'r', 's'], ordered=True, dtype='category')

CategoricalIndex after Removing unused categories:
CategoricalIndex(['p', 'q', 'r', 'p', 'q', 'r'], categories=['p', 'q', 'r'], ordered=True, dtype='category')