Python Pandas - Renaming Categories
Renaming categories within a categorical data type is an essential task when the labels need better clarity or reorganization. The Pandas library provides a straightforward way to do this with the rename_categories() method.
Categories in a Pandas categorical object can be renamed with minimal effort using this method. It supports multiple input types: a list, a dictionary, or a callable. It is the recommended way to rename categorical data because it relabels the categories themselves instead of rewriting each value, which keeps the operation fast and memory efficient and preserves the categorical dtype.
In this tutorial, we will focus on renaming the categories of Pandas objects using the rename_categories() method, with several examples.
The rename_categories() Method
The rename_categories() method is part of the pandas Series.cat accessor (an alias for CategoricalAccessor). It is designed to rename categories in a categorical Series or DataFrame column.
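As a minimal sketch of what the method does, the snippet below renames the categories of a small Series and then compares the original and the result. The sample values and variable names are assumptions made up for illustration. It suggests that the method returns a new Series and that only the labels change, while the integer codes behind each row stay as they were.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
s = pd.Series(["low", "high", "low", "mid"], dtype="category")

renamed = s.cat.rename_categories({"low": "L", "mid": "M", "high": "H"})

# The original Series keeps its old labels; a new Series is returned
print(list(s.cat.categories))
print(list(renamed.cat.categories))

# Only the labels change; the integer codes behind each row are identical
print(s.cat.codes.tolist())
print(renamed.cat.codes.tolist())
```

Because the result is a new object, it has to be assigned back (as the examples in this tutorial do) for the change to persist.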
Syntax
The syntax of this method is as follows −
Series.cat.rename_categories(*args, **kwargs)
This method accepts a new_categories parameter, which can be a list, a dictionary, or a callable that defines the new category names.
Renaming All Categories
You can rename all categories by providing a list of new names. The new list must match the number of existing categories and have unique values.
Example
Below is a basic example of renaming categories in a Pandas categorical object using a list of new category names.
import pandas as pd
import numpy as np
# Create a DataFrame
cat = pd.Categorical(["a", "c", "c", "b"], categories=["b", "a", "c"])
df = pd.DataFrame({"A_col":cat, "B_col":[1, 2, 4, 6]})
# Display the Input DataFrame
print('Input DataFrame:\n',df)
# Renaming categories using a list
new_categories = ["Group A", "Group B", "Group C"]
# Update the DataFrame after renaming the categories
df["A_col"] = df["A_col"].cat.rename_categories(new_categories)
print("\nRenamed Categories:")
print(df)
Following is the output of the above code −
Input DataFrame:
| | A_col | B_col |
|---|---|---|
| 0 | a | 1 |
| 1 | c | 2 |
| 2 | c | 4 |
| 3 | b | 6 |

Renamed Categories:
| | A_col | B_col |
|---|---|---|
| 0 | Group B | 1 |
| 1 | Group C | 2 |
| 2 | Group C | 4 |
| 3 | Group A | 6 |
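The requirement that a list of new names match the number of existing categories can be checked with a short sketch. Here a Series with three categories is given only two new names, which is expected to raise a ValueError. The variable name length_error is an assumption introduced for this illustration.

```python
import pandas as pd

# Three categories: a, b, c
s = pd.Series(["a", "c", "c", "b"], dtype="category")

length_error = None
try:
    # Only two new names for three existing categories
    s.cat.rename_categories(["Group A", "Group B"])
except ValueError as e:
    length_error = e

print("Error:", length_error)
```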
Renaming Specific Categories
Instead of renaming all categories as above, you can use a dictionary to map specific categories to new names. Categories not included in the dictionary remain unchanged.
Example
The following example uses a Python dictionary to rename specific categories in a Pandas categorical Series object. Here, the categories apple and banana are renamed to Fruit A and Fruit B respectively, while the category cherry remains unchanged.
import pandas as pd
import numpy as np
# Creating a categorical Series
s = pd.Series(["apple", "banana", "cherry", "apple"], dtype="category")
# Display the Input Series
print("Original Series:")
print(s)
# Renaming specific categories using a dictionary
s = s.cat.rename_categories({"apple": "Fruit A", "banana": "Fruit B"})
print("\nPartially Renamed Categories:")
print(s)
While executing the above code we get the following output −
Original Series:
0     apple
1    banana
2    cherry
3     apple
dtype: category
Categories (3, object): ['apple', 'banana', 'cherry']

Partially Renamed Categories:
0    Fruit A
1    Fruit B
2     cherry
3    Fruit A
dtype: category
Categories (3, object): ['Fruit A', 'Fruit B', 'cherry']
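A related detail of dictionary-based renaming is what happens when the mapping contains a key that is not an existing category. The sketch below assumes a made-up key, "kiwi", that does not appear in the Series. The extra entry is expected to be ignored rather than raise an error.

```python
import pandas as pd

s = pd.Series(["apple", "banana", "cherry", "apple"], dtype="category")

# "kiwi" is not an existing category, so that entry has no effect
mapping = {"apple": "Fruit A", "kiwi": "Fruit K"}
renamed = s.cat.rename_categories(mapping)

print(list(renamed.cat.categories))
print(renamed.tolist())
```

This means a single mapping can be reused across several columns even when some of them lack a few of the listed categories.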
Dynamically Renaming Categories
To dynamically rename categories in a Pandas categorical object, you can pass a callable (a function) to the Series.cat.rename_categories() method. The callable receives each category name and returns its new name.
Example
The following example demonstrates using a callable with the Pandas Series.cat.rename_categories() method to dynamically rename categories.
import pandas as pd
import numpy as np
# Creating a categorical Series
s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")
# Display the Input Series
print("Original Series:")
print(s)
# Dynamically renaming categories using a callable function
s = s.cat.rename_categories(lambda x: x.upper())
print("\nRenamed Categories Dynamically")
print(s)
Following is the output of the above code −
Original Series:
0      cat
1      dog
2    mouse
3      cat
dtype: category
Categories (3, object): ['cat', 'dog', 'mouse']

Renamed Categories Dynamically
0      CAT
1      DOG
2    MOUSE
3      CAT
dtype: category
Categories (3, object): ['CAT', 'DOG', 'MOUSE']
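The callable is applied to the category labels, not to every row. The sketch below illustrates this with a hypothetical function, add_prefix, that records each label it receives. The data and the "pet_" prefix are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical data: six rows but only three distinct categories
s = pd.Series(["cat", "dog", "mouse", "cat", "dog", "cat"], dtype="category")

seen = []  # records every label passed to the callable

def add_prefix(label):
    """Hypothetical renaming function: prefix each category label."""
    seen.append(label)
    return f"pet_{label}"

renamed = s.cat.rename_categories(add_prefix)

print(seen)
print(list(renamed.cat.categories))
```

With six rows and three categories, the list seen should hold one entry per category, which is part of why renaming stays cheap on long columns with few distinct values.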
Error Occurrence While Renaming Categories
Renaming categories requires the new categories to be unique and non-null. If the new categories contain duplicates or null values, the method raises a ValueError.
Example: Non-Unique Categories
In this example, we attempt to rename the categories of a Pandas categorical Series object using non-unique categories. This raises a ValueError, so the call is wrapped in a try-except block to handle the error.
import pandas as pd
import numpy as np
# Creating a categorical Series
s = pd.Series(["apple", "banana", "cherry", "apple"], dtype="category")
# Display the Input Series
print("Original Series:")
print(s)
# Giving the Non-unique categories
try:
    s.cat.rename_categories(["A", "A", "A"])
except ValueError as e:
    print("\nError:", e)
Output of the above code is as follows −
Original Series:
0     apple
1    banana
2    cherry
3     apple
dtype: category
Categories (3, object): ['apple', 'banana', 'cherry']

Error: Categorical categories must be unique
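The uniqueness rule also applies when the new names come from a callable or a dictionary. As a sketch under assumed data, the lambda below keeps only the first letter of each label, so two categories would end up with the same name and the call is expected to fail.

```python
import pandas as pd

s = pd.Series(["apple", "avocado", "banana"], dtype="category")

duplicate_error = None
try:
    # "apple" and "avocado" would both become "a", so the labels collide
    s.cat.rename_categories(lambda label: label[0])
except ValueError as e:
    duplicate_error = e

print("Error:", duplicate_error)
```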
Example: Renaming with Null categories
This example shows how to handle errors when renaming categories in a Pandas categorical object with null values.
import pandas as pd
import numpy as np
# Creating a categorical Series
s = pd.Series(["apple", "banana", "cherry", "apple"], dtype="category")
# Display the Input Series
print("Original Series:")
print(s)
# Giving the Null category
try:
    s.cat.rename_categories(["A", np.nan, "C"])
except ValueError as e:
    print("\nError:", e)
Following is the output of the above code −
Original Series:
0     apple
1    banana
2    cherry
3     apple
dtype: category
Categories (3, object): ['apple', 'banana', 'cherry']

Error: Categorical categories cannot be null
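If you prefer to detect these problems before calling the method, one option is a small pre-check. The function below, find_label_problems, is a hypothetical helper written for this tutorial and is not part of Pandas. It is a sketch that covers only list input and the three conditions discussed above: length, null values, and duplicates.

```python
import numpy as np
import pandas as pd

def find_label_problems(series, new_labels):
    """Hypothetical helper: list reasons a list of new labels would be rejected."""
    problems = []
    if len(new_labels) != len(series.cat.categories):
        problems.append("wrong number of labels")
    if any(pd.isna(label) for label in new_labels):
        problems.append("null label")
    if len(set(new_labels)) != len(new_labels):
        problems.append("duplicate labels")
    return problems

s = pd.Series(["apple", "banana", "cherry", "apple"], dtype="category")

ok = find_label_problems(s, ["A", "B", "C"])
with_null = find_label_problems(s, ["A", np.nan, "C"])
short_dupes = find_label_problems(s, ["A", "A"])

print(ok)
print(with_null)
print(short_dupes)
```

An empty list means the labels pass these three checks. The try-except approach shown in the examples remains the simpler choice when you only need to report the error.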