Python Pandas - Reordering Categories



Reordering categories in a Pandas Categorical Series allows you to define a specific order for categories, which can be useful for sorting and comparison operations.

Generally, Categorical data types are used to represent a fixed number of possible values, such as states, colors, or categories of items. These categories can be ordered or unordered. If the data is ordered, you can sort by the defined order and use min(), max(), and inequality comparisons. If the data is unordered, sorting still follows the current category order, but Pandas raises a TypeError for min(), max(), and inequality comparisons such as < or >.
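
The difference between ordered and unordered data can be sketched as follows. This is a minimal illustration, assuming a small Series of made-up labels; the variable names are not part of any API.

```python
import pandas as pd

# Hypothetical labels, used only for illustration
unordered = pd.Series(['b', 'a', 'c'], dtype="category")
ordered = unordered.astype(pd.CategoricalDtype(['a', 'b', 'c'], ordered=True))

# Sorting works in both cases and follows the category order
sorted_unordered = unordered.sort_values().tolist()
print(sorted_unordered)  # ['a', 'b', 'c']

# Inequality comparisons need an ordered categorical
try:
    unordered < 'b'
    comparison_failed = False
except TypeError:
    comparison_failed = True
print(comparison_failed)  # True

ordered_result = (ordered < 'b').tolist()
print(ordered_result)  # [False, True, False]
```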

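Pandas provides two primary methods for reordering categorical data −

  • reorder_categories(): Changes the order of the existing categories without adding or removing any.

  • set_categories(): Replaces the category list, which can reorder, add, or remove categories.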

In this tutorial, we will learn how to reorder categories with these methods, how they work, and what their limitations are.

Reordering Categories with reorder_categories()

The Categorical.reorder_categories() method in Pandas changes the order of the existing categories in a categorical object. Use it when you want to reorder the categories without adding or removing any. The new list must contain exactly the existing categories; leaving out an old category or including a new one raises a ValueError.

Example

Here is a simple example to demonstrate how to reorder categories using the Categorical.reorder_categories() method.

import pandas as pd

# Create a categorical Series
s = pd.Series(['low', 'medium', 'high', 'low'], dtype="category")

print("Original Series:")
print(s)

# Reordering the categories
s = s.cat.reorder_categories(['high', 'medium', 'low'], ordered=True)

print("\nSeries after reordering categories:")
print(s)

Following is the output of the above code −

Original Series:
0       low
1    medium
2      high
3       low
dtype: category
Categories (3, object): ['high', 'low', 'medium']

Series after reordering categories:
0       low
1    medium
2      high
3       low
dtype: category
Categories (3, object): ['high' < 'medium' < 'low']
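
Because the reordered Series is ordered, sorting, min(), max(), and comparisons follow the new category order rather than alphabetical order. A minimal sketch reusing the same data (the variable names are illustrative):

```python
import pandas as pd

s = pd.Series(['low', 'medium', 'high', 'low'], dtype="category")
s = s.cat.reorder_categories(['high', 'medium', 'low'], ordered=True)

# Sorting follows the category order: high < medium < low
sorted_values = s.sort_values().tolist()
print(sorted_values)  # ['high', 'medium', 'low', 'low']

# min() and max() are defined because the categorical is ordered
smallest = s.min()
largest = s.max()
print(smallest, largest)  # high low

# Comparisons against a category label also use the category order
before_low = (s < 'low').tolist()
print(before_low)  # [False, True, True, False]
```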

Reordering Categories with set_categories()

The Pandas set_categories() method redefines the categories of a Categorical object. It can be used to reorder the categories, but it also allows you to add or remove them. Unlike reorder_categories(), set_categories() does not require the new category list to include all old categories. Values whose category is left out of the new list become NaN.

Example

This example demonstrates using the set_categories() method for reordering categories of a categorical object.

import pandas as pd

# Create a categorical Series
s = pd.Series(['low', 'medium', 'high', 'low'], dtype="category")

print("Original Series:")
print(s)

# Reorder categories
s = s.cat.set_categories(['high', 'medium', 'low'], ordered=True)

print("\nSeries after reordering categories:")
print(s)

Following is the output of the above code −

Original Series:
0       low
1    medium
2      high
3       low
dtype: category
Categories (3, object): ['high', 'low', 'medium']

Series after reordering categories:
0       low
1    medium
2      high
3       low
dtype: category
Categories (3, object): ['high' < 'medium' < 'low']
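
The added flexibility of set_categories() can be sketched as follows. In this illustration the category 'medium' is dropped and a hypothetical, so far unused category 'very high' is added; neither choice comes from the example above.

```python
import pandas as pd

s = pd.Series(['low', 'medium', 'high', 'low'], dtype="category")

# Drop 'medium' and add the hypothetical category 'very high'
t = s.cat.set_categories(['low', 'high', 'very high'], ordered=True)

new_categories = t.cat.categories.tolist()
print(new_categories)  # ['low', 'high', 'very high']

# The value that belonged to the dropped category is now NaN
missing_mask = t.isna().tolist()
print(missing_mask)  # [False, True, False, False]
```

The same list passed to reorder_categories() would raise a ValueError, since it neither keeps 'medium' nor limits itself to existing categories.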

Handling NaN Values During Reordering

NaN values are treated differently during category reordering. NaN is never one of the categories, so it has no position in the category order. When you sort, sort_values() places NaN values at the end by default.

Example

This example demonstrates the basic behavior of NaN values when reordering categorical data.

import pandas as pd
import numpy as np

# Create an ordered Categorical Series with a NaN value
data = pd.Categorical([1, 2, 3, np.nan, 1], ordered=True)
s = pd.Series(data)

# Display the original Series with sorted values
print("Original Series with sorted values")
print(s.sort_values())

# Reorder categories
s = s.cat.reorder_categories([2, 3, 1], ordered=True)
# Sort the values; NaN is placed at the end by default
s = s.sort_values()

# Display the reordered and sorted values
print("\nReordered and Sorted the values:")
print(s)

Following is the output of the above code −

Original Series with sorted values
0      1
4      1
1      2
2      3
3    NaN
dtype: category
Categories (3, int64): [1 < 2 < 3]

Reordered and Sorted the values:
1      2
2      3
0      1
4      1
3    NaN
dtype: category
Categories (3, int64): [2 < 3 < 1]
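
As a further sketch under the same setup, the snippet below checks that NaN is absent from the category list and uses the na_position parameter of sort_values() to move missing values to the front. The variable names are illustrative.

```python
import pandas as pd
import numpy as np

s = pd.Series(pd.Categorical([1, 2, 3, np.nan, 1], ordered=True))
s = s.cat.reorder_categories([2, 3, 1], ordered=True)

# NaN is not a category, so the category list holds only 2, 3 and 1
categories = s.cat.categories.tolist()
print(categories)

# na_position='first' puts missing values at the front instead of the end
nan_first = s.sort_values(na_position='first')
print(nan_first)

# min() skips NaN by default and returns the first category that occurs
smallest = s.min()
print(smallest)
```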

Common Errors in Reordering Categories

When working with Categorical data, you may encounter the following common errors while reordering categories −

  • ValueError: This error occurs when you call reorder_categories() with a new list that leaves out an existing category or includes a category that was not present before.

  • TypeError: This error occurs when you call operations like min() and max() on categorical data that is not ordered.

Example: ValueError

The following example demonstrates the ValueError raised when you try to reorder categorical data with a missing category.

import pandas as pd

# Create a categorical Series
s = pd.Series(['low', 'medium', 'high', 'low'], dtype="category")

print("Original Series:")
print(s)

# Try to reorder categories with a list that omits 'high'
try:
    s = s.cat.reorder_categories(['low', 'medium'])
except ValueError as e:
    print("\nValueError:", str(e))

When we run the above program, it produces the following result −

Original Series:
0       low
1    medium
2      high
3       low
dtype: category
Categories (3, object): ['high', 'low', 'medium']

ValueError: items in new_categories are not the same as in old categories
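
The same error appears when the new list contains a category that did not exist before. A minimal sketch, assuming a hypothetical extra label 'extreme', that also shows set_categories() accepting the same list:

```python
import pandas as pd

s = pd.Series(['low', 'medium', 'high', 'low'], dtype="category")

# 'extreme' is a hypothetical label that is not an existing category
new_order = ['low', 'medium', 'high', 'extreme']

try:
    s.cat.reorder_categories(new_order)
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# set_categories() accepts the list and adds the unused category
t = s.cat.set_categories(new_order, ordered=True)
result_categories = t.cat.categories.tolist()
print(result_categories)  # ['low', 'medium', 'high', 'extreme']
```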

Example: TypeError with Unordered Categories

The following example demonstrates a scenario where a TypeError occurs. Here we try to perform a min() operation on unordered categorical data.

import pandas as pd

# Create an unordered Categorical series
s = pd.Series([1, 2, 3, 1], dtype="category")

print("Original Series:")
print(s)

# Attempt to use min/max on an unordered series
try:
    print(s.min())
except TypeError as e:
    print("\nTypeError:", str(e))

When we run the above program, it produces the following result −

Original Series:
0    1
1    2
2    3
3    1
dtype: category
Categories (3, int64): [1, 2, 3]

TypeError: Categorical is not ordered for operation min
you can use .as_ordered() to change the Categorical to an ordered one
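
As the error message suggests, the unordered Series can be converted with as_ordered(), which keeps the current category order. A minimal sketch using the same data (the name ordered_s is illustrative):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 1], dtype="category")

# as_ordered() keeps the existing category order (1, 2, 3) and marks it as ordered
ordered_s = s.cat.as_ordered()

is_ordered = ordered_s.cat.ordered
print(is_ordered)  # True

smallest = ordered_s.min()
largest = ordered_s.max()
print(smallest, largest)  # 1 3
```

If the existing order is not the one you want, reorder_categories() with ordered=True sets the order and the ordered flag in one step.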