Python Pandas - Appending Categories



Appending categories lets you extend the set of valid values in categorical data without modifying the existing data. In pandas, categorical data is represented by the Categorical type and is suited to variables that take a fixed, limited set of values.

Pandas provides specialized methods for handling categorical data through the Series.cat accessor. One such method is add_categories(), which appends new categories to an existing categorical object.

In this tutorial, we will learn how to append categories to Pandas categorical data, with several examples.

The add_categories() Method

The Pandas Series.cat.add_categories() method adds one or more categories to an existing categorical object in a single call. The existing values and the order of the existing categories are unchanged; the new categories are placed after them.

Syntax

Following is the syntax of this method −

Series.cat.add_categories(new_categories, *args, **kwargs)

The only mandatory parameter is new_categories, which accepts a single value or a list-like of values representing the categories to append. The method returns a new object with the categories added rather than changing the original, so the result must be assigned back to keep it.
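The following is a minimal sketch of how the result of add_categories() relates to the object it was called on. The variable names original and extended and the category "bird" are illustrative assumptions, not part of the tutorial's examples.

```python
import pandas as pd

# A small categorical Series with two categories
original = pd.Series(["cat", "dog"], dtype="category")

# add_categories() returns a new Series; 'original' is left untouched
extended = original.cat.add_categories(["bird"])

print(list(original.cat.categories))  # ['cat', 'dog']
print(list(extended.cat.categories))  # ['cat', 'dog', 'bird']
```

Because the call does not change the original object, the examples in this tutorial assign the result back to the same variable.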

Appending a Single Category

You can append a single category to an existing Pandas categorical object by passing that category to the add_categories() method.

Example

This example demonstrates how to add a single new category to a categorical Series using the Pandas Series.cat.add_categories() method.

import pandas as pd

# Creating a categorical Series
s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")

# Display the Input Series
print("Original Series:")
print(s)

# Appending a new category
s = s.cat.add_categories("AA")
print("\nSeries after appending a new category:")
print(s)

When we run the above program, it produces the following result −

Original Series:
0      cat
1      dog
2    mouse
3      cat
dtype: category
Categories (3, object): ['cat', 'dog', 'mouse']

Series after appending a new category:
0      cat
1      dog
2    mouse
3      cat
dtype: category
Categories (4, object): ['cat', 'dog', 'mouse', 'AA']
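One practical reason to append a category is that a categorical Series only accepts values from its category list. The sketch below assumes a hypothetical new value "bird" and shows the assignment being rejected before the category is added and accepted afterwards. The exact exception type may differ between pandas versions, so both TypeError and ValueError are caught here.

```python
import pandas as pd

s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")

try:
    # "bird" is not yet a valid category, so this assignment is rejected
    s.iloc[0] = "bird"
except (TypeError, ValueError) as e:
    print("Assignment failed:", type(e).__name__)

# Register the category first; the same assignment is then accepted
s = s.cat.add_categories("bird")
s.iloc[0] = "bird"
print(s)
```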

Appending Multiple Categories

You can append multiple categories simultaneously by passing a list of new categories to the Series.cat.add_categories() method.

Example

This example shows how to add multiple new categories to existing categorical data by passing a list of categories to the new_categories parameter.

import pandas as pd

# Creating a categorical Series
s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")

# Display the Input Series
print("Original Series:")
print(s)

# Appending new categories
s = s.cat.add_categories(["Duck", "Wolf"])
print("\nSeries after appending multiple categories:")
print(s)

When we execute the above code, we get the following output −

Original Series:
0      cat
1      dog
2    mouse
3      cat
dtype: category
Categories (3, object): ['cat', 'dog', 'mouse']

Series after appending multiple categories:
0      cat
1      dog
2    mouse
3      cat
dtype: category
Categories (5, object): ['cat', 'dog', 'mouse', 'Duck', 'Wolf']
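The appended categories are valid but unused: no row holds them yet. A short sketch of one way to see this is to count the values, since value_counts() on a categorical Series reports every category, including those with no occurrences. The variable name counts is an illustrative assumption.

```python
import pandas as pd

s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")
s = s.cat.add_categories(["Duck", "Wolf"])

# value_counts() lists all five categories; the new ones have a count of 0
counts = s.value_counts()
print(counts)
```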

Appending Categories to a DataFrame Column

The Pandas cat.add_categories() method can be used to append new categories to a specific column of a DataFrame. This method works on columns that are of the category dtype, and it expands the set of categories for that column without modifying the existing data.

Example

This example demonstrates how to append categories to a specific column in a DataFrame, expanding its categories while maintaining existing data.

import pandas as pd

# Creating a DataFrame with a categorical column
df = pd.DataFrame({
    "Animal": ["Cat", "Dog", "Mouse"],
    "Category": pd.Series(["A", "B", "A"], dtype="category")
})

# Display the Input DataFrame
print("Original DataFrame:")
print(df)

# Appending new categories to the 'Category' column
df["Category"] = df["Category"].cat.add_categories(["C", "D"])

# Display the updated DataFrame
print("\nDataFrame after appending new categories:")
print(df)

# Checking the updated categories
print("\nUpdated categories in 'Category' column:")
print(df["Category"].cat.categories)

When we run the above program, it produces the following result −

Original DataFrame:
  Animal Category
0    Cat        A
1    Dog        B
2  Mouse        A

DataFrame after appending new categories:
  Animal Category
0    Cat        A
1    Dog        B
2  Mouse        A

Updated categories in 'Category' column:
Index(['A', 'B', 'C', 'D'], dtype='object')
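The .cat accessor is only available on columns that already have the category dtype. As a hedged sketch, the code below checks the dtype of a column that holds plain strings and converts it before appending; the column contents and the new category "Bird" are illustrative assumptions.

```python
import pandas as pd

# "Animal" holds plain strings here, not categorical data
df = pd.DataFrame({"Animal": ["Cat", "Dog", "Mouse"]})

# Convert the column first, otherwise .cat cannot be used on it
if not isinstance(df["Animal"].dtype, pd.CategoricalDtype):
    df["Animal"] = df["Animal"].astype("category")

# Now the column accepts new categories like any categorical Series
df["Animal"] = df["Animal"].cat.add_categories(["Bird"])
print(list(df["Animal"].cat.categories))  # ['Cat', 'Dog', 'Mouse', 'Bird']
```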

Handling Duplicate or Invalid Categories

If you attempt to append a category that already exists in the categorical object, the method raises a ValueError. This protects data integrity and prevents redundant categories: appending never modifies existing data, it only expands the list of valid categories.

Example

The following example demonstrates handling the exception raised when you try to append a duplicate category using the Series.cat.add_categories() method.

import pandas as pd

# Creating a categorical Series
s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")

# Display the Input Series
print("Original Series:")
print(s)

try:
    # Appending an existing category
    s = s.cat.add_categories(["cat"])
except ValueError as e:
    print("\nError encountered:", e)

Following is the output of the above code −

Original Series:
0      cat
1      dog
2    mouse
3      cat
dtype: category
Categories (3, object): ['cat', 'dog', 'mouse']

Error encountered: new categories must not include old categories: {'cat'}
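Instead of catching the exception, one option is to filter out categories that already exist before calling add_categories(). This is a sketch under the assumption that silently skipping duplicates is acceptable; the names candidates and new_only are hypothetical.

```python
import pandas as pd

s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")

# A mix of existing and new category labels
candidates = ["cat", "bird", "dog", "fish"]

# Keep only the labels that are not already categories of s
new_only = [c for c in candidates if c not in s.cat.categories]

s = s.cat.add_categories(new_only)
print(list(s.cat.categories))  # ['cat', 'dog', 'mouse', 'bird', 'fish']
```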