Pandas add_categories() Method



The add_categories() method in pandas adds new categories to a categorical object. The new categories are appended after the existing ones, which makes them the highest-ranked categories when the data is ordered, and they remain unused until values are assigned to them.

This method is available through the Series.cat accessor and directly on CategoricalIndex objects. It is useful when you need to extend the set of valid categories before assigning values that are not yet among them.
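To make the "initially unused" behaviour concrete, here is a minimal sketch. The Series contents and variable names are illustrative assumptions, not part of the original examples. It adds a category and then checks that the category appears in the category list with a count of zero.

```python
import pandas as pd

# A small categorical Series; its categories are inferred as ['high', 'low']
s = pd.Series(["low", "high", "low"], dtype="category")

# Add a category that no element uses yet
s = s.cat.add_categories("medium")

# The new category is appended after the existing ones
categories = list(s.cat.categories)
print(categories)

# value_counts() on categorical data also lists unused categories, with a count of 0
counts = s.value_counts()
print(counts)
```

The values in the Series are untouched. Only the set of allowed categories grows.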

Syntax

The syntax of the Pandas add_categories() method differs slightly depending on whether you're working with a categorical Series or a CategoricalIndex object.

Syntax for a Pandas Categorical Series −

Series.cat.add_categories(new_categories, *args, **kwargs)

Syntax for a CategoricalIndex −

CategoricalIndex.add_categories(new_categories, *args, **kwargs)

When using the add_categories() method on a CategoricalIndex, there is no need to use the .cat accessor. This is because the CategoricalIndex is inherently categorical, and methods can be called directly.

Parameters

The Python Pandas add_categories() method accepts the below parameters −

  • new_categories − This parameter accepts a single category or a list-like collection of categories to add to the existing set of categories.

  • *args, **kwargs − Additional positional and keyword arguments, which are passed through to the underlying Categorical.add_categories() method.

Return Value

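The Pandas add_categories() method returns a new object of the same kind as the caller (a Series when called through Series.cat, a CategoricalIndex when called on a CategoricalIndex) with the new categories appended at the end of the category list. The original object is left unchanged.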
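The following short sketch checks what comes back from the call when it is made through Series.cat. The variable names original and extended are illustrative assumptions. The result is a Series with categorical values, and the object the method was called on keeps its old categories.

```python
import pandas as pd

original = pd.Series(["a", "b"], dtype="category")

# The call returns a new object; it does not modify `original`
extended = original.cat.add_categories("c")

print(type(extended).__name__)
print(list(original.cat.categories))
print(list(extended.cat.categories))
```

This is why the examples on this page assign the result back to a variable, as in s = s.cat.add_categories(...).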

Exception

This method raises a ValueError if any of the new categories already exists in the current categories, or if the new categories do not validate as categories (for example, if they contain duplicates or missing values).

Example: Adding Single Category

Here is a basic example that demonstrates how to add a single new category to a categorical Series using the Pandas Series.cat.add_categories() method.

import pandas as pd

# Creating a categorical Series
s = pd.Series(["apple", "banana", "cherry"], dtype="category")
print("Original Series:")
print(s)

# Adding a new category
s = s.cat.add_categories("date")
print("\nSeries after adding a new category:")
print(s)

When we run the above program, it produces the following result −

Original Series:
0     apple
1    banana
2    cherry
dtype: category
Categories (3, object): ['apple', 'banana', 'cherry']

Series after adding a new category:
0     apple
1    banana
2    cherry
dtype: category
Categories (4, object): ['apple', 'banana', 'cherry', 'date']
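A common reason to add a category is to make a later assignment valid. The sketch below reuses the fruit Series from this example and assumes a recent pandas version, where assigning a value outside the existing categories raises a TypeError. ValueError is caught as well in case the installed version differs.

```python
import pandas as pd

s = pd.Series(["apple", "banana", "cherry"], dtype="category")

# Assigning a value that is not an existing category is rejected
assignment_failed = False
try:
    s.iloc[0] = "date"
except (TypeError, ValueError) as e:
    assignment_failed = True
    print("Assignment failed:", e)

# After the category has been added, the same assignment succeeds
s = s.cat.add_categories("date")
s.iloc[0] = "date"
print(s)
```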

Example: Adding Multiple Categories

This example demonstrates how to add multiple new categories to the existing set of categories by passing a list-like collection of categories to the new_categories parameter.

import pandas as pd

# Creating a categorical Series
s = pd.Series(["red", "blue", "green"], dtype="category")
print("Original Series:")
print(s)

# Adding new categories
s = s.cat.add_categories(["yellow", "orange"])
print("\nSeries after adding multiple categories:")
print(s)

While executing the above code we get the following output −

Original Series:
0      red
1     blue
2    green
dtype: category
Categories (3, object): ['blue', 'green', 'red']

Series after adding multiple categories:
0      red
1     blue
2    green
dtype: category
Categories (5, object): ['blue', 'green', 'red', 'yellow', 'orange']

Example: Adding Categories to a CategoricalIndex

The following example demonstrates how to use the add_categories() method to add new categories to a CategoricalIndex object.

import pandas as pd

# Creating a CategoricalIndex
catIndex = pd.CategoricalIndex(
    ["p", "q", "r", "s", "p", "q", "r", "s"], 
    ordered=True, 
    categories=["p", "q", "r", "s"]
)
print("Original CategoricalIndex:")
print(catIndex)

# Adding new categories
new_index = catIndex.add_categories(["x", "y", "z"])
print("\nCategoricalIndex after adding new categories:")
print(new_index)

Following is the output of the above code −

Original CategoricalIndex:
CategoricalIndex(['p', 'q', 'r', 's', 'p', 'q', 'r', 's'], categories=['p', 'q', 'r', 's'], ordered=True, dtype='category')

CategoricalIndex after adding new categories:
CategoricalIndex(['p', 'q', 'r', 's', 'p', 'q', 'r', 's'], categories=['p', 'q', 'r', 's', 'x', 'y', 'z'], ordered=True, dtype='category')
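For ordered data, appending at the end means the new category ranks above all existing ones. The sketch below illustrates this with a hypothetical sizes Series; the labels and names are assumptions chosen for the illustration.

```python
import pandas as pd

# An ordered categorical: small < medium
sizes = pd.Series(
    pd.Categorical(
        ["small", "medium", "small"],
        categories=["small", "medium"],
        ordered=True,
    )
)

# "large" is appended last, so it ranks above "medium"
sizes = sizes.cat.add_categories("large")
sizes.iloc[2] = "large"

largest = sizes.max()
above_medium = (sizes > "medium").tolist()
print(largest)
print(above_medium)
```

If the new category should sit somewhere other than the top of the order, the categories have to be reordered in a separate step after adding it.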

Example: Handling Exceptions When Adding Categories

This example demonstrates the behavior of the add_categories() method when you try to add duplicate categories.

import pandas as pd

# Creating a categorical Series
s = pd.Series(["apple", "banana", "cherry"], dtype="category")
print("Original Series:")
print(s)

try:
    # Adding an existing category
    s = s.cat.add_categories(["apple"])
except ValueError as e:
    print("\nError encountered:", e)

Following is the output of the above code −

Original Series:
0     apple
1    banana
2    cherry
dtype: category
Categories (3, object): ['apple', 'banana', 'cherry']

Error encountered: new categories must not include old categories: {'apple'}
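One way to avoid this error, shown here as a sketch and not as the only approach, is to filter out the labels that are already categories before calling add_categories(). The wanted list and the missing variable are illustrative assumptions.

```python
import pandas as pd

s = pd.Series(["apple", "banana", "cherry"], dtype="category")

# Labels we would like to be valid; "apple" is already a category
wanted = ["apple", "date", "elderberry"]

# Keep only the labels that are not already categories
missing = [c for c in wanted if c not in s.cat.categories]

s = s.cat.add_categories(missing)
print(list(s.cat.categories))
```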