Python Pandas - Create an Index based on an underlying Categorical

To create an Index based on an underlying Categorical, use the pandas.CategoricalIndex() constructor. A CategoricalIndex can take on only a limited, and usually fixed, set of possible values. This makes it memory-efficient for labels that repeat often.

Syntax

pandas.CategoricalIndex(data=None, categories=None, ordered=None, dtype=None, copy=False, name=None)

Parameters

The key parameters for creating a CategoricalIndex are:

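  • data: Array-like values for the categorical index
  • categories: List of possible values (categories). If omitted, they are inferred from the unique values in data
  • ordered: Boolean indicating whether the categories have a meaningful order (treated as False when not given)
  • dtype: A CategoricalDtype instance or the string 'category'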
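As an alternative to passing categories and ordered separately, you can bundle both into a pandas.CategoricalDtype and pass it as dtype. pandas raises a ValueError if dtype is combined with categories or ordered, so use one style or the other. The sketch below uses illustrative size labels; the names size_dtype and size_index are only examples.

```python
import pandas as pd

# Describe the allowed categories and their order once, as a reusable dtype object
size_dtype = pd.CategoricalDtype(categories=["S", "M", "L"], ordered=True)

# Pass dtype= instead of categories= and ordered=
size_index = pd.CategoricalIndex(["M", "S", "L", "M"], dtype=size_dtype)

print(size_index.categories.tolist())  # ['S', 'M', 'L']
print(size_index.ordered)              # True
```

The same size_dtype object can be reused for other indexes or columns so they all share an identical category definition.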

Creating a Basic CategoricalIndex

First, import pandas and create a simple CategoricalIndex:

import pandas as pd

# Create a CategoricalIndex with ordered categories
cat_index = pd.CategoricalIndex(
    ["p", "q", "r", "s", "p", "q", "r", "s"], 
    ordered=True, 
    categories=["p", "q", "r", "s"]
)

print("Categorical Index:")
print(cat_index)
Categorical Index:
CategoricalIndex(['p', 'q', 'r', 's', 'p', 'q', 'r', 's'], categories=['p', 'q', 'r', 's'], ordered=True, dtype='category')
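If you leave out categories, pandas infers them from the unique values in data and sorts them. Without ordered=True, the resulting index is unordered. The variable name inferred in this minimal sketch is only illustrative.

```python
import pandas as pd

# No categories given: pandas infers them from the unique values (sorted)
inferred = pd.CategoricalIndex(["r", "p", "q", "p"])

print(inferred.categories.tolist())  # ['p', 'q', 'r']
print(inferred.ordered)              # False
```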

Accessing Properties

You can access various properties of the CategoricalIndex:

import pandas as pd

cat_index = pd.CategoricalIndex(
    ["p", "q", "r", "s", "p", "q", "r", "s"], 
    ordered=True, 
    categories=["p", "q", "r", "s"]
)

# Get the categories
print("Categories:")
print(cat_index.categories)

# Check if ordered
print("\nIs Ordered:", cat_index.ordered)

# Get min and max values (only works with ordered categories)
print("\nMinimum value:", cat_index.min())
print("Maximum value:", cat_index.max())
Categories:
Index(['p', 'q', 'r', 's'], dtype='object')

Is Ordered: True

Minimum value: p
Maximum value: s
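Two related properties show why these values behave as they do. The codes property reveals that each label is stored as a small integer position into categories. min() and max() require an order, so calling them on an unordered index raises a TypeError. The sketch below reuses the index from above and adds a hypothetical unordered one for contrast.

```python
import pandas as pd

cat_index = pd.CategoricalIndex(
    ["p", "q", "r", "s", "p", "q", "r", "s"],
    ordered=True,
    categories=["p", "q", "r", "s"],
)

# Each label is stored as an integer code pointing into cat_index.categories
print(cat_index.codes)  # [0 1 2 3 0 1 2 3]

# An unordered index (categories inferred, ordered defaults to False)
unordered = pd.CategoricalIndex(["r", "p", "q"])

min_error = None
try:
    unordered.min()
except TypeError as exc:
    # pandas refuses min()/max() when the categories have no defined order
    min_error = exc
    print("TypeError:", exc)
```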

Using CategoricalIndex with DataFrame

CategoricalIndex is commonly used as a DataFrame index to save memory when labels repeat:

import pandas as pd

# Create a DataFrame with CategoricalIndex
cat_index = pd.CategoricalIndex(
    ["Low", "Medium", "High", "Low", "Medium"], 
    categories=["Low", "Medium", "High"],
    ordered=True
)

df = pd.DataFrame(
    {"Values": [10, 20, 30, 15, 25]}, 
    index=cat_index
)

print("DataFrame with CategoricalIndex:")
print(df)
print("\nIndex type:", type(df.index))
DataFrame with CategoricalIndex:
        Values
Low         10
Medium      20
High        30
Low         15
Medium      25

Index type: <class 'pandas.CategoricalIndex'>
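With the categorical index in place, you can select rows by label and sort them. Selecting a label with .loc returns every row that carries it. Because the categories are ordered, sort_index() follows Low < Medium < High rather than alphabetical order. The names low_rows and sorted_df in this sketch are only illustrative.

```python
import pandas as pd

cat_index = pd.CategoricalIndex(
    ["Low", "Medium", "High", "Low", "Medium"],
    categories=["Low", "Medium", "High"],
    ordered=True,
)
df = pd.DataFrame({"Values": [10, 20, 30, 15, 25]}, index=cat_index)

# Selecting a repeated label returns all matching rows as a DataFrame
low_rows = df.loc["Low"]
print(low_rows)

# Sorting uses the category order, not alphabetical order
sorted_df = df.sort_index()
print(sorted_df.index.tolist())  # ['Low', 'Low', 'Medium', 'Medium', 'High']
```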

Key Points

  • Memory Efficient: Stores each label as a small integer code, so repetitive data uses less memory
  • Ordered Categories: With ordered=True, comparisons, min(), max() and sorting follow the category order
  • Fixed Categories: Only values from the declared categories are stored; any other value becomes NaN
  • DataFrame Integration: Can serve as the index of a DataFrame or Series and supports label selection with .loc
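The memory and fixed-category points can be checked directly. This rough sketch uses hypothetical data: 300,000 labels drawn from three distinct values. It compares the deep memory usage of a plain Index with a CategoricalIndex. It then shows that a value outside the declared categories is stored as NaN. Exact byte counts depend on your pandas version, so only the comparison is meaningful.

```python
import pandas as pd

# Hypothetical data: many labels but only three distinct values
labels = ["Low", "Medium", "High"] * 100_000

plain_index = pd.Index(labels)
category_index = pd.CategoricalIndex(labels)

# deep=True also counts the string data held by the plain Index
plain_bytes = plain_index.memory_usage(deep=True)
category_bytes = category_index.memory_usage(deep=True)
print("Plain Index bytes:", plain_bytes)
print("CategoricalIndex bytes:", category_bytes)

# "Extreme" is not a declared category, so it is stored as NaN
restricted = pd.CategoricalIndex(["Low", "Extreme"], categories=["Low", "Medium", "High"])
print(restricted)
```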

Conclusion

CategoricalIndex is ideal for indexing data with a limited set of repeating values. Use ordered=True when categories have meaningful order, and leverage it for memory-efficient DataFrame operations.

Updated on: 2026-03-26T16:42:23+05:30
