Python Pandas - Create an Index based on an underlying Categorical
To create an Index based on an underlying Categorical, use the pandas.CategoricalIndex class. A CategoricalIndex can only take on a limited, usually fixed, number of possible values. Each value is stored as a small integer code that points into the list of categories, which makes the index memory-efficient for repetitive data.
Syntax
pandas.CategoricalIndex(data=None, categories=None, ordered=None, dtype=None, copy=False, name=None)
Parameters
The key parameters for creating a CategoricalIndex are:
- data: Array-like values for the categorical index
- categories: List of possible values (categories); if omitted, the unique values found in data are used
- ordered: Boolean indicating whether the categories have a meaningful order
- dtype: A CategoricalDtype (or the string 'category'); it cannot be combined with categories or ordered
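The parameters above overlap: the categories and their order can be passed directly or bundled into a dtype. The following is a minimal sketch of that equivalence, assuming a reasonably recent pandas; the variable names (size_dtype, via_dtype, via_kwargs) are illustrative only.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Describe the categories and their order once, so the dtype can be reused
size_dtype = CategoricalDtype(categories=["p", "q", "r", "s"], ordered=True)

# Option 1: pass the bundled dtype
via_dtype = pd.CategoricalIndex(["p", "q", "r", "s", "p"], dtype=size_dtype)

# Option 2: pass categories and ordered as separate keyword arguments
via_kwargs = pd.CategoricalIndex(
    ["p", "q", "r", "s", "p"],
    categories=["p", "q", "r", "s"],
    ordered=True,
)

# Both routes should describe the same index
print(via_dtype.equals(via_kwargs))
print(via_dtype.dtype == via_kwargs.dtype)
```

Bundling the settings into a dtype is mainly convenient when several indexes or columns need to share the same categories.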
Creating a Basic CategoricalIndex
First, import pandas and create a simple CategoricalIndex:
import pandas as pd
# Create a CategoricalIndex with ordered categories
cat_index = pd.CategoricalIndex(
    ["p", "q", "r", "s", "p", "q", "r", "s"],
    ordered=True,
    categories=["p", "q", "r", "s"]
)
print("Categorical Index:")
print(cat_index)
Categorical Index:
CategoricalIndex(['p', 'q', 'r', 's', 'p', 'q', 'r', 's'], categories=['p', 'q', 'r', 's'], ordered=True, dtype='category')
Accessing Properties
You can access various properties of the CategoricalIndex:
import pandas as pd
cat_index = pd.CategoricalIndex(
    ["p", "q", "r", "s", "p", "q", "r", "s"],
    ordered=True,
    categories=["p", "q", "r", "s"]
)
# Get the categories
print("Categories:")
print(cat_index.categories)
# Check if ordered
print("\nIs Ordered:", cat_index.ordered)
# Get min and max values (only works with ordered categories)
print("\nMinimum value:", cat_index.min())
print("Maximum value:", cat_index.max())
Categories:
Index(['p', 'q', 'r', 's'], dtype='object')

Is Ordered: True

Minimum value: p
Maximum value: s
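Two further properties follow from the ordering. The sketch below, which assumes a recent pandas, shows the integer codes behind each value and an order-aware comparison. It also attempts min() on an unordered index, which is expected to raise a TypeError. The names ordered_idx and unordered_idx are illustrative.

```python
import pandas as pd

ordered_idx = pd.CategoricalIndex(
    ["q", "s", "p"], categories=["p", "q", "r", "s"], ordered=True
)
unordered_idx = pd.CategoricalIndex(
    ["q", "s", "p"], categories=["p", "q", "r", "s"]
)

# codes holds each value's position in the categories list
codes = ordered_idx.codes.tolist()
print("Codes:", codes)

# With ordered=True, comparisons follow the category order (p < q < r < s)
after_q = (ordered_idx > "q").tolist()
print("Greater than 'q':", after_q)

# Without an order there is no meaningful minimum
try:
    unordered_idx.min()
except TypeError as exc:
    print("TypeError:", exc)
```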
Using CategoricalIndex with DataFrame
A CategoricalIndex is commonly used as a DataFrame index when the row labels repeat, because it saves memory:
import pandas as pd
# Create a DataFrame with CategoricalIndex
cat_index = pd.CategoricalIndex(
    ["Low", "Medium", "High", "Low", "Medium"],
    categories=["Low", "Medium", "High"],
    ordered=True
)
df = pd.DataFrame(
    {"Values": [10, 20, 30, 15, 25]},
    index=cat_index
)
print("DataFrame with CategoricalIndex:")
print(df)
print("\nIndex type:", type(df.index))
DataFrame with CategoricalIndex:
        Values
Low         10
Medium      20
High        30
Low         15
Medium      25

Index type: <class 'pandas.CategoricalIndex'>
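Once the index is in place, label lookups and sorting work on the categories. The following sketch reuses the same data and assumes a recent pandas; low_rows and sorted_df are illustrative names.

```python
import pandas as pd

cat_index = pd.CategoricalIndex(
    ["Low", "Medium", "High", "Low", "Medium"],
    categories=["Low", "Medium", "High"],
    ordered=True,
)
df = pd.DataFrame({"Values": [10, 20, 30, 15, 25]}, index=cat_index)

# Label-based selection returns every row that carries the category
low_rows = df.loc["Low"]
print(low_rows)

# Sorting follows the category order (Low < Medium < High),
# not the alphabetical order of the labels
sorted_df = df.sort_index()
print(sorted_df)
```

Sorting by category order is the practical reason to list the categories explicitly: an alphabetical sort would place "High" before "Low".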
Key Points
- Memory Efficient: Stores each value as a small integer code, so repetitive categorical data uses less memory
- Ordered Categories: Can represent ordered relationships between categories
- Fixed Categories: Holds only values from the declared categories; other values become NaN when the index is constructed
- DataFrame Integration: Can be used as a DataFrame index, which keeps repeated row labels compact
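The memory and fixed-category points can be checked directly. This is a rough sketch, not a benchmark: the exact byte counts depend on the pandas version and string storage, and the treatment of unknown values reflects long-standing pandas behavior that newer releases may warn about. The label list and the value "Huge" are made up for illustration.

```python
import pandas as pd

labels = ["Low", "Medium", "High"] * 10_000

plain_index = pd.Index(labels)
cat_index = pd.CategoricalIndex(labels, categories=["Low", "Medium", "High"])

# deep=True also counts the string data held by the plain index
plain_bytes = plain_index.memory_usage(deep=True)
cat_bytes = cat_index.memory_usage(deep=True)
print("Plain Index bytes:", plain_bytes)
print("CategoricalIndex bytes:", cat_bytes)

# A value outside the declared categories is stored as NaN
with_unknown = pd.CategoricalIndex(
    ["Low", "Huge"], categories=["Low", "Medium", "High"]
)
unknown_mask = with_unknown.isna().tolist()
print("Missing after construction:", unknown_mask)
```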
Conclusion
CategoricalIndex is ideal for indexing data with a limited set of repeating values. Use ordered=True when categories have meaningful order, and leverage it for memory-efficient DataFrame operations.