Python Pandas - Reordering Categories
Reordering categories in a Pandas Categorical Series allows you to define a specific order for categories, which can be useful for sorting and comparison operations.
Categorical data types represent a fixed set of possible values, such as states, colors, or item categories. These categories can be ordered or unordered. With ordered data, min(), max(), and comparisons such as < and > follow the category order. With unordered data, Pandas raises a TypeError for min(), max(), and inequality comparisons; sort_values() still works and sorts by the position of each category in the category list.
Pandas provides two primary methods for reordering categorical data −
Categorical.reorder_categories(): Reorders the categories as specified in new_categories. It does not allow adding or removing categories.
Categorical.set_categories(): Modifies the category list, allowing the addition or removal of categories.
This tutorial shows how to reorder categories with these methods and explains how they work and what their limitations are.
Reordering Categories with reorder_categories()
The Categorical.reorder_categories() method changes the order of the existing categories in a categorical object without adding or removing any. The new list must contain exactly the same categories as the old one; otherwise a ValueError is raised. On a Series, the method is available through the .cat accessor.
Example
Here is a simple example to demonstrate how to reorder categories using the Categorical.reorder_categories() method.
import pandas as pd
# Create a categorical Series
s = pd.Series(['low', 'medium', 'high', 'low'], dtype="category")
print("Original Series:")
print(s)
# Reordering the categories
s = s.cat.reorder_categories(['high', 'medium', 'low'], ordered=True)
print("\nSeries after reordering categories:")
print(s)
Following is the output of the above code −
Original Series:
0       low
1    medium
2      high
3       low
dtype: category
Categories (3, object): ['high', 'low', 'medium']

Series after reordering categories:
0       low
1    medium
2      high
3       low
dtype: category
Categories (3, object): ['high' < 'medium' < 'low']
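To see what the new order changes in practice, the following minimal sketch sorts the reordered Series, takes its minimum and maximum, and compares it against a single category. It reuses the sample values from the example above; the variable name sorted_values is only illustrative.

```python
import pandas as pd

s = pd.Series(['low', 'medium', 'high', 'low'], dtype="category")
s = s.cat.reorder_categories(['high', 'medium', 'low'], ordered=True)

# Sorting follows the declared category order, not alphabetical order
sorted_values = s.sort_values().tolist()
print(sorted_values)

# min() and max() are the lowest and highest categories present in the data
print(s.min(), s.max())

# Element-wise comparison against a category also uses the declared order
print((s > 'high').tolist())
```

Because 'high' was declared as the first category, it is treated as the smallest value here, which is why s.min() returns 'high'.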
Reordering Categories with set_categories()
The set_categories() method redefines the categories of a Categorical object. It can be used to reorder the categories, but it can also add or remove them. Unlike reorder_categories(), it does not require the new list to include all old categories; values whose category is removed become NaN.
Example
This example demonstrates using the set_categories() method for reordering categories of a categorical object.
import pandas as pd
# Create a categorical Series
s = pd.Series(['low', 'medium', 'high', 'low'], dtype="category")
print("Original Series:")
print(s)
# Reorder categories
s = s.cat.set_categories(['high', 'medium', 'low'], ordered=True)
print("\nSeries after reordering categories:")
print(s)
Following is the output of the above code −
Original Series:
0       low
1    medium
2      high
3       low
dtype: category
Categories (3, object): ['high', 'low', 'medium']

Series after reordering categories:
0       low
1    medium
2      high
3       low
dtype: category
Categories (3, object): ['high' < 'medium' < 'low']
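The extra flexibility of set_categories() can be illustrated with a short sketch that removes one category and adds another in the same call. The category name 'very high' is made up for this illustration and is not part of the original data.

```python
import pandas as pd

s = pd.Series(['low', 'medium', 'high', 'low'], dtype="category")

# Drop 'medium' and add a new, unused category 'very high' (hypothetical name)
t = s.cat.set_categories(['low', 'high', 'very high'], ordered=True)

print(t)

# The entry that used to be 'medium' is now missing
print(t.isna().tolist())

# The new category exists even though no value uses it yet
print(t.cat.categories.tolist())
```

The same list passed to reorder_categories() would raise a ValueError, since it neither contains 'medium' nor limits itself to the existing categories.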
Handling NaN Values During Reordering
NaN is not a category, so reordering the categories does not affect missing values. When a categorical Series is sorted, NaN values are placed at the end by default.
Example
This example demonstrates how NaN values behave when categorical data is reordered and sorted.
import pandas as pd
import numpy as np
# Create a Categorical Series with NaN values
data = pd.Categorical([1, 2, 3, np.nan, 1], ordered=True)
s = pd.Series(data)
# Display the Original Series with sorted values
print("Original Series with sorted values")
print(s.sort_values())
# Reorder categories
s = s.cat.reorder_categories([2, 3, 1], ordered=True)
# Sort the values; NaN is placed at the end by default
s = s.sort_values()
# Display the reordered and sorted values
print("\nReordered and Sorted the values:")
print(s)
Following is the output of the above code −
Original Series with sorted values
0      1
4      1
1      2
2      3
3    NaN
dtype: category
Categories (3, int64): [1 < 2 < 3]

Reordered and Sorted the values:
1      2
2      3
0      1
4      1
3    NaN
dtype: category
Categories (3, int64): [2 < 3 < 1]
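If missing values should appear at the top instead, sort_values() accepts the na_position parameter. The sketch below applies it to the same data as above; the variable name nan_first is only illustrative.

```python
import pandas as pd
import numpy as np

s = pd.Series(pd.Categorical([1, 2, 3, np.nan, 1], ordered=True))
s = s.cat.reorder_categories([2, 3, 1], ordered=True)

# na_position='first' moves missing values to the top of the sorted result
nan_first = s.sort_values(na_position='first')
print(nan_first)

# NaN is never listed among the categories themselves
print(s.cat.categories.tolist())
```

The non-missing values still follow the reordered categories (2, then 3, then 1); only the placement of NaN changes.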
Common Errors in Reordering Categories
When reordering Categorical data, you may encounter the following common errors −
ValueError: Raised by reorder_categories() when the new list does not contain exactly the same categories as the old one, that is, when it omits an existing category or includes one that was not present before.
TypeError: Raised by operations such as min() and max() when the categorical data is unordered. These operations require ordered categorical data.
Example: ValueError
The following example demonstrates the ValueError raised when you try to reorder categorical data with a list that is missing a category.
import pandas as pd
# Create a categorical Series
s = pd.Series(['low', 'medium', 'high', 'low'], dtype="category")
print("Original Series:")
print(s)
# Try to reorder the categories with a list that omits 'high'
try:
    s = s.cat.reorder_categories(['low', 'medium'])
except ValueError as e:
    print("\nValueError:", str(e))
When we run the above program, it produces the following result −
Original Series:
0       low
1    medium
2      high
3       low
dtype: category
Categories (3, object): ['high', 'low', 'medium']

ValueError: items in new_categories are not the same as in old categories
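One way to avoid this error is to compare the requested list with the existing categories before calling reorder_categories(). The following is a minimal sketch under an assumed policy: requested categories that exist come first in the requested order, and any remaining categories are appended afterwards. The names new_order, existing, kept, and rest are illustrative, and this policy is only one possible choice.

```python
import pandas as pd

s = pd.Series(['low', 'medium', 'high', 'low'], dtype="category")
new_order = ['low', 'medium']  # 'high' is left out on purpose

existing = list(s.cat.categories)

# Report which categories the requested list is missing
missing = set(existing) - set(new_order)
print("Missing from new_order:", missing)

# Assumed policy: keep the requested order, then append whatever is left
kept = [c for c in new_order if c in existing]
rest = [c for c in existing if c not in kept]
result = s.cat.reorder_categories(kept + rest, ordered=True)

print(result.cat.categories.tolist())
```

If dropping the omitted categories is actually the intent, set_categories() is the more direct tool.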
Example: TypeError with Unordered Categories
The following example demonstrates a scenario where a TypeError occurs: here we try to perform min/max operations on unordered categorical data.
import pandas as pd
# Create an unordered Categorical series
s = pd.Series([1, 2, 3, 1], dtype="category")
print("Original Series:")
print(s)
# Attempt to use min() on an unordered Series
try:
    print(s.min())
except TypeError as e:
    print("\nTypeError:", str(e))
When we run the above program, it produces the following result −
Original Series:
0    1
1    2
2    3
3    1
dtype: category
Categories (3, int64): [1, 2, 3]

TypeError: Categorical is not ordered for operation min
you can use .as_ordered() to change the Categorical to an ordered one
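As the error message suggests, the Series can be converted to an ordered one with as_ordered(), available through the .cat accessor. The short sketch below applies it to the same data; the variable name ordered_s is only illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 1], dtype="category")

# as_ordered() returns a new Series whose existing category order is treated as meaningful
ordered_s = s.cat.as_ordered()

print(ordered_s.cat.ordered)
print(ordered_s.min(), ordered_s.max())
```

as_ordered() keeps the current category order as it is. When a different order is needed, reorder_categories() with ordered=True does both steps at once.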