Python Pandas - Removing Unused Categories
Removing unused categories from categorical data is useful for cleaning and optimizing datasets. In pandas, data that takes a fixed, limited set of values can be stored with the Categorical type, and the Series.cat accessor provides specialized methods for working with it. One such method is remove_unused_categories(), which drops the categories that do not appear in the data.
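The same method also exists on a standalone pd.Categorical object, outside of any Series. The following is a minimal sketch; the size labels are hypothetical sample data.

```python
import pandas as pd

# A standalone Categorical whose category "large" never occurs in the data
sizes = pd.Categorical(["small", "medium", "small"],
                       categories=["small", "medium", "large"])
print(list(sizes.categories))    # ['small', 'medium', 'large']

# Categorical.remove_unused_categories() returns a new Categorical
trimmed = sizes.remove_unused_categories()
print(list(trimmed.categories))  # ['small', 'medium']
```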
In this tutorial, we will learn how to remove unused categories from Pandas categorical data, with examples for a Series, a DataFrame column, and a groupby operation.
The remove_unused_categories() Method
The Pandas Series.cat.remove_unused_categories() method returns a new categorical object from which every category that does not appear in the data has been removed. The values themselves, and the relative order of the remaining categories, are preserved.
Syntax
Following is the syntax of this method −
Series.cat.remove_unused_categories(*args, **kwargs)
This method takes no required parameters. It removes only the categories that do not appear in the data and returns a new object, so assign the result back to keep the change.
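To illustrate that the method returns a new object and keeps the order of the remaining categories, here is a small sketch using a hypothetical ordered "level" Series. The unused middle category is dropped, while "low" still comes before "high" and the ordered flag survives.

```python
import pandas as pd

# An ordered categorical with category order low < mid < high
levels = pd.Series(["high", "low", "high"],
                   dtype=pd.CategoricalDtype(["low", "mid", "high"], ordered=True))

trimmed = levels.cat.remove_unused_categories()

# The original Series is left untouched; a new Series is returned
print(list(levels.cat.categories))   # ['low', 'mid', 'high']

# Remaining categories keep their relative order and the ordered flag
print(list(trimmed.cat.categories))  # ['low', 'high']
print(trimmed.cat.ordered)           # True

# The values themselves do not change
print(trimmed.tolist())              # ['high', 'low', 'high']
```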
Removing Unused Categories from a Series
You can remove the unused categories from a Pandas categorical series object directly by using the remove_unused_categories() method.
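Unused categories often appear after filtering: a subset of a categorical Series keeps the full set of categories, and value_counts() then reports zero counts for the absent ones. A minimal sketch with hypothetical color data:

```python
import pandas as pd

colors = pd.Series(["red", "green", "blue", "red"], dtype="category")

# Boolean filtering keeps every category, even those no longer present
reds = colors[colors == "red"]
print(list(reds.cat.categories))     # ['blue', 'green', 'red']
print(len(reds.value_counts()))      # 3 (blue and green are counted as 0)

# Trimming the categories leaves only the ones that actually occur
reds_trimmed = reds.cat.remove_unused_categories()
print(list(reds_trimmed.cat.categories))  # ['red']
print(len(reds_trimmed.value_counts()))   # 1
```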
Example
This example demonstrates how to remove unused categories from a categorical Series using the Pandas Series.cat.remove_unused_categories() method.
import pandas as pd
# Creating a categorical Series
s = pd.Series(["cat", "dog", "cat"], dtype="category")
s = s.cat.add_categories(["mouse", "elephant"])
print("Original Series:")
print(s)
# Removing unused categories
s = s.cat.remove_unused_categories()
print("\nSeries after removing unused categories:")
print(s)
When we run the above program, it produces the following result −
Original Series:
0    cat
1    dog
2    cat
dtype: category
Categories (4, object): ['cat', 'dog', 'mouse', 'elephant']

Series after removing unused categories:
0    cat
1    dog
2    cat
dtype: category
Categories (2, object): ['cat', 'dog']
Removing Unused Categories from a DataFrame Column
You can also remove unused categories from a DataFrame column using the cat.remove_unused_categories() method.
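When a DataFrame has several categorical columns, one option is to select them with select_dtypes() and trim each in a loop. This is a sketch under assumptions: the column names and data below are hypothetical, and non-categorical columns such as "Weight" are left alone.

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": pd.Categorical(["Cat", "Dog"], categories=["Cat", "Dog", "Bird"]),
    "Size": pd.Categorical(["S", "L"], categories=["S", "M", "L"]),
    "Weight": [4.2, 20.5],
})

# Select every column with the categorical dtype and trim each one
for col in df.select_dtypes(include="category").columns:
    df[col] = df[col].cat.remove_unused_categories()

print(list(df["Animal"].cat.categories))  # ['Cat', 'Dog']
print(list(df["Size"].cat.categories))    # ['S', 'L']
```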
Example
This example demonstrates how to remove unused categories from a specific column in a DataFrame.
import pandas as pd
# Creating a DataFrame with a categorical column
df = pd.DataFrame({
    "Animal": ["Cat", "Dog", "Mouse"],
    "Category": pd.Series(["A", "B", "A"], dtype="category"),
})
# Add extra categories
df["Category"] = df["Category"].cat.add_categories(["C", "D"])
print("Original categories:")
print(df['Category'].cat.categories)
# Removing unused categories from the 'Category' column
df["Category"] = df["Category"].cat.remove_unused_categories()
print("\nDataFrame after removing unused categories:")
print(df)
# Checking the updated categories
print("\nUpdated categories in 'Category' column:")
print(df["Category"].cat.categories)
While executing the above code, we get the following output −
Original categories:
Index(['A', 'B', 'C', 'D'], dtype='object')

DataFrame after removing unused categories:
  Animal Category
0    Cat        A
1    Dog        B
2  Mouse        A

Updated categories in 'Category' column:
Index(['A', 'B'], dtype='object')
Removing Unused Categories with groupby
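Unused categories also affect groupby operations. When grouping with observed=False, a groupby on a categorical column produces a group for every category, including the empty ones. Removing unused categories first ensures that the aggregated result contains only groups for categories that actually appear in the data.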
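As an alternative sketch, the observed parameter of groupby() controls whether unused categories show up in the result, without modifying the column itself. The data below mirrors the example that follows.

```python
import pandas as pd

df = pd.DataFrame({
    "Value": [10, 15, 10, 20],
    "Category": pd.Categorical(["A", "B", "A", "C"], categories=["A", "B", "C", "D"]),
})

# observed=False keeps every category, so the unused "D" appears with NaN
all_groups = df.groupby("Category", observed=False)["Value"].mean()
print(all_groups.index.tolist())   # ['A', 'B', 'C', 'D']

# observed=True leaves unused categories out of the result only
seen_groups = df.groupby("Category", observed=True)["Value"].mean()
print(seen_groups.index.tolist())  # ['A', 'B', 'C']

# The column itself still carries the unused category
print(list(df["Category"].cat.categories))  # ['A', 'B', 'C', 'D']
```

Calling remove_unused_categories() changes the data itself, so every later operation sees the reduced set. observed=True only affects that single groupby call.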
Example
This example demonstrates how to remove unused categories from a specific column of a DataFrame and then apply a grouping operation.
import pandas as pd
# Creating a DataFrame with a categorical column
df = pd.DataFrame({
"Value": [10, 15, 10, 20],
"Category": pd.Categorical(["A", "B", "A", "C"], categories=["A", "B", "C", "D"])
})
# Display the input DataFrame
print("Original DataFrame:")
print(df)
# Removing unused categories
df['Category'] = df['Category'].cat.remove_unused_categories()
# Grouping by 'Category'; observed=False makes every remaining category a group
grouped = df.groupby('Category', observed=False).mean()
# Display the grouped DataFrame
print("\nGrouped DataFrame after removing unused categories:")
print(grouped)
When we run the above program, it produces the following result −
Original DataFrame:
   Value Category
0     10        A
1     15        B
2     10        A
3     20        C

Grouped DataFrame after removing unused categories:
          Value
Category
A          10.0
B          15.0
C          20.0