Pandas remove_unused_categories() Method
The remove_unused_categories() method in Pandas is a useful tool for cleaning up categorical data. It removes the categories of a categorical Series or CategoricalIndex that no value currently uses, leaving only the categories that actually appear in the data.
This method is part of the Pandas Series.cat accessor (an alias for CategoricalAccessor), which is designed specifically for categorical data. It offers a straightforward way to clean up categorical data.
Syntax
Below is the syntax of the Python Pandas remove_unused_categories() method. Its usage differs slightly between a categorical Series and a CategoricalIndex object.
Syntax for a Pandas Categorical Series −
Series.cat.remove_unused_categories(*args, **kwargs)
Syntax for a CategoricalIndex −
CategoricalIndex.remove_unused_categories(*args, **kwargs)
When calling the remove_unused_categories() method on a categorical Series, you must go through the .cat accessor. On a CategoricalIndex object you can call the method directly, because a CategoricalIndex is inherently categorical.
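The two calling styles can be compared side by side. This is a small sketch on made-up data; it also shows that calling the method on the Series itself, without .cat, raises an AttributeError.

```python
import pandas as pd

values = ["x", "y", "x"]
all_categories = ["x", "y", "z"]  # 'z' is never used

# Categorical Series: the method is reached through the .cat accessor
ser = pd.Series(pd.Categorical(values, categories=all_categories))
ser_clean = ser.cat.remove_unused_categories()

# CategoricalIndex: the method is called directly on the index object
idx = pd.CategoricalIndex(values, categories=all_categories)
idx_clean = idx.remove_unused_categories()

print(list(ser_clean.cat.categories))  # ['x', 'y']
print(list(idx_clean.categories))      # ['x', 'y']

# Calling the method on the Series itself, without .cat, fails
needs_accessor = False
try:
    ser.remove_unused_categories()
except AttributeError:
    needs_accessor = True
print(needs_accessor)  # True
```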
Parameters
The Python Pandas remove_unused_categories() method does not take any parameters. It simply operates on the object it is called on and removes the categories that are not used.
Return Value
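The Pandas remove_unused_categories() method returns a new object with the unused categories dropped: a categorical Series when called through Series.cat, or a CategoricalIndex when called on a CategoricalIndex. The original object is not modified, so the result has to be assigned to a variable. If there are no unused categories, the returned object simply has the same categories as the original.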
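The following sketch, on a tiny made-up Series, checks the points above: the accessor version returns a Series with category dtype, the original keeps all of its categories, and calling the method again when nothing is unused leaves the categories as they are.

```python
import pandas as pd

s = pd.Series(pd.Categorical(["a", "b"], categories=["a", "b", "c"]))
result = s.cat.remove_unused_categories()

# The accessor returns a new categorical Series; the original is left unchanged
print(type(result).__name__)        # Series
print(result.dtype)                 # category
print(list(s.cat.categories))       # ['a', 'b', 'c']
print(list(result.cat.categories))  # ['a', 'b']

# With nothing left to remove, the categories come back unchanged
again = result.cat.remove_unused_categories()
print(list(again.cat.categories))   # ['a', 'b']
```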
Example: Basic Example
This example demonstrates the basic functionality of removing unused categories from a categorical Series using the Series.cat.remove_unused_categories() method.
import pandas as pd
# Creating a categorical Series with an unused category
s = pd.Series(pd.Categorical(["apple", "banana", "cherry"],
                             categories=["apple", "banana", "cherry", "grape"]))
print("Original Series:")
print(s)
# Removing unused category
s = s.cat.remove_unused_categories()
print("\nSeries after removing a category:")
print(s)
When we run the above program, it produces the following result −
Original Series:
0     apple
1    banana
2    cherry
dtype: category
Categories (4, object): ['apple', 'banana', 'cherry', 'grape']

Series after removing a category:
0     apple
1    banana
2    cherry
dtype: category
Categories (3, object): ['apple', 'banana', 'cherry']
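Two details of this behavior are worth checking on your own data: missing values (NaN) are not categories, so they do not keep any category "in use", and the surviving categories keep their original order rather than the order in which values appear. A minimal sketch with made-up values:

```python
import pandas as pd

# 'banana' is unused; the None becomes a missing value, not a category
s = pd.Series(pd.Categorical(["cherry", None, "apple"],
                             categories=["apple", "banana", "cherry"]))
cleaned = s.cat.remove_unused_categories()

print(list(cleaned.cat.categories))  # ['apple', 'cherry'], original order kept
print(cleaned.isna().sum())          # 1, the missing value is still there
```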
Example: Grouping and Dropping Unused Categories
When working with grouped data, the Series.cat.remove_unused_categories() method helps you efficiently clean up empty categories before grouping. The following example demonstrates using the Pandas remove_unused_categories() method on grouped data.
import pandas as pd
# Create a DataFrame with categorical data
cats = pd.Categorical(["a", "b", "b", "b", "c", "c", "c"], categories=["a", "b", "c", "d"])
df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]})
print("Original DataFrame:\n", df)
# Drop the unused category 'd' before grouping
df['cats'] = df['cats'].cat.remove_unused_categories()
# Group by category and calculate the mean
# (passing observed explicitly avoids a FutureWarning in recent pandas versions)
print("\nGrouped DataFrame with removed unused categories:\n", df.groupby('cats', observed=False).mean())
Executing the above code produces the following output −
Original DataFrame:
   cats  values
0     a       1
1     b       2
2     b       2
3     b       2
4     c       3
5     c       4
6     c       5

Grouped DataFrame with removed unused categories:
      values
cats
a        1.0
b        2.0
c        4.0
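To see why the cleanup matters, the sketch below groups the same data twice with observed=False: once with the unused category still present, and once after removing it. In the first case the empty category 'd' shows up as its own group with a NaN mean. The observed argument is passed explicitly because recent pandas versions warn that its default is changing.

```python
import pandas as pd

cats = pd.Categorical(["a", "b", "b", "b", "c", "c", "c"],
                      categories=["a", "b", "c", "d"])
df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]})

# Without cleanup, observed=False reports every category, including empty 'd'
with_empty = df.groupby("cats", observed=False)["values"].mean()
print(with_empty)  # 'd' appears with NaN

# After dropping unused categories, the empty group disappears
df["cats"] = df["cats"].cat.remove_unused_categories()
cleaned = df.groupby("cats", observed=False)["values"].mean()
print(cleaned)
```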
Example: Removing Unused Categories from a CategoricalIndex
The following example demonstrates using the CategoricalIndex.remove_unused_categories() method for removing unused categories from the CategoricalIndex object.
import pandas as pd
# Creating a CategoricalIndex
catIndex = pd.CategoricalIndex(
    ["p", "q", "r", "p", "q", "r"],
    ordered=True,
    categories=["p", "q", "r", "s"]
)
print("Original CategoricalIndex:")
print(catIndex)
# Removing unused categories
new_index = catIndex.remove_unused_categories()
print("\nCategoricalIndex after Removing unused categories:")
print(new_index)
The following is the output of the above code −
Original CategoricalIndex:
CategoricalIndex(['p', 'q', 'r', 'p', 'q', 'r'], categories=['p', 'q', 'r', 's'], ordered=True, dtype='category')

CategoricalIndex after Removing unused categories:
CategoricalIndex(['p', 'q', 'r', 'p', 'q', 'r'], categories=['p', 'q', 'r'], ordered=True, dtype='category')
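A CategoricalIndex is often the index of a DataFrame, and filtering rows leaves stale categories on it just as it does for a Series. The sketch below (with a made-up `score` column) replaces the index with a cleaned copy and shows that the ordered flag is preserved.

```python
import pandas as pd

idx = pd.CategoricalIndex(["p", "q", "r", "p"],
                          categories=["p", "q", "r", "s"], ordered=True)
df = pd.DataFrame({"score": [10, 20, 30, 40]}, index=idx)

# Keep only rows labelled 'p'; the index still carries q, r and s as categories
subset = df[df.index == "p"]
print(list(subset.index.categories))  # ['p', 'q', 'r', 's']

# Replace the index with a cleaned copy; the ordering flag is preserved
subset.index = subset.index.remove_unused_categories()
print(list(subset.index.categories))  # ['p']
print(subset.index.ordered)           # True
```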