Python Pandas - Removing Categories
Removing categories is useful when certain values should no longer be treated as valid members of a categorical variable. Categorical data in Pandas is represented by the Categorical type, and the Series.cat accessor provides specialized methods for working with it. One such method is remove_categories(), which removes specified categories from an existing categorical object.
In this tutorial, we will learn about removing specified categories from Pandas categorical data using the remove_categories() method, along with various examples.
The remove_categories() Method
The Pandas remove_categories() method removes one or more categories from an existing Pandas categorical object. By default it returns a new object: values that belonged to a removed category become NaN, while all other values and their order are left unchanged.
Syntax
Following is the syntax of this method −
Series.cat.remove_categories(removals, *args, **kwargs)
removals is the mandatory parameter. It accepts a single category or a list of categories to remove from the existing categorical object. Values that belonged to a removed category are set to NaN.
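As a minimal sketch of this behaviour (the variable name trimmed is only illustrative), the snippet below removes one category, then checks that the original Series still has all of its categories and that the affected value is reported as missing in the result.

```python
import pandas as pd

s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")

# remove_categories() returns a new Series; s itself is left untouched
trimmed = s.cat.remove_categories("dog")

print(list(s.cat.categories))        # categories of the original Series
print(list(trimmed.cat.categories))  # categories of the returned Series
print(trimmed.isna().tolist())       # True where the value used to be "dog"
```

Because the result is a new object, it has to be assigned back (as the examples in this tutorial do) if the change should replace the original Series.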
Removing a Single Category
You can remove a single category from an existing Pandas categorical object by providing the category to the remove_categories() method.
Example
This example demonstrates how to remove a single category from a categorical Series using the Pandas remove_categories() method. Here the category "dog" is removed from the Series. Every value equal to "dog" is replaced with NaN, and "dog" no longer appears in the list of categories.
import pandas as pd
# Creating a categorical Series
s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")
# Display the Input Series
print("Original Series:")
print(s)
# Removing a category
s = s.cat.remove_categories("dog")
print("\nSeries after removing a category:")
print(s)
When we run the above program, it produces the following result −
Original Series:
0      cat
1      dog
2    mouse
3      cat
dtype: category
Categories (3, object): ['cat', 'dog', 'mouse']

Series after removing a category:
0      cat
1      NaN
2    mouse
3      cat
dtype: category
Categories (2, object): ['cat', 'mouse']
Removing Multiple Categories
You can also remove multiple categories by passing a list of categories to the remove_categories() method.
Example
This example shows how to remove multiple categories from existing categorical data by passing a list of categories to the removals parameter. Here both "dog" and "mouse" are removed from the Series, and the corresponding values are replaced with NaN.
import pandas as pd
# Creating a categorical Series
s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")
# Display the Input Series
print("Original Series:")
print(s)
# Removing multiple categories
s = s.cat.remove_categories(["dog", "mouse"])
print("\nSeries after removing multiple categories:")
print(s)
While executing the above code, we get the following output −
Original Series:
0      cat
1      dog
2    mouse
3      cat
dtype: category
Categories (3, object): ['cat', 'dog', 'mouse']

Series after removing multiple categories:
0    cat
1    NaN
2    NaN
3    cat
dtype: category
Categories (1, object): ['cat']
Removing Categories from a DataFrame Column
The Pandas cat.remove_categories() method can also be applied to specific columns in a DataFrame. It works on columns of the category dtype: the specified categories are dropped from that column, and any rows that held a removed category get NaN in that column. Other columns are not affected.
Example
This example demonstrates how to remove categories from a specific column of a Pandas DataFrame.
import pandas as pd
# Creating a DataFrame with a categorical column
df = pd.DataFrame({
    "Animal": ["Cat", "Dog", "Mouse"],
    "Category": pd.Series(["A", "B", "A"], dtype="category")
})
# Display the Input DataFrame
print("Original DataFrame:")
print(df)
# Removing categories from the 'Category' column
df["Category"] = df["Category"].cat.remove_categories(["A"])
# Display the updated DataFrame
print("\nDataFrame after removing categories:")
print(df)
# Checking the updated categories
print("\nUpdated categories in 'Category' column:")
print(df["Category"].cat.categories)
When we run the above program, it produces the following result −
Original DataFrame:
  Animal Category
0    Cat        A
1    Dog        B
2  Mouse        A

DataFrame after removing categories:
  Animal Category
0    Cat      NaN
1    Dog        B
2  Mouse      NaN

Updated categories in 'Category' column:
Index(['B'], dtype='object')
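Since the affected rows now hold NaN in the categorical column, a common follow-up is to drop them. The sketch below is one possible way to do that, using DataFrame.dropna() with its subset parameter; the variable name remaining is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Cat", "Dog", "Mouse"],
    "Category": pd.Series(["A", "B", "A"], dtype="category"),
})

# Remove category "A"; rows that held it now contain NaN in 'Category'
df["Category"] = df["Category"].cat.remove_categories(["A"])

# Keep only the rows whose 'Category' value is still present
remaining = df.dropna(subset=["Category"])
print(remaining)
```

Whether dropping these rows is appropriate depends on the analysis; keeping them as NaN preserves the other columns.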
Handling Errors While Removing Categories
If you attempt to remove a category that does not exist in the original categorical object, the method raises a ValueError. This ensures data integrity by preventing the removal of invalid categories.
Example
The following example demonstrates handling exceptions when you try to remove a non-existent category using the remove_categories() method.
import pandas as pd
# Creating a categorical Series
s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")
# Display the Input Series
print("Original Series:")
print(s)
try:
    # Attempting to remove a non-existent category
    s = s.cat.remove_categories(["elephant"])
except ValueError as e:
    print("\nError:", e)
Following is the output of the above code −
Original Series:
0      cat
1      dog
2    mouse
3      cat
dtype: category
Categories (3, object): ['cat', 'dog', 'mouse']

Error: removals must all be in old categories: {'elephant'}
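If the list of categories to remove comes from user input or another dataset, one way to avoid the ValueError is to filter the list first. The helper below, remove_existing_categories(), is a hypothetical function written for this sketch and is not part of Pandas; it removes only the categories that are actually present.

```python
import pandas as pd

def remove_existing_categories(series, removals):
    """Hypothetical helper: remove only the categories present in the Series."""
    # Keep the requested removals that really are categories of this Series
    present = [c for c in removals if c in series.cat.categories]
    return series.cat.remove_categories(present)

s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")

# "elephant" is not a category of s, so it is silently skipped
result = remove_existing_categories(s, ["dog", "elephant"])
print(result.cat.categories.tolist())
```

Silently skipping unknown categories can hide typos, so catching the ValueError as shown above may be the better choice when the removal list is expected to be valid.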