Python Pandas - Setting Categories
Setting categories in categorical data means modifying its set of categories by performing one or more of the following operations −
Add new categories.
Reorder existing categories.
Rename categories.
Remove categories.
Categorical data, which represents a fixed set of discrete values, is common in data analysis. Managing these categories effectively is important for data consistency and performance. In Pandas, the set_categories() method provides a way to add, remove, reorder, or rename categories in a categorical Series (including a categorical DataFrame column) or a CategoricalIndex object.
In this tutorial, we will learn how to set the categories of Pandas categorical data using the set_categories() method, with detailed examples.
The set_categories() Method
The Pandas set_categories() method is part of the Series.cat accessor (an alias for CategoricalAccessor), which is designed specifically for categorical data. It allows adding, removing, renaming, or reordering categories in a categorical Series or CategoricalIndex. Because a single call can combine several of these operations, it simplifies categorical data management.
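As a minimal sketch of the two entry points mentioned above, the following snippet calls set_categories() once through the Series.cat accessor and once directly on a CategoricalIndex. The variable names are illustrative only.

```python
import pandas as pd

# A categorical Series reaches set_categories() through the .cat accessor
s = pd.Series(["cat", "dog", "cat"], dtype="category")
new_s = s.cat.set_categories(["cat", "dog", "mouse"])

# A CategoricalIndex exposes set_categories() directly (no .cat accessor)
idx = pd.CategoricalIndex(["cat", "dog", "cat"])
new_idx = idx.set_categories(["cat", "dog", "mouse"])

print(list(new_s.cat.categories))
print(list(new_idx.categories))
```

In both cases the values stay the same and only the list of allowed categories changes.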
Syntax
Following is the syntax of this method −
Series.cat.set_categories(new_categories, ordered=None, rename=False)
The new_categories parameter accepts a list-like object that specifies the new set of categories. The rename parameter takes a boolean (False by default) that determines whether new_categories should be treated as new names for the existing categories, leaving the underlying data unchanged. The ordered parameter defines whether the categories are treated as ordered; if it is not given, the existing ordered setting is kept.
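The effect of the ordered parameter can be checked with a short sketch. It assumes a recent pandas version, where leaving ordered unset keeps the existing setting; the Series name sizes is illustrative only.

```python
import pandas as pd

# An ordered categorical Series
sizes = pd.Series(
    pd.Categorical(["low", "high"], categories=["low", "high"], ordered=True)
)

# ordered is not passed, so the existing ordered=True setting is kept
kept = sizes.cat.set_categories(["low", "medium", "high"])

# ordered=False is passed explicitly, so the result is unordered
unordered = sizes.cat.set_categories(["low", "medium", "high"], ordered=False)

print(kept.cat.ordered)
print(unordered.cat.ordered)
```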
Setting New Categories
Setting new categories involves defining a new set of categories, which may include unused categories. If any value in the original data is not listed in the new set of categories, it is replaced with NaN. This method can simultaneously add and remove categories based on the provided list.
Example
This example demonstrates setting new categories on a categorical Series using the Pandas set_categories() method.
import pandas as pd
# Creating a categorical Series
s = pd.Series(["cat", "dog", "cat"], dtype="category")
# Display the Input Series
print("Original Series:")
print(s)
# Setting new categories
s = s.cat.set_categories(["cat", "dog", "mouse"])
print("\nSeries after setting new categories:")
print(s)
When we run the above program, it produces the following result −
Original Series:
0    cat
1    dog
2    cat
dtype: category
Categories (2, object): ['cat', 'dog']

Series after setting new categories:
0    cat
1    dog
2    cat
dtype: category
Categories (3, object): ['cat', 'dog', 'mouse']
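The example above only adds a category. As a small sketch of the add-and-remove case described earlier, the following snippet passes a list that introduces "mouse" and omits "dog", so the "dog" value becomes NaN.

```python
import pandas as pd

s = pd.Series(["cat", "dog", "cat"], dtype="category")

# "mouse" is added as an unused category, while "dog" is dropped
result = s.cat.set_categories(["cat", "mouse"])

print(result)
# Flags the position whose original value is no longer a valid category
print(result.isna().tolist())
```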
Reordering Categories
Reordering categories can simplify data interpretation and sorting operations. This can be done by passing the categories in the desired order to the new_categories parameter and setting ordered=True in the set_categories() method.
Example
This example demonstrates how to reorder categories of a Pandas categorical series object using the ordered=True parameter of the set_categories() method.
import pandas as pd
# Create a categorical Series
data = pd.Categorical(['low', 'medium', 'high'], categories=['low', 'medium', 'high'], ordered=True)
s = pd.Series(data)
print("Original Series:")
print(s)
# Reorder categories
s = s.cat.set_categories(['high', 'medium', 'low'], ordered=True)
print("\nSeries after reordering categories:")
print(s)
When we run the above program, it produces the following result −
Original Series:
0       low
1    medium
2      high
dtype: category
Categories (3, object): ['low' < 'medium' < 'high']

Series after reordering categories:
0       low
1    medium
2      high
dtype: category
Categories (3, object): ['high' < 'medium' < 'low']
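Reordering does not move the values themselves, so the change is easiest to see when sorting. The following sketch sorts the Series before and after reordering and also checks the minimum and maximum, which follow the category order rather than alphabetical order.

```python
import pandas as pd

s = pd.Series(
    pd.Categorical(
        ["low", "medium", "high"],
        categories=["low", "medium", "high"],
        ordered=True,
    )
)
reordered = s.cat.set_categories(["high", "medium", "low"], ordered=True)

# Sorting follows the category order of each Series
print(s.sort_values().tolist())
print(reordered.sort_values().tolist())

# min() and max() also respect the new order
print(reordered.min(), reordered.max())
```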
Renaming Categories
Passing rename=True to the set_categories() method changes the names of the existing categories: each existing category is mapped by position to the corresponding entry in the new list, and the underlying data is left unchanged. This is useful for standardizing inconsistent labels without modifying the original structure.
Example
This example shows how to rename the categories of existing categorical data by providing a list of new names together with rename=True.
import pandas as pd
# Creating a categorical Series
s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")
# Display the Input Series
print("Original Series:")
print(s)
# Renaming the categories
s = s.cat.set_categories(["Animal-1", "Animal-2", "Animal-3"], rename=True)
print("\nSeries After Renaming Categories:")
print(s)
While executing the above code we get the following output −
Original Series:
0      cat
1      dog
2    mouse
3      cat
dtype: category
Categories (3, object): ['cat', 'dog', 'mouse']

Series After Renaming Categories:
0    Animal-1
1    Animal-2
2    Animal-3
3    Animal-1
dtype: category
Categories (3, object): ['Animal-1', 'Animal-2', 'Animal-3']
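To see why rename=True matters here, the following sketch runs the same call with and without it. With the default rename=False, none of the original values appear in the new list, so every value becomes NaN; with rename=True, the underlying codes are kept and only the labels change.

```python
import pandas as pd

s = pd.Series(["cat", "dog", "mouse", "cat"], dtype="category")
new_names = ["Animal-1", "Animal-2", "Animal-3"]

# rename=True: labels are swapped by position, the codes are untouched
renamed = s.cat.set_categories(new_names, rename=True)

# rename=False (default): the old values are not in the new list
replaced = s.cat.set_categories(new_names)

print(renamed.tolist())
print((renamed.cat.codes == s.cat.codes).all())
print(replaced.isna().all())
```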
Removing Categories
You can remove specific categories by defining a new set of categories that excludes them. Values that belonged to a removed category are replaced by NaN.
Example
This example demonstrates removing categories using the set_categories() method.
import pandas as pd
# Creating a categorical Series
s = pd.Series(["small", "medium", "large"], dtype="category")
print("Original Series:")
print(s)
# Removing a category
s = s.cat.set_categories(["small", "large"])
print("\nSeries After Removing a Category:")
print(s)
When we run the above program, it produces the following result −
Original Series:
0     small
1    medium
2     large
dtype: category
Categories (3, object): ['large', 'medium', 'small']

Series After Removing a Category:
0    small
1      NaN
2    large
dtype: category
Categories (2, object): ['small', 'large']
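Because removed categories silently turn values into NaN, it can help to check beforehand which values would be lost. The following is one possible sketch of such a check using isin(); the names new_categories and lost are illustrative only.

```python
import pandas as pd

s = pd.Series(["small", "medium", "large"], dtype="category")
new_categories = ["small", "large"]

# Values whose category is missing from the new list would become NaN
lost = s[~s.isin(new_categories)]
print(lost.tolist())

# Apply the change and count the resulting missing values
result = s.cat.set_categories(new_categories)
print(result.isna().sum())
```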