Python Pandas - Unioning Categorical Data
Unioning categorical data refers to the process of combining multiple categorical objects (Categorical, CategoricalIndex, or Series with the category dtype) into a single categorical whose categories are the union of the categories of the inputs. This operation is useful when combining data from different sources whose categories do not exactly match.
In the Concatenating Categorical Data tutorial, we saw that concatenation can behave unpredictably with respect to memory usage. Here, we will learn how to use the union_categoricals() function to manage categories consistently when combining categorical data.
The union_categoricals() Function
The union_categoricals() function from pandas.api.types combines multiple categorical objects into a single Categorical. The categories of the result are the union of the categories of all the inputs.
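To make the contrast with plain concatenation concrete, here is a minimal sketch. The variable names are illustrative only, and the exact dtype that pd.concat() falls back to depends on the pandas version, so the sketch only checks whether the result is still categorical.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Two categorical Series whose categories do not match
s1 = pd.Series(["cat", "dog"], dtype="category")
s2 = pd.Series(["cat", "mouse"], dtype="category")

# Plain concatenation: the categories differ, so the category dtype is lost
concatenated = pd.concat([s1, s2], ignore_index=True)
print(isinstance(concatenated.dtype, pd.CategoricalDtype))

# union_categoricals() keeps the data categorical
unioned = pd.Series(union_categoricals([s1, s2]))
print(isinstance(unioned.dtype, pd.CategoricalDtype))
```

Wrapping the result in pd.Series() is optional; it is done here only so that both results can be compared as Series.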
Syntax
Following is the syntax of this function −
pandas.api.types.union_categoricals(to_union, sort_categories=False, ignore_order=False)
Where,
to_union: List of Categorical, CategoricalIndex, or Series with dtype='category'.
sort_categories: A boolean, False by default. If True, the resulting categories are sorted lexicographically (lexsorted). Otherwise, they keep the order in which they first appear across the inputs.
ignore_order: A boolean, False by default. If True, the ordered attribute of the input categoricals is ignored and the result is an unordered categorical.
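As a quick illustration of the to_union parameter, the sketch below passes a Categorical and a categorical Series in the same list. The sample values are made up for illustration. Note that the function returns a Categorical rather than a Series, so wrap it in pd.Series() if you need a Series.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# A plain Categorical and a Series with dtype="category"
cat = pd.Categorical(["low", "high"])
ser = pd.Series(["high", "medium"], dtype="category")

# Both kinds of input can appear in the same list
result = union_categoricals([cat, ser])
print(type(result).__name__)
print(result)

# Wrap the Categorical in a Series when a Series is needed
as_series = pd.Series(result)
print(as_series.dtype)
```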
Example
Here is a basic example demonstrating how to merge different categorical data using the pandas.api.types.union_categoricals() function. In this example, Series s1 has categories 'cat' and 'dog', while s2 has 'cat', 'mouse' and 'dog'. The union_categoricals() function merges these categories into a single set, 'cat', 'dog', and 'mouse'.
import pandas as pd
from pandas.api.types import union_categoricals
# Creating categorical Series
s1 = pd.Series(["cat", "dog"], dtype="category")
s2 = pd.Series(["cat", "mouse", 'dog'], dtype="category")
# Display the Input Series objects
print("Input Series 1:")
print(s1)
print("\nInput Series 2:")
print(s2)
# Unioning the categorical Series
result = union_categoricals([s1, s2])
print("\nResult after unioning the categorical Series:")
print(result)
When we run the above program, it produces the following result −
Input Series 1:
0    cat
1    dog
dtype: category
Categories (2, object): ['cat', 'dog']

Input Series 2:
0      cat
1    mouse
2      dog
dtype: category
Categories (3, object): ['cat', 'dog', 'mouse']

Result after unioning the categorical Series:
['cat', 'dog', 'cat', 'mouse', 'dog']
Categories (3, object): ['cat', 'dog', 'mouse']
Unioning and Sorting Categorical Data
By default, the categories in the resulting union are ordered as they appear in the data. To get lexicographically sorted categories instead, pass sort_categories=True.
Example
The following example demonstrates unioning and sorting categorical data using the union_categoricals() function with sort_categories=True.
import pandas as pd
from pandas.api.types import union_categoricals
# Creating categorical Series
s1 = pd.Series(["cat", "dog"], dtype="category")
s2 = pd.Series(["cat", "mouse", 'dog'], dtype="category")
# Display the Input Series objects
print("Input Series 1:")
print(s1)
print("\nInput Series 2:")
print(s2)
# Unioning with sorted categories
result = union_categoricals([s1, s2], sort_categories=True)
print("\nResult after unioning and sorting the categorical Series:")
print(result)
Following is the output of the above code −
Input Series 1:
0    cat
1    dog
dtype: category
Categories (2, object): ['cat', 'dog']

Input Series 2:
0      cat
1    mouse
2      dog
dtype: category
Categories (3, object): ['cat', 'dog', 'mouse']

Result after unioning and sorting the categorical Series:
['cat', 'dog', 'cat', 'mouse', 'dog']
Categories (3, object): ['cat', 'dog', 'mouse']
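In the example above, the categories of both inputs are already in sorted order, so the result looks the same with or without sort_categories=True. The following sketch uses illustrative inputs (not from the example above) whose combined categories are not in sorted order, which makes the effect of the parameter visible.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# The combined categories appear in the order 'b', 'c', 'a'
a = pd.Categorical(["b", "c"])
b = pd.Categorical(["a", "b"])

# Default: categories keep the order in which they appear
default_union = union_categoricals([a, b])
print(list(default_union.categories))  # ['b', 'c', 'a']

# sort_categories=True: categories are sorted lexicographically
sorted_union = union_categoricals([a, b], sort_categories=True)
print(list(sorted_union.categories))   # ['a', 'b', 'c']

# Only the category order changes; the values themselves are identical
print(list(default_union) == list(sorted_union))  # True
```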
Unioning Ordered Categorical Data
The union_categoricals() function can combine ordered categoricals only when they all have identical categories in the same order. If the categories are not identical, a TypeError is raised.
Example
The following example shows how the union_categoricals() function combines ordered categorical Series seamlessly.
import pandas as pd
from pandas.api.types import union_categoricals
# Creating categorical Series
a = pd.Categorical(["cat", "dog"], ordered=True)
b = pd.Categorical(["cat", 'dog', "cat"], ordered=True)
s1 = pd.Series(a)
s2 = pd.Series(b)
# Display the Input Series objects
print("Input Series 1:")
print(s1)
print("\nInput Series 2:")
print(s2)
# Unioning ordered categoricals
result = union_categoricals([s1, s2])
print("\nResult after unioning the ordered categorical Series:")
print(result)
While executing the above code, we get the following output −
Input Series 1:
0    cat
1    dog
dtype: category
Categories (2, object): ['cat' < 'dog']

Input Series 2:
0    cat
1    dog
2    cat
dtype: category
Categories (2, object): ['cat' < 'dog']

Result after unioning the ordered categorical Series:
['cat', 'dog', 'cat', 'dog', 'cat']
Categories (2, object): ['cat' < 'dog']
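One related restriction is worth knowing: sort_categories=True cannot be combined with ordered categoricals, because sorting could change the defined order. The sketch below reuses the same inputs and only records whether a TypeError was raised. The exact error message is not shown because it may differ between pandas versions.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Ordered categoricals with identical categories
a = pd.Categorical(["cat", "dog"], ordered=True)
b = pd.Categorical(["cat", "dog", "cat"], ordered=True)

# Without sorting, the union succeeds and the result stays ordered
result = union_categoricals([a, b])
print(result.ordered)

# Requesting sorted categories on ordered inputs raises a TypeError
raised = False
try:
    union_categoricals([a, b], sort_categories=True)
except TypeError:
    raised = True
print("TypeError raised:", raised)
```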
Handling Different Orders while Unioning
If you try to union two ordered categoricals with different categories, a TypeError is raised. To avoid this exception, pass ignore_order=True, which ignores the ordered attribute of the inputs and returns an unordered categorical containing the union of their categories.
Example
This example unions two ordered categoricals that have different categories. It first catches the TypeError raised by the default call, then repeats the union with ignore_order=True.
import pandas as pd
from pandas.api.types import union_categoricals
# Ordered categoricals with different categories
a = pd.Categorical(["cat", "dog"], ordered=True)
b = pd.Categorical(["cat", 'mouse'], ordered=True)
s1 = pd.Series(a)
s2 = pd.Series(b)
# Display the Input Series objects
print("Input Series 1:")
print(s1)
print("\nInput Series 2:")
print(s2)
# Handling exception while unioning with different ordered categories
try:
    result = union_categoricals([a, b])
except TypeError as e:
    print("\nError:", e)
# Ignoring order to union
result = union_categoricals([a, b], ignore_order=True)
print("\nIgnoring order to union:")
print(result)
When we run the above program, it produces the following result −
Input Series 1:
0    cat
1    dog
dtype: category
Categories (2, object): ['cat' < 'dog']

Input Series 2:
0      cat
1    mouse
dtype: category
Categories (2, object): ['cat' < 'mouse']

Error: to union ordered Categoricals, all categories must be the same

Ignoring order to union:
['cat', 'dog', 'cat', 'mouse']
Categories (3, object): ['cat', 'dog', 'mouse']
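Because ignore_order=True discards the ordering, the result is an unordered categorical. If an order is still needed afterwards, one option is to define it explicitly on the result. The sketch below does this with Categorical.set_categories(). The order 'cat' < 'dog' < 'mouse' is an assumption chosen for illustration.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Ordered categoricals with different categories
a = pd.Categorical(["cat", "dog"], ordered=True)
b = pd.Categorical(["cat", "mouse"], ordered=True)

# The union succeeds with ignore_order=True, but the result is unordered
result = union_categoricals([a, b], ignore_order=True)
print(result.ordered)

# Define an explicit order on the combined categories (illustrative choice)
reordered = result.set_categories(["cat", "dog", "mouse"], ordered=True)
print(reordered.ordered)
print(reordered)
```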