Python Pandas - GroupBy with MultiIndex
A MultiIndex in Pandas indexes data along multiple levels, which makes it possible to represent higher-dimensional data in a two-dimensional Series or DataFrame. This hierarchical structure also provides a natural way to group data at different levels.
The Pandas groupby() method works with MultiIndex data for aggregation and analysis. With a hierarchical index you can group the data by any single level of the index, or by several levels at once.
In this tutorial, we will learn how to use the GroupBy functionality in Pandas with a MultiIndex DataFrame or Series.
Grouping by Index Levels
To group the data by one of the levels in the MultiIndex, we can use the level parameter in the groupby() method. This allows us to specify which level we want to group by, either by its number (0-based index) or by its name, if names have been assigned to the levels.
Example: Grouping by First Index Level
Here is an example of grouping the MultiIndexed Series object by its first index level.
import pandas as pd
import numpy as np
# Create a 2D list
list_2d = [["BMW", "BMW", "Lexus", "Lexus", "foo", "foo", "Audi", "Audi"],
["1", "2", "1", "2", "1", "2", "1", "2"]]
# Create a MultiIndex object
index = pd.MultiIndex.from_arrays(list_2d, names=["first", "second"])
# Creating a MultiIndexed Series
s = pd.Series(np.random.randn(8), index=index)
# Display the input MultiIndexed Series
print("Input MultiIndexed Series:\n",s)
# Group the Series by the first index level
grouped = s.groupby(level=0)
print("Output Summary of the grouped data:")
print(grouped.sum())
Following is the output of the above code −
Input MultiIndexed Series:
| first | second | |
|---|---|---|
| BMW | 1 | -0.795467 |
| | 2 | -0.132035 |
| Lexus | 1 | -0.913917 |
| | 2 | -0.875364 |
| foo | 1 | 0.004405 |
| | 2 | -0.336840 |
| Audi | 1 | -0.513719 |
| | 2 | 0.588359 |
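Because the example above draws random values, its numbers change on every run. The following is a minimal sketch that reuses the same index layout but swaps in fixed integer values (an assumption made here only so the result is reproducible). It shows what the grouped sum looks like, and that the level position and the level name select the same grouping.

```python
import pandas as pd

# Same index layout as the example above
arrays = [["BMW", "BMW", "Lexus", "Lexus", "foo", "foo", "Audi", "Audi"],
          ["1", "2", "1", "2", "1", "2", "1", "2"]]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])

# Illustrative fixed values (not from the original example)
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=index)

# Group by the first level, once by position and once by name
by_position = s.groupby(level=0).sum()
by_name = s.groupby(level="first").sum()

print(by_position)
print(by_position.equals(by_name))
```

With these values the sums are Audi 15, BMW 3, Lexus 7 and foo 11. groupby() sorts the group keys by default, which is why Audi is listed first and foo last.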
Grouping by Second Index Level
In the same way, we can group the data by its second index level by passing either the level name ("second") or its position (1) to the level parameter.
Example
The following example demonstrates grouping the MultiIndex Series object by its second index level.
import pandas as pd
import numpy as np
# Create a 2D list
list_2d = [["BMW", "BMW", "Lexus", "Lexus", "foo", "foo", "Audi", "Audi"],
["1", "2", "1", "2", "1", "2", "1", "2"]]
# Create a MultiIndex object
index = pd.MultiIndex.from_arrays(list_2d, names=["first", "second"])
# Creating a MultiIndexed Series
s = pd.Series(np.random.randn(8), index=index)
# Display the input MultiIndexed Series
print("Input MultiIndexed Series:\n",s)
# Group the Series by the second index level
grouped = s.groupby(level="second")
print("Output Summary of the grouped data:")
print(grouped.sum())
Following is the output of the above code −
Input MultiIndexed Series:
| first | second | |
|---|---|---|
| BMW | 1 | 1.046440 |
| | 2 | -0.895963 |
| Lexus | 1 | -0.292579 |
| | 2 | -0.009580 |
| foo | 1 | 0.004405 |
| | 2 | 1.279683 |
| Audi | 1 | 0.513284 |
| | 2 | -0.250846 |
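As a reproducible companion to the random example, here is a small sketch that groups by the second level and computes two aggregations at once. The fixed integer values and the choice of agg(["sum", "mean"]) are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Same index layout as the example above
arrays = [["BMW", "BMW", "Lexus", "Lexus", "foo", "foo", "Audi", "Audi"],
          ["1", "2", "1", "2", "1", "2", "1", "2"]]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])

# Illustrative fixed values (not from the original example)
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=index)

# Group by the second level and compute the sum and mean of each group
summary = s.groupby(level="second").agg(["sum", "mean"])
print(summary)
```

The result is a DataFrame indexed by the values of the second level ("1" and "2"), with one column per aggregation. Here group "1" sums to 16 with a mean of 4.0, and group "2" sums to 20 with a mean of 5.0.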
Grouping by Multiple Index Levels
Pandas also lets you group MultiIndex data by more than one index level. To do this, pass a list of level names or positions to the level parameter of the groupby() method.
Example
This example groups the MultiIndexed Series object by its first and third index levels.
import pandas as pd
import numpy as np
# Create data for multi index
data = [["BMW", "BMW", "Lexus", "Lexus", "foo", "foo", "Audi", "Audi"],
["1", "2", "1", "2", "1", "2", "1", "2"],
['red', 'black', 'red', 'black', 'red', 'black', 'red', 'black']]
# Create a MultiIndex object
index = pd.MultiIndex.from_arrays(data, names=["first", "second", "third"])
# Creating a MultiIndexed Series
s = pd.Series(np.random.randn(8), index=index)
# Display the input MultiIndexed Series
print("Input MultiIndexed Series:\n",s)
# Group the Series by the first and third index levels
grouped = s.groupby(level=["first", "third"])
print("Output Summary of the grouped data:")
print(grouped.sum())
Following is the output of the above code −
Input MultiIndexed Series:
| first | second | third | |
|---|---|---|---|
| BMW | 1 | red | 0.681079 |
| | 2 | black | 0.103199 |
| Lexus | 1 | red | -1.177623 |
| | 2 | black | -1.069462 |
| foo | 1 | red | 1.015916 |
| | 2 | black | -0.548004 |
| Audi | 1 | red | 0.646248 |
| | 2 | black | -1.130859 |

Output Summary of the grouped data:

| first | third | |
|---|---|---|
| Audi | black | -1.130859 |
| | red | 0.646248 |
| BMW | black | 0.103199 |
| | red | 0.681079 |
| Lexus | black | -1.069462 |
| | red | -1.177623 |
| foo | black | -0.548004 |
| | red | 1.015916 |
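In the example above every (first, third) pair occurs only once, so each group holds a single row. The sketch below uses a hypothetical color layout and fixed integer values (both assumptions chosen for illustration) so that some groups combine two rows, and it checks that level positions and level names give the same grouping.

```python
import pandas as pd

# Hypothetical index data: the "third" level is arranged so that some
# (first, third) pairs occur twice
data = [["BMW", "BMW", "Lexus", "Lexus", "foo", "foo", "Audi", "Audi"],
        ["1", "2", "1", "2", "1", "2", "1", "2"],
        ["red", "red", "red", "black", "black", "black", "red", "red"]]
index = pd.MultiIndex.from_arrays(data, names=["first", "second", "third"])

# Illustrative fixed values (not from the original example)
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=index)

# Group by the first and third levels, by name and by position
by_name = s.groupby(level=["first", "third"]).sum()
by_position = s.groupby(level=[0, 2]).sum()

print(by_name)
print(by_name.equals(by_position))
```

Only the combinations that actually occur in the index appear in the result. For instance, (BMW, red) sums to 3 and (foo, black) sums to 11, while there is no (BMW, black) row.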
Grouping DataFrame with Index Levels and Columns
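A Pandas DataFrame can also be grouped by a combination of index levels and columns. This adds more flexibility in grouping operations, allowing you to aggregate data based on both row indices and column values. To refer to an index level inside the list of grouping keys, wrap it in pd.Grouper(level=...); column names are passed as plain strings.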
Example
The following example demonstrates grouping the MultiIndexed DataFrame by its index level and column values.
import pandas as pd
import numpy as np
# Create a 2D list
list_2d = [["BMW", "BMW", "Lexus", "Lexus", "foo", "foo", "Audi", "Audi"],
["1", "2", "1", "2", "1", "2", "1", "2"]]
# Create a MultiIndex object
index = pd.MultiIndex.from_arrays(list_2d, names=["first", "second"])
# Creating a MultiIndexed DataFrame
df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 3, 3], "B": np.arange(8)}, index=index)
# Display the input MultiIndexed DataFrame
print("Input MultiIndexed DataFrame:\n")
print(df)
# Group the DataFrame by the second index level and the A column
grouped = df.groupby([pd.Grouper(level=1), "A"])
print("Output Summary of the grouped data:")
print(grouped.sum())
Following is the output of the above code −
Input MultiIndexed DataFrame:
| first | second | A | B |
|---|---|---|---|
| BMW | 1 | 1 | 0 |
| | 2 | 1 | 1 |
| Lexus | 1 | 1 | 2 |
| | 2 | 1 | 3 |
| foo | 1 | 2 | 4 |
| | 2 | 2 | 5 |
| Audi | 1 | 3 | 6 |
| | 2 | 3 | 7 |

Output Summary of the grouped data:

| second | A | B |
|---|---|---|
| 1 | 1 | 2 |
| | 2 | 4 |
| | 3 | 6 |
| 2 | 1 | 4 |
| | 2 | 5 |
| | 3 | 7 |
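Pandas also accepts the name of an index level directly as a grouping key, as long as that name does not clash with a column name. The sketch below rebuilds the same DataFrame and compares that shorthand with the pd.Grouper form used above. The variable names with_grouper and with_name are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Same DataFrame as in the example above
arrays = [["BMW", "BMW", "Lexus", "Lexus", "foo", "foo", "Audi", "Audi"],
          ["1", "2", "1", "2", "1", "2", "1", "2"]]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 3, 3], "B": np.arange(8)}, index=index)

# Form used above: refer to the index level through pd.Grouper
with_grouper = df.groupby([pd.Grouper(level=1), "A"]).sum()

# Shorthand: pass the index level name "second" next to the column name "A"
with_name = df.groupby(["second", "A"]).sum()

print(with_name)
print(with_grouper.equals(with_name))
```

Both forms should produce the same grouped sums of column B. The shorthand reads more naturally when the levels are named, whereas pd.Grouper(level=...) also works for unnamed levels because it can take a position.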