Python Pandas - GroupBy with MultiIndex

A MultiIndex in Pandas provides hierarchical indexing across multiple levels. This multi-level structure is particularly useful for representing higher-dimensional data in a two-dimensional format, and it also provides a natural way to group data at different levels.

The Pandas groupby() method works with MultiIndex data for aggregation and analysis. With hierarchical (MultiIndex) data, it becomes even more flexible, because you can group the data by one or more levels of the index.

In this tutorial, we will learn how to use the GroupBy functionality in Pandas with a MultiIndex DataFrame or Series.

Grouping by Index Levels

To group the data by one of the levels in the MultiIndex, we can use the level parameter in the groupby() method. This allows us to specify which level we want to group by, either by its number (0-based index) or by its name, if names have been assigned to the levels.

Example: Grouping by First Index Level

Here is an example of grouping the MultiIndexed Series object by its first index level.

import pandas as pd
import numpy as np

# Create a 2D list
list_2d = [["BMW", "BMW", "Lexus", "Lexus", "foo", "foo", "Audi", "Audi"],
["1", "2", "1", "2", "1", "2", "1", "2"]]

# Create a MultiIndex object
index = pd.MultiIndex.from_arrays(list_2d, names=["first", "second"])

# Creating a MultiIndexed Series
s = pd.Series(np.random.randn(8), index=index)

# Display the input MultiIndexed Series 
print("Input MultiIndexed Series:\n",s)

# Group the Series by the first index level
grouped = s.groupby(level=0)

print("Output Summary of the grouped data:")
print(grouped.sum())

The output looks like the following. Because the Series is filled with np.random.randn(), the values differ on every run:

Input MultiIndexed Series:
 first  second
BMW    1        -0.795467
       2        -0.132035
Lexus  1        -0.913917
       2        -0.875364
foo    1         0.004405
       2        -0.336840
Audi   1        -0.513719
       2         0.588359
dtype: float64
Output Summary of the grouped data:
first
Audi     0.074640
BMW     -0.927503
Lexus   -1.789281
foo     -0.332435
dtype: float64
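Since the random values above change between runs, a reproducible variant can make the behaviour easier to check. The following is a minimal sketch that assumes a smaller Series with hand-picked values (10, 20, 30, 40, chosen only for illustration). It shows that passing the level position and the level name to the level parameter gives the same grouping.

```python
import pandas as pd

# Illustrative fixed values (not from the example above) so results are reproducible
arrays = [["BMW", "BMW", "Lexus", "Lexus"], ["1", "2", "1", "2"]]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
s = pd.Series([10, 20, 30, 40], index=index)

# Group by the first level, once by position and once by name
by_position = s.groupby(level=0).sum()
by_name = s.groupby(level="first").sum()

print(by_position)                  # BMW -> 30, Lexus -> 70
print(by_position.equals(by_name))  # True
```

Using the level name is usually easier to read and keeps working if the order of the levels changes.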

Grouping by Second Index Level

As with the first level, you can group the data by the second index level by passing either the level name or its position (1) to the level parameter.

Example

The following example demonstrates grouping the MultiIndex Series object by its second index level.

import pandas as pd
import numpy as np

# Create a 2D list
list_2d = [["BMW", "BMW", "Lexus", "Lexus", "foo", "foo", "Audi", "Audi"],
["1", "2", "1", "2", "1", "2", "1", "2"]]

# Create a MultiIndex object
index = pd.MultiIndex.from_arrays(list_2d, names=["first", "second"])

# Creating a MultiIndexed Series
s = pd.Series(np.random.randn(8), index=index)

# Display the input MultiIndexed Series 
print("Input MultiIndexed Series:\n",s)

# Group the Series by the second index level
grouped = s.groupby(level="second")

print("Output Summary of the grouped data:")
print(grouped.sum())

The output looks like the following. Because the Series is filled with np.random.randn(), the values differ on every run:

Input MultiIndexed Series:
 first  second
BMW    1         1.046440
       2        -0.895963
Lexus  1        -0.292579
       2        -0.009580
foo    1         0.004405
       2         1.279683
Audi   1         0.513284
       2        -0.250846
dtype: float64
Output Summary of the grouped data:
second
1    1.271550
2    0.123295
dtype: float64
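Index level names can also be supplied to groupby() as ordinary keys, without the level parameter. The sketch below assumes a small Series with hand-picked values (for illustration only). It groups by the second level both ways and uses mean() instead of sum() to show that any aggregation can follow.

```python
import pandas as pd

# Illustrative fixed values so the result is reproducible
arrays = [["BMW", "BMW", "Lexus", "Lexus"], ["1", "2", "1", "2"]]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
s = pd.Series([10, 20, 30, 40], index=index)

# Group by the second level using the level parameter
via_level = s.groupby(level="second").mean()

# Group by the same level by passing its name as a key
via_key = s.groupby("second").mean()

print(via_level)                  # "1" -> 20.0, "2" -> 30.0
print(via_level.equals(via_key))  # True
```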

Grouping by Multiple Index Levels

Pandas also lets you group MultiIndex data by more than one index level by passing a list of level names or positions to the level parameter of the groupby() method.

Example

This example groups the MultiIndexed Series object by multiple index levels.

import pandas as pd
import numpy as np

# Create data for multi index
data = [["BMW", "BMW", "Lexus", "Lexus", "foo", "foo", "Audi", "Audi"],
["1", "2", "1", "2", "1", "2", "1", "2"], 
['red', 'black', 'red', 'black', 'red', 'black', 'red', 'black']]

# Create a MultiIndex object
index = pd.MultiIndex.from_arrays(data, names=["first", "second", "third"])

# Creating a MultiIndexed Series
s = pd.Series(np.random.randn(8), index=index)

# Display the input MultiIndexed Series 
print("Input MultiIndexed Series:\n",s)

# Group the Series by the first and third index levels
grouped = s.groupby(level=["first", "third"])

print("Output Summary of the grouped data:")
print(grouped.sum())

The output looks like the following. Because the Series is filled with np.random.randn(), the values differ on every run:

Input MultiIndexed Series:
 first  second  third
BMW    1       red      0.681079
       2       black    0.103199
Lexus  1       red     -1.177623
       2       black   -1.069462
foo    1       red      1.015916
       2       black   -0.548004
Audi   1       red      0.646248
       2       black   -1.130859
dtype: float64
Output Summary of the grouped data:
first  third
Audi   black   -1.130859
       red      0.646248
BMW    black    0.103199
       red      0.681079
Lexus  black   -1.069462
       red     -1.177623
foo    black   -0.548004
       red      1.015916
dtype: float64
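In the output above, every (first, third) pair occurs only once, so each sum equals a single input value. The sketch below assumes a smaller Series with hand-picked values (for illustration only) and groups by the second and third levels instead, so that several rows fall into each group. It also shows that level positions can be used in the list.

```python
import pandas as pd

# Illustrative fixed values so the result is reproducible
arrays = [
    ["BMW", "BMW", "Lexus", "Lexus"],
    ["1", "2", "1", "2"],
    ["red", "black", "red", "black"],
]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second", "third"])
s = pd.Series([10, 20, 30, 40], index=index)

# Group by two levels, given by name
by_names = s.groupby(level=["second", "third"]).sum()

# The same grouping, given by level position
by_positions = s.groupby(level=[1, 2]).sum()

print(by_names)  # ("1", "red") -> 40, ("2", "black") -> 60
print(by_names.equals(by_positions))  # True
```

The result is itself indexed by a MultiIndex made up of the levels used for grouping.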

Grouping DataFrame with Index Levels and Columns

A Pandas DataFrame can also be grouped by a combination of index levels and columns, which lets you aggregate data based on both row index values and column values. To mix the two, pass a list to groupby() in which index levels are wrapped in pd.Grouper(level=...) and columns are given by name.

Example

The following example demonstrates grouping the MultiIndexed DataFrame by its index level and column values.

import pandas as pd
import numpy as np

# Create a 2D list
list_2d = [["BMW", "BMW", "Lexus", "Lexus", "foo", "foo", "Audi", "Audi"],
["1", "2", "1", "2", "1", "2", "1", "2"]]

# Create a MultiIndex object
index = pd.MultiIndex.from_arrays(list_2d, names=["first", "second"])

# Creating a MultiIndexed DataFrame
df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 3, 3], "B": np.arange(8)}, index=index)

# Display the input MultiIndexed DataFrame
print("Input MultiIndexed DataFrame:\n")
print(df)

# Group the DataFrame by the second index level and the A column
grouped = df.groupby([pd.Grouper(level=1), "A"])

print("Output Summary of the grouped data:")
print(grouped.sum())

Following is the output of the above code:

Input MultiIndexed DataFrame:

              A  B
first second      
BMW   1       1  0
      2       1  1
Lexus 1       1  2
      2       1  3
foo   1       2  4
      2       2  5
Audi  1       3  6
      2       3  7
Output Summary of the grouped data:
          B
second A   
1      1  2
       2  4
       3  6
2      1  4
       2  5
       3  7
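The index level in pd.Grouper can also be given by name, and index level names can be passed to groupby() directly alongside column names. The sketch below assumes a small DataFrame with hand-picked values (for illustration only) and compares the two spellings.

```python
import pandas as pd

# Illustrative fixed values so the result is reproducible
arrays = [["BMW", "BMW", "Lexus", "Lexus"], ["1", "2", "1", "2"]]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
df = pd.DataFrame({"A": [1, 1, 2, 2], "B": [10, 20, 30, 40]}, index=index)

# Index level wrapped in pd.Grouper (by name), combined with column "A"
with_grouper = df.groupby([pd.Grouper(level="second"), "A"]).sum()

# Index level name and column name passed together as plain keys
with_names = df.groupby(["second", "A"]).sum()

print(with_grouper)
print(with_grouper.equals(with_names))  # True
```

The plain-key form is shorter, while pd.Grouper makes it explicit that the key refers to an index level rather than a column.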
