
Python Pandas - Iterating Through Groups
The Pandas groupby() method is a powerful tool for grouping data based on one or more keys. Once you have grouped data (a GroupBy object), you can iterate through the groups to perform operations on each group separately.
Iterating through groups works similarly to itertools.groupby(): each iteration yields a tuple whose first element is the group name (key) and whose second element is the data belonging to that group.
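The similarity is in the (key, data) tuple shape, but the two behave differently on unsorted input. The following sketch uses a small made-up DataFrame (not the IPL data used later) to show that pandas collects every row with the same key into one group, whereas itertools.groupby() only merges consecutive equal keys and therefore usually needs sorted input. The variable names are illustrative only.

```python
import itertools

import pandas as pd

# Small illustrative frame; rows are deliberately NOT sorted by Team
df = pd.DataFrame({'Team': ['Riders', 'Devils', 'Riders', 'Devils'],
                   'Points': [876, 863, 789, 673]})

# pandas: each item is a (key, sub-DataFrame) tuple, and all rows sharing a key
# end up in one group regardless of their order
pandas_groups = [(key, len(sub)) for key, sub in df.groupby('Team')]

# itertools.groupby: each item is a (key, iterator) tuple, but only consecutive
# equal keys are merged, so unsorted input produces repeated keys
records = df.to_dict('records')
itertools_groups = [(key, len(list(items)))
                    for key, items in itertools.groupby(records, key=lambda r: r['Team'])]

print(pandas_groups)     # [('Devils', 2), ('Riders', 2)]
print(itertools_groups)  # [('Riders', 1), ('Devils', 1), ('Riders', 1), ('Devils', 1)]
```

Sorting the records by Team before calling itertools.groupby() would make both approaches produce the same two groups.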
In this tutorial, we will learn how to iterate through groups in Pandas with single and multiple keys, and how to work with time-based groups.
Iterating Over Single-Key Groups
If you have grouped data with a single key, you can iterate over these groups by that key.
Example
The following example demonstrates grouping data by a single column and iterating through each group. Here, the data is grouped by the Year column.
import pandas as pd

# Sample dataset of IPL teams
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'kings',
                     'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}

# Create a DataFrame
df = pd.DataFrame(ipl_data)

# Group by 'Year'
grouped = df.groupby('Year')

# Iterate through each group
for year, group in grouped:
    print(f"Group Year: {year}")
    print(group, end='\n\n')
Following is the output of the above code −
Group Year: 2014
     Team  Rank  Year  Points
0  Riders     1  2014     876
2  Devils     2  2014     863
4   Kings     3  2014     741
9  Royals     4  2014     701

Group Year: 2015
      Team  Rank  Year  Points
1   Riders     2  2015     789
3   Devils     3  2015     673
5    kings     4  2015     812
10  Royals     1  2015     804

Group Year: 2016
     Team  Rank  Year  Points
6   Kings     1  2016     756
8  Riders     2  2016     694

Group Year: 2017
      Team  Rank  Year  Points
7    Kings     1  2017     788
11  Riders     2  2017     690
When grouping by a single column, the group name yielded in each iteration is the corresponding value of that column, which in this case is the Year. This makes it easy to identify which group is being processed.
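Because the loop key is the grouping value itself, it can be used to label per-group results or to fetch a single group later. The sketch below uses a small made-up DataFrame; best_per_year is a hypothetical name, and get_group() is the GroupBy method for retrieving one group by its key.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['Riders', 'Devils', 'Kings', 'Riders'],
                   'Year': [2014, 2014, 2015, 2015],
                   'Points': [876, 863, 812, 789]})

grouped = df.groupby('Year')

# The loop key is the Year value; use it to label a per-group result
best_per_year = {}
for year, group in grouped:
    top_row = group.loc[group['Points'].idxmax()]   # row with the most points
    best_per_year[int(year)] = top_row['Team']

print(best_per_year)  # {2014: 'Riders', 2015: 'Kings'}

# The same key fetches one group directly, without a loop
print(grouped.get_group(2015))
```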
Iterating Over Multi-Key Groups
You can also group by multiple columns. In that case, each group name is a tuple containing one value from each grouping column, such as ('Devils', 2014) when grouping by Team and Year.
Example
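The following example groups the data by two columns, Team and Year, and iterates through each group. Note that the sample data contains both 'Kings' and 'kings'; because grouping is case-sensitive, these form separate groups.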
import pandas as pd

# Sample dataset of IPL teams
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'kings',
                     'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}

# Create a DataFrame
df = pd.DataFrame(ipl_data)

# Group by 'Team' and 'Year'
multi_grouped = df.groupby(['Team', 'Year'])

# Iterate through each group
for name, group in multi_grouped:
    print(f"Group: {name}")
    print(group)
Following is the output of the above code −
Group: ('Devils', 2014)
     Team  Rank  Year  Points
2  Devils     2  2014     863
Group: ('Devils', 2015)
     Team  Rank  Year  Points
3  Devils     3  2015     673
Group: ('Kings', 2014)
    Team  Rank  Year  Points
4  Kings     3  2014     741
Group: ('Kings', 2016)
    Team  Rank  Year  Points
6  Kings     1  2016     756
Group: ('Kings', 2017)
    Team  Rank  Year  Points
7  Kings     1  2017     788
Group: ('Riders', 2014)
     Team  Rank  Year  Points
0  Riders     1  2014     876
Group: ('Riders', 2015)
     Team  Rank  Year  Points
1  Riders     2  2015     789
Group: ('Riders', 2016)
     Team  Rank  Year  Points
8  Riders     2  2016     694
Group: ('Riders', 2017)
      Team  Rank  Year  Points
11  Riders     2  2017     690
Group: ('Royals', 2014)
     Team  Rank  Year  Points
9  Royals     4  2014     701
Group: ('Royals', 2015)
      Team  Rank  Year  Points
10  Royals     1  2015     804
Group: ('kings', 2015)
    Team  Rank  Year  Points
5  kings     4  2015     812
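Since each multi-key group name is a tuple, it can be unpacked directly in the for statement, and the same tuple can be passed to get_group(). This sketch uses a small made-up DataFrame; totals is a hypothetical name used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['Riders', 'Riders', 'Devils', 'Devils'],
                   'Year': [2014, 2015, 2014, 2014],
                   'Points': [876, 789, 863, 673]})

multi_grouped = df.groupby(['Team', 'Year'])

# Unpack the tuple key straight into named variables
totals = {}
for (team, year), group in multi_grouped:
    totals[(team, int(year))] = int(group['Points'].sum())

print(totals)
# {('Devils', 2014): 1536, ('Riders', 2014): 876, ('Riders', 2015): 789}

# get_group() takes the full tuple as the key
print(multi_grouped.get_group(('Devils', 2014)))
```

Unpacking into (team, year) avoids indexing into name[0] and name[1] inside the loop body.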
Iterating Over Time-Based Groups
For time series data, you can group data by time intervals, such as minutes, days, or months, using the resample() method, which is available on both Series and DataFrame objects indexed by dates or times. This is especially useful for analyzing data over fixed time periods.
Example
This example demonstrates grouping time series data into 30-minute intervals using the Pandas Series.resample() method.
import pandas as pd

# Create a time series with a DatetimeIndex
time_series = pd.Series(range(6), index=pd.to_datetime([
    "2024-01-01 00:00:00", "2024-01-01 00:30:00", "2024-01-01 00:31:00",
    "2024-01-01 01:00:00", "2024-01-01 03:00:00", "2024-01-01 03:05:00"
]))

# Resample the series into 30-minute bins
resampled = time_series.resample("30min")

# Iterate through each resampled group
for time, group in resampled:
    print(f"Group: {time}")
    print(group, end="\n\n")
Following is the output of the above code −
Group: 2024-01-01 00:00:00
2024-01-01    0
dtype: int64

Group: 2024-01-01 00:30:00
2024-01-01 00:30:00    1
2024-01-01 00:31:00    2
dtype: int64

Group: 2024-01-01 01:00:00
2024-01-01 01:00:00    3
dtype: int64

Group: 2024-01-01 01:30:00
Series([], dtype: int64)

Group: 2024-01-01 02:00:00
Series([], dtype: int64)

Group: 2024-01-01 02:30:00
Series([], dtype: int64)

Group: 2024-01-01 03:00:00
2024-01-01 03:00:00    4
2024-01-01 03:05:00    5
dtype: int64
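As the output shows, intervals with no observations (01:30 to 02:30) still appear as empty groups. A minimal sketch for skipping them, reusing the same time series and assuming you only care about intervals that contain data, is to test group.empty inside the loop; the non_empty dictionary and its "HH:MM" keys are illustrative choices, not part of the pandas API.

```python
import pandas as pd

ts = pd.Series(range(6), index=pd.to_datetime([
    "2024-01-01 00:00:00", "2024-01-01 00:30:00", "2024-01-01 00:31:00",
    "2024-01-01 01:00:00", "2024-01-01 03:00:00", "2024-01-01 03:05:00"]))

non_empty = {}
for start, group in ts.resample("30min"):
    if group.empty:          # skip intervals that contain no observations
        continue
    # Key each interval by its start time formatted as HH:MM
    non_empty[start.strftime("%H:%M")] = group.tolist()

print(non_empty)
# {'00:00': [0], '00:30': [1, 2], '01:00': [3], '03:00': [4, 5]}
```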