Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Python Pandas - Filtering few rows from a DataFrame on the basis of sum
In Pandas, you can filter DataFrame rows based on the sum of values across columns. This is useful when you need to select rows where the total meets specific criteria, such as student marks where the combined score exceeds a threshold.
Creating the DataFrame
Let's start by creating a DataFrame with student marks across different subjects ?
import pandas as pd
# Create a DataFrame with 3 students' marks
dataFrame = pd.DataFrame({
'Jacob_Marks': [95, 90, 70, 85, 88],
'Ted_Marks': [60, 50, 65, 85, 70],
'Jamie_Marks': [77, 76, 60, 45, 50]
})
print("Original DataFrame:")
print(dataFrame)
Original DataFrame: Jacob_Marks Ted_Marks Jamie_Marks 0 95 60 77 1 90 50 76 2 70 65 60 3 85 85 45 4 88 70 50
Filtering Rows Based on Sum
We can filter rows where the total marks (sum across all students) exceeds 200 using the sum(axis=1) method ?
import pandas as pd
# Create a DataFrame with 3 students' marks
dataFrame = pd.DataFrame({
'Jacob_Marks': [95, 90, 70, 85, 88],
'Ted_Marks': [60, 50, 65, 85, 70],
'Jamie_Marks': [77, 76, 60, 45, 50]
})
print("Original DataFrame:")
print(dataFrame)
print("\nRow sums:")
print(dataFrame.sum(axis=1))
# Filter rows with total greater than 200
filtered_dataFrame = dataFrame[dataFrame.sum(axis=1) > 200]
print("\nFiltered DataFrame (sum > 200):")
print(filtered_dataFrame)
Original DataFrame: Jacob_Marks Ted_Marks Jamie_Marks 0 95 60 77 1 90 50 76 2 70 65 60 3 85 85 45 4 88 70 50 Row sums: 0 232 1 216 2 195 3 215 4 208 dtype: int64 Filtered DataFrame (sum > 200): Jacob_Marks Ted_Marks Jamie_Marks 0 95 60 77 1 90 50 76 3 85 85 45 4 88 70 50
How It Works
The sum(axis=1) method calculates the sum across columns for each row. The parameter axis=1 specifies row−wise operation, while axis=0 would sum column−wise. The boolean condition creates a mask that filters rows meeting the criteria.
Alternative Filtering Examples
You can apply different sum−based filters depending on your requirements ?
import pandas as pd
dataFrame = pd.DataFrame({
'Jacob_Marks': [95, 90, 70, 85, 88],
'Ted_Marks': [60, 50, 65, 85, 70],
'Jamie_Marks': [77, 76, 60, 45, 50]
})
# Filter rows with sum exactly equal to 216
exact_sum = dataFrame[dataFrame.sum(axis=1) == 216]
print("Rows with sum exactly 216:")
print(exact_sum)
# Filter rows with sum between 200 and 220
range_filter = dataFrame[(dataFrame.sum(axis=1) >= 200) & (dataFrame.sum(axis=1) <= 220)]
print("\nRows with sum between 200 and 220:")
print(range_filter)
Rows with sum exactly 216: Jacob_Marks Ted_Marks Jamie_Marks 1 90 50 76 Rows with sum between 200 and 220: Jacob_Marks Ted_Marks Jamie_Marks 1 90 50 76 3 85 85 45 4 88 70 50
Conclusion
Use dataFrame.sum(axis=1) to calculate row−wise sums, then apply boolean indexing to filter rows. This technique is valuable for analyzing data where aggregate values determine record relevance.
