Python Pandas - Filtering columns from a DataFrame on the basis of sum

In Pandas, you can filter DataFrame columns based on their sum values using the loc[] indexer with conditional logic. This technique is useful when you want to select only columns that meet certain aggregate criteria.

Creating the DataFrame

First, let's create a sample DataFrame with student marks ?

import pandas as pd

# Create a DataFrame with student marks
df = pd.DataFrame({
    'Jacob_Marks': [95, 90, 75, 85, 88],
    'Ted_Marks': [60, 50, 65, 85, 70],
    'Jamie_Marks': [77, 76, 65, 45, 50]
})

print("Original DataFrame:")
print(df)
Original DataFrame:
   Jacob_Marks  Ted_Marks  Jamie_Marks
0           95         60           77
1           90         50           76
2           75         65           65
3           85         85           45
4           88         70           50

Checking Column Sums

Let's examine the sum of each column to understand our filtering criteria ?

import pandas as pd

df = pd.DataFrame({
    'Jacob_Marks': [95, 90, 75, 85, 88],
    'Ted_Marks': [60, 50, 65, 85, 70],
    'Jamie_Marks': [77, 76, 65, 45, 50]
})

print("Column sums:")
print(df.sum(axis=0))
Column sums:
Jacob_Marks    433
Ted_Marks      330
Jamie_Marks    313
dtype: int64

Filtering Columns by Sum

Now we'll filter to keep only columns where the total sum is greater than 400 ?

import pandas as pd

df = pd.DataFrame({
    'Jacob_Marks': [95, 90, 75, 85, 88],
    'Ted_Marks': [60, 50, 65, 85, 70],
    'Jamie_Marks': [77, 76, 65, 45, 50]
})

print("Original DataFrame:")
print(df)

# Filter columns where sum > 400
filtered_df = df.loc[:, df.sum(axis=0) > 400]

print("\nFiltered DataFrame (columns with sum > 400):")
print(filtered_df)
Original DataFrame:
   Jacob_Marks  Ted_Marks  Jamie_Marks
0           95         60           77
1           90         50           76
2           75         65           65
3           85         85           45
4           88         70           50

Filtered DataFrame (columns with sum > 400):
   Jacob_Marks
0           95
1           90
2           75
3           85
4           88

How It Works

The filtering process works in three steps:

  • df.sum(axis=0) calculates the sum for each column
  • df.sum(axis=0) > 400 creates a boolean Series indicating which columns meet the criteria
  • df.loc[:, boolean_series] selects all rows and only the columns where the condition is True

Alternative Approach

You can also store the condition separately for better readability ?

import pandas as pd

df = pd.DataFrame({
    'Jacob_Marks': [95, 90, 75, 85, 88],
    'Ted_Marks': [60, 50, 65, 85, 70],
    'Jamie_Marks': [77, 76, 65, 45, 50]
})

# Create boolean condition
condition = df.sum(axis=0) > 400
print("Condition (sum > 400):")
print(condition)

# Apply the filter
result = df.loc[:, condition]
print("\nFiltered result:")
print(result)
Condition (sum > 400):
Jacob_Marks     True
Ted_Marks      False
Jamie_Marks    False
dtype: bool

Filtered result:
   Jacob_Marks
0           95
1           90
2           75
3           85
4           88

Conclusion

Use df.loc[:, df.sum(axis=0) > threshold] to filter DataFrame columns based on their sum values. This approach combines column aggregation with boolean indexing for efficient data filtering.

Updated on: 2026-03-26T02:14:47+05:30

944 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements