Python - Column summation in uneven sized lists

Column summation is the process of calculating the sum of the values in each column of a dataset or matrix. In Python, this becomes harder when the data is a list of uneven-sized lists: the rows have different lengths, so some columns hold fewer values than others.

What is Column Summation

Column summation adds up the values within each column to produce a single sum per column. Consider this dataset of heights in centimeters, with three measurements per person:

           Measurement 1   Measurement 2   Measurement 3
Person 0   170             175             180
Person 1   165             168             172
Person 2   180             182             178
Person 3   172             169             171
Person 4   175             176             174

The column summation results are:

  • Measurement 1: 862 (170 + 165 + 180 + 172 + 175)
  • Measurement 2: 870 (175 + 168 + 182 + 169 + 176)
  • Measurement 3: 875 (180 + 172 + 178 + 171 + 174)
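For an even-sized table like the one above, the sums can be reproduced with plain zip. This is a minimal sketch; the names heights and column_sums are illustrative and do not come from the original example.

```python
# Heights in cm: one row per person, one column per measurement
heights = [
    [170, 175, 180],
    [165, 168, 172],
    [180, 182, 178],
    [172, 169, 171],
    [175, 176, 174],
]

# zip(*heights) transposes rows into columns; every row has the same length here
column_sums = [sum(col) for col in zip(*heights)]
print(column_sums)  # [862, 870, 875]
```

Note that zip stops at the shortest row, so on uneven lists it would silently drop values. The approaches in the following sections address that.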

Using Manual Padding with Loops

This approach pads the shorter lists with zeros so that all lists have the same length, then calculates the column sums:

lists = [
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9]
]

# Find maximum length and pad lists
max_length = max(len(lst) for lst in lists)
padded_lists = [lst + [0] * (max_length - len(lst)) for lst in lists]

# Calculate column sums
column_sum = [sum(col) for col in zip(*padded_lists)]
print("Column summation:", column_sum)

Output:

Column summation: [11, 14, 11, 9]
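If copying every row is undesirable, the same result can be obtained with explicit loops and no padding at all. The sketch below is one possible variant, not the article's method; the name totals is an illustrative choice.

```python
lists = [
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9],
]

# One running total per column, sized to the longest row
# (default=0 keeps this working when lists is empty)
totals = [0] * max((len(row) for row in lists), default=0)

for row in lists:
    for index, value in enumerate(row):
        # Positions a short row does not have simply contribute nothing
        totals[index] += value

print(totals)  # [11, 14, 11, 9]
```

This keeps only one extra list in memory, at the cost of a slower pure-Python inner loop.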

Using itertools.zip_longest()

The zip_longest function from itertools handles uneven lists by filling missing positions with a specified fillvalue (None if it is not given):

from itertools import zip_longest

lists = [
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9, 10]
]

# Calculate column sums with automatic padding
column_sum = [sum(col) for col in zip_longest(*lists, fillvalue=0)]
print("Column summation:", column_sum)

Output:

Column summation: [11, 14, 11, 9, 10]
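The fillvalue argument matters: without it, zip_longest fills gaps with None, and calling sum() on such a column raises a TypeError. The sketch below shows the default behavior and one way to work with it by skipping the placeholders. The variable names are illustrative.

```python
from itertools import zip_longest

lists = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]

# Without fillvalue, missing positions are filled with None
columns = list(zip_longest(*lists))
print(columns[2])  # (3, None, 8)

# Skip the None placeholders instead of padding with zeros
column_sums = [sum(v for v in col if v is not None) for col in columns]
print(column_sums)  # [11, 14, 11, 9, 10]
```

For plain sums, fillvalue=0 is simpler. Filtering None may be preferable when the padded positions must stay distinguishable from real zeros.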

Using a pandas DataFrame

For data analysis work, pass the list of lists to a pandas DataFrame. Shorter rows are padded with NaN automatically, and sum() skips NaN values:

import pandas as pd

lists = [
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9]
]

# Convert to DataFrame with automatic NaN padding
df = pd.DataFrame(lists)
column_sum = df.sum(axis=0, skipna=True).tolist()
print("Column summation:", column_sum)
print("DataFrame structure:")
print(df)

Output:

Column summation: [11.0, 14.0, 11.0, 9.0]
DataFrame structure:
   0  1    2    3
0  1  2  3.0  NaN
1  4  5  NaN  NaN
2  6  7  8.0  9.0
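The sums above are floats because the padded columns hold NaN, which forces a float dtype. If integer results are wanted, the summed Series can be cast back. This is a small sketch that assumes every input value is an integer; the names float_sums and column_sums are illustrative.

```python
import pandas as pd

lists = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

df = pd.DataFrame(lists)

# sum() skips NaN by default; the padded columns make the summed Series float64
float_sums = df.sum(axis=0)

# Cast back to int (safe here only because all inputs are integers)
column_sums = float_sums.astype(int).tolist()
print(column_sums)  # [11, 14, 11, 9]
```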

Comparison

Method             Memory Usage            Handles Missing Values   Best For
Manual Padding     High (creates copies)   Zero-fills               Simple cases
zip_longest        Low (iterator-based)    Configurable fillvalue   Memory efficiency
Pandas DataFrame   Medium                  NaN handling             Complex data analysis
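One practical consequence of zip_longest being iterator-based is that the rows do not need to be lists. Manual padding relies on len() and list concatenation, which plain iterators do not support. The sketch below uses a hypothetical rows() generator as a stand-in for data read lazily from a file or stream.

```python
from itertools import zip_longest

def rows():
    """Yield each row as a lazy iterable (hypothetical data source)."""
    yield range(1, 4)           # 1, 2, 3
    yield range(4, 6)           # 4, 5
    yield iter([6, 7, 8, 9])    # an iterator with no len()

# Values are pulled from each row one column at a time
column_sums = [sum(col) for col in zip_longest(*rows(), fillvalue=0)]
print(column_sums)  # [11, 14, 11, 9]
```

The * unpacking still collects all row objects up front, so the saving applies to the values inside each row, not to the number of rows.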

Conclusion

Use zip_longest for memory-efficient column summation of uneven lists. For data analysis tasks, pandas DataFrames provide better NaN handling and additional functionality.

Updated on: 2026-03-27T16:19:19+05:30
