Python - Column summation in uneven sized lists

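Column summation is the process of calculating the sum of the values in each column of a dataset or matrix. In Python, this is straightforward when every row has the same length, but it needs extra care with uneven-sized lists: some columns then have fewer values than others, and the built-in zip() stops at the shortest list and silently drops the remaining values.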

What is Column Summation

Column summation involves adding up the values within each column to obtain a single sum for each variable. Consider this dataset representing heights in centimeters across three measurements:

	Measurement 1	Measurement 2	Measurement 3
Person 0	170	175	180
Person 1	165	168	172
Person 2	180	182	178
Person 3	172	169	171
Person 4	175	176	174

The column summation results are:

Measurement 1: 862 (170 + 165 + 180 + 172 + 175)
Measurement 2: 870 (175 + 168 + 182 + 169 + 176)
Measurement 3: 875 (180 + 172 + 178 + 171 + 174)
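The totals above can be reproduced with a short sketch. The names heights and totals are illustrative and not part of the original example. Because every row has the same length here, the built-in zip() is enough to regroup the rows into columns.

```python
# Rows are people, columns are measurements (values from the table above).
heights = [
    [170, 175, 180],
    [165, 168, 172],
    [180, 182, 178],
    [172, 169, 171],
    [175, 176, 174],
]

# zip(*heights) groups the values column by column; sum() adds each group.
totals = [sum(column) for column in zip(*heights)]
print("Column summation:", totals)
```

This prints [862, 870, 875], matching the sums listed above.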

Using Manual Padding with Loops

This approach pads shorter lists with zeros to make them equal in length, then calculates column sums:

lists = [
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9]
]

# Find maximum length and pad lists
max_length = max(len(lst) for lst in lists)
padded_lists = [lst + [0] * (max_length - len(lst)) for lst in lists]

# Calculate column sums
column_sum = [sum(col) for col in zip(*padded_lists)]
print("Column summation:", column_sum)

Output:

Column summation: [11, 14, 11, 9]
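The same result can be reached with explicit loops and no padded copies. The following is a minimal sketch, assuming lists contains at least one row (max() raises ValueError on an empty input). It keeps one running total per column and adds each value at its own index.

```python
lists = [
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9]
]

# Start with one running total per column of the longest row.
column_sum = [0] * max(len(row) for row in lists)

# Add each value to the total for its column index; short rows simply
# contribute nothing to the columns they do not reach.
for row in lists:
    for index, value in enumerate(row):
        column_sum[index] += value

print("Column summation:", column_sum)
```

This variant trades the padded intermediate lists for a nested loop, which can matter when the rows are long.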

Using itertools.zip_longest()

The zip_longest function automatically handles uneven lists by filling missing values with a specified default:

from itertools import zip_longest

lists = [
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9, 10]
]

# Calculate column sums with automatic padding
column_sum = [sum(col) for col in zip_longest(*lists, fillvalue=0)]
print("Column summation:", column_sum)

Output:

Column summation: [11, 14, 11, 9, 10]
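The fillvalue=0 argument matters because zip_longest fills gaps with None by default, and sum() cannot add None to an integer. The sketch below illustrates the difference; the names columns and failed are illustrative only.

```python
from itertools import zip_longest

lists = [
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9, 10]
]

# Without fillvalue, missing positions are filled with None.
columns = list(zip_longest(*lists))
print(columns[2])  # third column: (3, None, 8)

# sum() cannot add None to an int, so the default fill value fails.
try:
    [sum(col) for col in columns]
    failed = False
except TypeError:
    failed = True
print("Default fill raised TypeError:", failed)

# With fillvalue=0 the missing positions contribute nothing to the total.
column_sum = [sum(col) for col in zip_longest(*lists, fillvalue=0)]
print("Column summation:", column_sum)
```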

Using a Pandas DataFrame

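pandas can also handle uneven lists: passing the nested list to pd.DataFrame pads the shorter rows with NaN, and sum() skips those NaN values. Because columns that contain NaN are stored as floats, the sums are returned as floats: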

import pandas as pd

lists = [
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9]
]

# Convert to DataFrame with automatic NaN padding
df = pd.DataFrame(lists)
column_sum = df.sum(axis=0, skipna=True).tolist()
print("Column summation:", column_sum)
print("DataFrame structure:")
print(df)

Output:

Column summation: [11.0, 14.0, 11.0, 9.0]
DataFrame structure:
   0  1    2    3
0  1  2  3.0  NaN
1  4  5  NaN  NaN
2  6  7  8.0  9.0
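If integer totals are preferred, the float sums can be cast back after summing. This is a minimal sketch, assuming every input value is an integer so that the cast loses nothing.

```python
import pandas as pd

lists = [
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9]
]

df = pd.DataFrame(lists)

# sum() skips NaN by default; cast the float totals back to integers.
column_sum = df.sum(axis=0).astype(int).tolist()
print("Column summation:", column_sum)
```

The cast is applied to the totals rather than to the DataFrame itself, since the DataFrame still contains NaN values that cannot be converted to integers.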

Comparison

Method	Memory Usage	Handles Missing Values	Best For
Manual Padding	High (creates copies)	Zero-fills	Simple cases
zip_longest	Low (iterator-based)	Configurable fillvalue	Memory efficiency
Pandas DataFrame	Medium	NaN handling	Complex data analysis

Conclusion

Use zip_longest with fillvalue=0 for memory-efficient column summation of uneven lists; manual padding is adequate for small, simple cases. For data analysis tasks, pandas DataFrames provide built-in NaN handling and additional functionality.

Niharika Aitam

Updated on: 2026-03-27T16:19:19+05:30
