Python - Column summation in uneven sized lists


What is Column Summation

Column summation refers to the process of calculating the sum of values within each column of a dataset or a matrix. In the context of data analysis or numerical computations, column summation is a common operation used to summarize and analyze data along the vertical axis.

For example, consider a dataset represented as a table with rows and columns. Each column corresponds to a variable or a feature, and each row represents an observation or a data point. Column summation involves adding up the values within each column to obtain a single sum for each variable. Let's illustrate this with an example.

Suppose we have the following dataset representing the heights in centimeters of five individuals across three different measurements.

Individual   Measurement 1   Measurement 2   Measurement 3
0            170             175             180
1            165             168             172
2            180             182             178
3            172             169             171
4            175             176             174

To calculate the column summation, we have to add up the values within each column.

Column Summation

  • Measurement 1: 862

  • Measurement 2: 870

  • Measurement 3: 875

In this case, the column summation provides the total height for each measurement, giving us an overview of the cumulative values within each variable.
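As a quick check of the totals above, here is a minimal sketch that stores the table as a list of rows and transposes it with `zip` before summing. The variable names `heights` and `column_totals` are only illustrative.

```python
# Each inner list holds one individual's three measurements (in cm),
# copied from the table above.
heights = [
    [170, 175, 180],
    [165, 168, 172],
    [180, 182, 178],
    [172, 169, 171],
    [175, 176, 174],
]

# zip(*heights) regroups the rows into columns; sum() then totals each column.
column_totals = [sum(column) for column in zip(*heights)]
print(column_totals)  # [862, 870, 875]
```

Plain `zip` is enough here because every row has the same length. With rows of different lengths it stops at the shortest row and silently drops the remaining values.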

When we have uneven-sized lists and want to calculate the sum of each column in Python, we can use different approaches. Here are three methods to accomplish this.

Using Loops With Padded Lists

In this approach, we pad the shorter lists with zeros so that all the lists have the same length, then loop over the columns and sum the values in each one.

Example

In this example, we first find the maximum length among all the lists using `max(len(lst) for lst in lists)`. Then, we pad each list with zeros to match the maximum length using a list comprehension. After padding, we can use `zip(*padded_lists)` to transpose the lists, and finally, we calculate the column sum using another list comprehension.

lists = [
   [1, 2, 3],
   [4, 5],
   [6, 7, 8, 9]
]
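# Length of the longest list
max_length = max(len(lst) for lst in lists)
# Pad the shorter lists with zeros up to that length
padded_lists = [lst + [0] * (max_length - len(lst)) for lst in lists]
# Transpose with zip and sum each column
column_sum = [sum(col) for col in zip(*padded_lists)]
print("The column summation of uneven sized lists:", column_sum)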

Output

The column summation of uneven sized lists: [11, 14, 11, 9]
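If building padded copies of the lists is undesirable, an explicit loop can accumulate the totals directly. The following is a sketch of that idea rather than a replacement for the approach above; the function name `column_sums` is hypothetical.

```python
def column_sums(rows):
    """Return per-column totals for rows of possibly different lengths."""
    totals = []
    for row in rows:
        for index, value in enumerate(row):
            if index < len(totals):
                # Column already seen: add to its running total.
                totals[index] += value
            else:
                # First value for a new column: start its total.
                totals.append(value)
    return totals


lists = [
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9],
]
print(column_sums(lists))  # [11, 14, 11, 9]
```

A missing value contributes nothing to its column, which gives the same result as padding with zeros.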

Using the itertools.zip_longest() Function

The `zip_longest` function from the `itertools` module lets us zip lists of different lengths and fill the missing positions with a specified default value, which is 0 in this case.

Example

In this example, `zip_longest(*lists, fillvalue=0)` transposes the lists while padding the shorter ones with zeros, and we then calculate each column sum with a list comprehension, as in the previous approach.

from itertools import zip_longest
lists = [
   [1, 2, 3],
   [4, 5],
   [6, 7, 8, 9, 10]
]
# zip_longest pads the shorter lists with 0 while transposing
column_sum = [sum(col) for col in zip_longest(*lists, fillvalue=0)]
print("The column summation of uneven sized lists:", column_sum)

Output

The column summation of uneven sized lists: [11, 14, 11, 9, 10]
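To see what the padding looks like before the sums are taken, the sketch below materialises the columns that `zip_longest` yields for the same input. The variable name `columns` is only illustrative.

```python
from itertools import zip_longest

lists = [
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9, 10],
]

# Each tuple is one column; positions missing from a shorter list become 0.
columns = list(zip_longest(*lists, fillvalue=0))
for column in columns:
    print(column, "->", sum(column))
```

Filling with 0 is safe for sums because adding 0 does not change a total. It would distort other statistics, such as a mean, so a different fill value or a filtering step would be needed there.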

Using NumPy

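NumPy arrays must be rectangular, so NumPy does not handle uneven-sized lists on its own. It does not pad or broadcast rows of different lengths: recent NumPy versions (1.24 and later) raise an error for `np.array(lists)` on ragged input, and passing `dtype=object` produces a one-dimensional array of Python lists, for which `np.sum` concatenates the lists instead of adding their elements. To get column sums, we pad the lists to a common length first and then sum along axis 0.

Example

In this example, we pad each list with zeros up to the maximum length and convert the result into a two-dimensional NumPy array using `np.array(padded_lists)`, where each row represents a list. Then, we use `np.sum(arr, axis=0)` to sum along axis 0, i.e. down the rows, which gives us the sum of each column. Finally, `tolist()` converts the result back to a plain Python list for printing.

import numpy as np
lists = [
   [1, 2, 3],
   [4, 5],
   [6, 7, 8, 9]
]
# Pad every list with zeros so that all rows have the same length
max_length = max(len(lst) for lst in lists)
padded_lists = [lst + [0] * (max_length - len(lst)) for lst in lists]
arr = np.array(padded_lists)
# Sum down the rows (axis 0) to get one total per column
column_sum = np.sum(arr, axis=0).tolist()
print("The column summation of uneven sized lists:", column_sum)

Output

The column summation of uneven sized lists: [11, 14, 11, 9]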

Updated on: 02-Jan-2024
