How to add a Column of Numbers in Bash?


Overview

This article examines how to total up numeric columns of data in a bash shell, looking at the bash tools available for the task and comparing their speed.

Using The awk Tool

We’ll start by calculating the sum of the values in a particular column using the awk (awk) program.

$ awk '{Total=Total+$1} END{print "Total is: " Total}' numbers.csv
Total is: 49471228

Let’s now take a look at the timing using the “time” command −

$ time awk '{Total=Total+$1} END{print "Total is: " Total}' numbers.csv
Total is: 49471228
 
real 0m0.228s
user 0m0.141s
sys 0m0.047s

When The File Contains Multiple Columns

We've seen how to calculate sums across one row of a file using awk. Now let's see how to calculate sums across multiple rows of a file.

$ cat prices.csv
Books,40
Bag,70
Dress,80
Box,10

Here, the file price.csv has two columns. We now want to calculate the sum of the values in the second column.

$ awk -F "," '{Total=Total+$2} END{print "Total is: " Total}' prices.csv
Total is: 200

When The File Contains a Header Line

Sometimes, text or CSV file headers also include a row containing column names. We'll use these column names to help us understand the contents of the file. Let's edit our prices.csv file and add a new row at the top −

$ cat prices.csv
Item,Value
Books,40
Bag,70
Dress,80
Box,10

When the file includes a title, we'll want to remove the title before the text processing occurs. First, let's add an additional field to our input data set called "title". Then, we'll write a script that reads through each record in the file and adds the value of the new field to the total. Finally, we'll print out the results.

$ awk -F "," 'NR!=1{Total=Total+$2} END{print "Total is: " Total}' prices.csv
Total is: 200

We’ll then look at some other methods for adding up numbers in a column, and see how the awk approach compares to them.

Iterating With The Bash Loops

Awk is a powerful tool, but we could also use a for loop to iterate over each value in our columns.

Using the expr Command

We're going to run an experiment and see if we can use the expr function to calculate the total of the numbers inside the for loop.

$ time (sum=0;for number in `cat numbers.csv`; do sum=`expr $sum + $number`; done; echo "Total is: $sum")
Total is: 49471228
 
real 212m48.418s
user 7m19.375s
sys 145m48.203s

Using Arithmetic Expansion

Since the use of “expand” did not help much, we’ll try another approach using arithmetic expression −

$ time (sum=0;for number in `cat numbers.csv`; do sum=$((sum+number)); done; echo "Total is: $sum")
Total is: 49471228
 
real 0m1.961s
user 0m1.813s
sys 0m0.125s

In order to calculate the total, we'll utilize the arithmetic operator. Unlike the expr operator, the $(...) operator can handle both integers and floats.

Adding Values With the bc Command

The bc command performs calculations on expressions consisting of multiple lines. Therefore, we'll have to concatenate the numbers into one line, separating them with an addition operator. Then we'll pass the resulting string to bc to perform the calculation. Here are a few ways to do so.

Using The Paste Command

To start off, we'll take a look at the "PASET" command to arrange the first ten numbers of our data set on one line.

$ cat numbers.csv| head -10 | paste -sd+ -
2+44+6+15+23+0+15+88+82+1

The option –s makes sure that paste prints out each entry on its own line. We also used the +d option to add the “+” symbol as a separator between each entry.

Now that we've covered the basics, let's move on to providing our input to the bc command.

$ time echo "Total is: $(cat numbers.csv | paste -sd+ - | bc)"
Total is: 49471228
 
real 0m0.244s
user 0m0.203s
sys 0m0.063s

Using The tr Command

Let’s use the tr (transliterate) function again to create a new string from an existing one.

$ cat numbers.csv | head -10 |tr "
" "+" 2+44+6+15+23+0+15+88+82+1+

We added an extra zero at the beginning of the string so that we could use the bc command to perform addition. However, notice the extra '+' at the end of the string. To fix this, we can simply append another zero at the end of the line.

$ cat numbers.csv | head -10 |tr "
" "+" ; echo "0" 2+44+6+15+23+0+15+88+82+1+0

Let's redirect the result from the previous command to the bc command.

$ time ((cat numbers.csv | tr "
" "+" ; echo "0") | bc) 49471228   real 0m0.217s user 0m0.203s sys 0m0.031s

Combining tr and bc commands runs faster than using awk.

Using The sed Command

We’ll finally be using the sed command to generate our sequence.

$ cat numbers.csv | head -10 | sed -z 's#
#+#g' 2+44+6+15+23+0+15+88+82+1+

We've replaced new lines ("
") with plus ("+") operators using the search and replacement option of the sed command, and printed zeros at the end to deal wth the extra plus operators.

$ time ((cat numbers.csv | sed -z 's#
#+#g' ; echo "0") | bc) 49471228   real 0m0.343s user 0m0.281s sys 0m0.109s

To change the meaning of the newlines, use the “–z” option. Instead of interpreting the newlines as the start of each input, they will now be treated as the null characters. So, if we replaced the newlines with the plus sign (+), then we would get the following output −

You may want to use sed instead of tr if you're working with large amounts of text.

Conclusion

We looked at different methods for adding up columns of numbers in a bash shell.

We began by discussing the use of the awk command. We then discussed ways to handle files that contain multiple columns or a single column with a heading.

Updated on: 03-Jan-2023

1K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements