How to find the column that has the largest sum in R?



To find the column with the largest sum, compute the column sums with colSums, sort them in decreasing order with sort, and take the first element of the result, which is the largest sum labelled with its column name. Wrapping the result in str displays the value and the column name compactly. For example, if we have a data frame called df that contains multiple numeric columns, the column with the largest sum can be found with the command −

str(sort(colSums(df[,1:length(df)]),decreasing=TRUE)[1])
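The steps in that command can be sketched one at a time as follows. This is a minimal illustration, not the only way to do it: the data frame and its column names a, b and c are hypothetical, and the which.max variant at the end is an alternative that avoids sorting. Note that df[,1:length(df)] selects every column, so for a data frame with more than one column, passing df directly to colSums should give the same sums.

```r
# Hypothetical data frame, used only for illustration
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30), c = c(4, 5, 6))

# Step 1: sum every column; the result is a named numeric vector
sums <- colSums(df)

# Step 2: sort the sums from largest to smallest (names are kept)
sorted_sums <- sort(sums, decreasing = TRUE)

# Step 3: the first element is the largest sum, labelled with its column name
largest <- sorted_sums[1]
str(largest)

# Alternative without sorting: which.max() gives the position of the first maximum
largest_name <- names(which.max(sums))
largest_value <- max(sums)
```

The which.max form is convenient when only the column name is needed, for example to select that column afterwards with df[[largest_name]].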

Example 1

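Consider the data frame below. Because rpois draws random values and no seed is set, the values you get will differ from the ones shown −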

> x1<-rpois(20,5)
> x2<-rpois(20,5)
> x3<-rpois(20,5)
> x4<-rpois(20,5)
> df1<-data.frame(x1,x2,x3,x4)
> df1

Output

   x1 x2 x3 x4
1   3  4  4  5
2   6 10  3  3
3   6  5  2  5
4   7  6  2 13
5   4  7  7  3
6   2  4  3  4
7   5  7  2  2
8   1  2  8  3
9  10  1  3  2
10  6  4  8  5
11  6  7  2  2
12  6  3  4  6
13  8  6  8  5
14  4  6  1  6
15  3  1  7 10
16  4  3  6  8
17  1  1  8  8
18  6  6  5  6
19  7  3  2  6
20  6  6  4  5

Finding the column that has the largest sum in df1 −

> str(sort(colSums(df1[,1:length(df1)]),decreasing=TRUE)[1])

Output

Named num 107
- attr(*, "names")= chr "x4"

Example 2

> y1<-rnorm(20)
> y2<-rnorm(20)
> y3<-rnorm(20)
> df2<-data.frame(y1,y2,y3)
> df2

Output

            y1          y2          y3
1  -0.67247167 -0.03504090 -0.66697231
2  -0.68074045 -0.25805863  0.84996560
3   0.69900478 -1.88632900 -0.72983709
4  -1.18607010  1.41421023  1.13006070
5  -0.32133261 -0.63577768 -0.11396980
6  -1.32619037  0.61646926  0.89315793
7   0.01712191 -1.07839179 -0.34707437
8   0.16517472 -0.80356200  0.37064564
9   2.52589496 -0.37596219 -0.36734004
10 -0.14817698 -0.11656378 -2.23320356
11 -0.53926289  0.21150137 -0.20352309
12  0.22330625  0.04340639  0.50600645
13 -0.82293233  0.22586452 -0.82058059
14 -0.38483674 -0.38651706 -1.33218404
15 -0.33143327 -0.12833993 -0.33432244
16  0.40020483 -0.58673910 -0.51292024
17 -2.66155329 -0.66032907 -0.98167877
18 -1.49012484  0.91082996 -0.68865703
19 -2.17102582  1.49218359 -0.03119144
20 -0.28752746 -0.27363896 -0.59666780

Finding the column that has the largest sum in df2 −

> str(sort(colSums(df2[,1:length(df2)]),decreasing=TRUE)[1])

Output

Named num -2.31
- attr(*, "names")= chr "y2"

Example 3

> z1<-rnorm(20,5,1.2)
> z2<-rnorm(20,5,1.2)
> z3<-rnorm(20,5,1.2)
> df3<-data.frame(z1,z2,z3)
> df3

Output

         z1       z2       z3
1  4.195753 5.237520 4.718239
2  5.406601 5.467189 5.656534
3  4.107268 4.206512 5.002071
4  4.273912 3.318249 3.851186
5  5.658334 4.044090 5.726887
6  5.794366 6.746781 5.573617
7  5.858288 6.643365 3.670364
8  5.996933 3.587626 3.603394
9  4.828025 5.512565 7.352176
10 5.232532 6.235726 2.827798
11 1.632488 6.318988 5.206436
12 4.033981 7.281025 5.996814
13 4.611700 6.482257 2.515667
14 5.551795 4.824941 4.938571
15 7.026488 5.153775 3.043448
16 4.917164 6.888027 6.673310
17 5.164733 5.986679 4.329136
18 5.114344 2.379626 6.442586
19 5.254078 5.369151 4.240947
20 7.874268 5.076189 7.012805

Finding the column that has the largest sum in df3 −

> str(sort(colSums(df3[,1:length(df3)]),decreasing=TRUE)[1])

Output

Named num 107
- attr(*, "names")= chr "z2"
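The sums printed by str in the examples above are abbreviated, since str shows numbers to three significant digits by default. If the exact value is needed, the name and the value can be pulled out separately, as in the sketch below. The named vector sums and its values are hypothetical and stand in for the result of colSums.

```r
# Hypothetical column sums, chosen only to illustrate the display difference
sums <- c(p = 12.3456, q = 98.7654, r = 45.6789)

# Largest sum, still labelled with its column name
top <- sort(sums, decreasing = TRUE)[1]

# str() gives a compact, abbreviated display of the value
str(top)

# print() shows the value with the default number of significant digits
print(top)

# Extract the column name and the unrounded value separately
top_name <- names(top)
top_value <- unname(top)
```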
