How to find the sum of values based on two groups if missing values are present in R?


To find the sum of values based on two groups if missing values are present, we can use group_by and summarise function of dplyr package.

For example, if we have a data frame called df that contains a numerical column say Num and two grouping columns say Grp1 and Grp2 then, the sum of values in Num based on Grp1 and Grp2 if missing values are present in df can be found by using the below mentioned command −

df%>%group_by(Grp1,Grp2)%>%summarise(Sum=sum(Num,na.rm=TRUE))

Example 1

Following snippet creates a sample data frame −

grp1<-sample(LETTERS[1:3],20,replace=TRUE)
grp2<-sample(letters[1:3],20,replace=TRUE)
Dep_Var<-sample(c(NA,rpois(2,5)),20,replace=TRUE)
df1<-data.frame(grp1,grp2,Dep_Var)
df1

Output

The following dataframe is created −

   grp1 grp2 Dep_Var
1   A    a    4
2   A    c   NA
3   C    a    5
4   A    c   NA
5   A    b   NA
6   A    a    4
7   C    c    5
8   B    b    4
9   C    a   NA
10  A    c    5
11  B    a   NA
12  B    c    5
13  A    c   NA
14  C    c    4
15  C    c    5
16  A    a   NA
17  C    b    4
18  A    a    4
19  B    a    4
20  B    a   NA

To load dplyr package and find the sum of values in Dep_Var based on grp1 and grp2, add the following code to the above snippet −

library(dplyr)
df1%>%group_by(grp1,grp2)%>%summarise(Sum=sum(Dep_Var,na.rm=TRUE))
`summarise()` regrouping Output by 'grp1' (override with `.groups` argument)
# A tibble: 9 x 3
# Groups: grp1 [3]

Output

If you execute all the above given snippets as a single program, it generates the following Output −

grp1 grp2 Sum
 <chr> <chr> <int>
1  A   a    12
2  A   b     0
3  A   c     5
4  B   a     4
5  B   b     4
6  B   c     5
7  C   a     5
8  C   b     4
9  C   c    14

Example 2

Following snippet creates a sample data frame −

Class<-sample(c("First","Second","Third"),20,replace=TRUE)
Rank<-sample(1:4,20,replace=TRUE)
Score<-sample(1:100,20)
df2<-data.frame(Class,Rank,Score)
df2

Output

The following dataframe is created −

  Class   Rank Score
1  Second  4   100
2  First   4    76
3  First   2    54
4  Third   3    66
5  Third   2    98
6  Second  1    81
7  First   3    95
8  Second  1     7
9  Third   3    94
10 Second  1    78
11 Third   3    13
12 Third   2    19
13 Second  4    30
14 First   4    31
15 Second  3    87
16 First   3    38
17 First   1    60
18 First   3    36
19 Second  4     3
20 Third   3    84

To find the sum of values in Score based on Class and Rank, add the following code to the above snippet −

df2%>%group_by(Class,Rank)%>%summarise(Sum=sum(Score,na.rm=TRUE))
`summarise()` regrouping

Output

by 'Class' (override with `.groups` argument) # A tibble: 9 x 3 # Groups: Class [3]

Output

If you execute all the above given snippets as a single program, it generates the following Output −

  Class  Rank   Sum
  <chr>  <int> <int>
1  First   1   60
2  First   2   54
3  First   3  169
4  First   4  107
5  Second  1  166
6  Second  3   87
7  Second  4  133
8  Third   2  117
9  Third   3  257

Updated on: 02-Nov-2021

225 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements