Roll up R data frame columns for summation by group if missing values exist in the data frame.


The summation of column values if missing values exist in the R data frame can be found with the help of summarise_each function of dplyr package where we can remove missing values by setting na.rm argument to TRUE.

Since, we we will have groups in the data frame hence group_by function of the same package will help the summarise_each function to perform the summation by group. Check out the below Examples to understand how it works.

Example 1

Following snippet creates a sample data frame −

Grp<-sample(c("I","II","III"),20,replace=TRUE)
Rate<-sample(c(NA,2,4),20,replace=TRUE)
df1<-data.frame(Grp,Rate)
df1

The following dataframe is created

   Grp Rate
1   II   2
2    I   4
3   II  NA
4   II   4
5   II   2
6    I  NA
7  III   2
8  III   4
9    I  NA
10   I  4
11 III  4
12  II  4
13  II NA
14   I  4
15 III  4
16 III  4
17 III  2
18  II NA
19 III  2
20 III  2

To load dplyr package and rolling up columns in df1 to find the summation by ignoring missing values on the above created data frame, add the following code to the above snippet −

Grp<-sample(c("I","II","III"),20,replace=TRUE)
Rate<-sample(c(NA,2,4),20,replace=TRUE)
df1<-data.frame(Grp,Rate)
library(dplyr)
df1%%group_by(Grp)%%summarise_each(funs(sum(.,na.rm=TRUE)))
# A tibble: 3 x 2

Output

If you execute all the above given snippets as a single program, it generates the following Output −

 Grp Rate
 <chr <dbl
1   I 12
2  II 12
3 III 24

Example 2

Following snippet creates a sample data frame −

Class<-sample(c("First","Second","Third"),20,replace=TRUE)
Price<-sample(c(NA,1.25,2.75),20,replace=TRUE)
df2<-data.frame(Class,Price)
df2

The following dataframe is created

  Class Price
1 Third  1.25
2 First    NA
3 Third  1.25
4 Third  2.75
5 Second   NA
6 Second 1.25
7 Second 1.25
8 Third    NA
9 First  2.75
10 Second  NA
11 Second  NA
12 Second  NA
13 Third   NA
14 Third   NA
15 Third   NA
16 Third 2.75
17 First 2.75
18 Third   NA
19 Third 1.25
20 Third   NA

To roll up columns in df2 to find the summation by ignoring missing values on the above created data frame, add the following code to the above snippet −

Class<-sample(c("First","Second","Third"),20,replace=TRUE)
Price<-sample(c(NA,1.25,2.75),20,replace=TRUE)
df2<-data.frame(Class,Price)
df2%%group_by(Class)%%summarise_each(funs(sum(.,na.rm=TRUE)))
# A tibble: 3 x 2

Output

If you execute all the above given snippets as a single program, it generates the following Output −

 Class Price
  <chr <dbl
1 First   5.5
2 Second  2.5
3 Third  9.25

Example 3

Following snippet creates a sample data frame −

Category<-sample(c("Small","Medium","Large"),20,replace=TRUE)
Quantity<-sample(c(NA,500,1000),20,replace=TRUE)
df3<-data.frame(Category,Quantity)
df3

The following dataframe is created

 Category Quantity
1 Large   1000
2 Small   1000
3 Small    500
4 Large    500
5 Small   1000
6 Medium    NA
7 Small    500
8 Medium   500
9 Large     NA
10 Medium  500
11 Medium   NA
12 Large    NA
13 Small   500
14 Medium 1000
15 Large    NA
16 Medium  500
17 Small   500
18 Medium   NA
19 Small    NA
20 Medium 1000

To roll up columns in df3 to find the summation by ignoring missing values on the above created data frame, add the following code to the above snippet −

Category<-sample(c("Small","Medium","Large"),20,replace=TRUE)
Quantity<-sample(c(NA,500,1000),20,replace=TRUE)
df3<-data.frame(Category,Quantity)
df3%%group_by(Category)%%summarise_each(funs(sum(.,na.rm=TRUE)))
# A tibble: 3 x 2

Output

If you execute all the above given snippets as a single program, it generates the following Output −

 Category Quantity
 <chr <dbl
1 Large  1500
2 Medium 3500
3 Small  4000

Updated on: 02-Nov-2021

368 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements