Find the number of non-missing values in each group of an R data frame.


To find the number of non-missing values in each group of an R data frame, we can convert the data frame to a data.table object with setDT() and then, grouping with by, apply sum() to the negation of is.na() for the column of interest.
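The counting step relies on R coercing logical values to numbers. is.na() returns TRUE for each missing value, the negation ! turns that into TRUE for each non-missing value, and sum() treats TRUE as 1 and FALSE as 0. Here is a minimal sketch on a plain vector. The vector x is hypothetical and is not part of the examples that follow.

```r
# Hypothetical numeric vector with two missing values
x <- c(3, NA, 7, NA, 1)

# TRUE where a value is present, FALSE where it is NA
present <- !is.na(x)

# sum() counts TRUE as 1 and FALSE as 0, giving the number of non-missing values
n_present <- sum(present)
print(n_present)
```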

For example, if we have a data frame called df that contains a grouping column, say Group, and a numeric column with a few NAs, say Num, then we can find the number of non-missing values in each Group by using the following command −

setDT(df)[,sum(!is.na(Num)),by=.(Group)]
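To check the command by hand, the sketch below applies it to a small, fixed data frame. It reuses the hypothetical names df, Group and Num from the description above, and the values are made up for illustration only. Group A has two non-missing Num values and group B has one.

```r
library(data.table)

# Illustrative data: Group A -> 2 non-missing Num values, Group B -> 1
df <- data.frame(
  Group = c("A", "A", "B", "B", "B"),
  Num   = c(1.5, 2.0, NA, 3.1, NA),
  stringsAsFactors = FALSE
)

# setDT() converts df to a data.table by reference;
# within each Group, sum(!is.na(Num)) counts the non-missing Num values
counts <- setDT(df)[, sum(!is.na(Num)), by = .(Group)]
print(counts)
```

Because no name is given to the result, data.table labels the count column V1, just like the outputs in the examples below.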

Example 1

The following snippet creates a sample data frame −

Grp<-sample(LETTERS[1:3],20,replace=TRUE)
Dep_Var<-sample(c(NA,round(rnorm(2),2)),20,replace=TRUE)
df1<-data.frame(Grp,Dep_Var)
df1

The following data frame is created (sample() and rnorm() draw random values, so your output will differ) −

   Grp Dep_Var
1    B      NA
2    A    1.00
3    A   20.00
4    B   -0.63
5    B   -1.48
6    B      NA
7    A    1.00
8    C   20.00
9    A   -0.63
10   A   -1.48
11   C      NA
12   C    1.00
13   B   20.00
14   C   -0.63
15   B   -1.48
16   A      NA
17   C    1.00
18   B   20.00
19   A   -0.63
20   B   -1.48

To load the data.table package and find the number of non-missing values in each Grp of the data frame created above, add the following code to the above snippet −

library(data.table)
setDT(df1)[,sum(!is.na(Dep_Var)),by=.(Grp)]

Output

If you execute all the above snippets as a single program, it generates output similar to the following −

   Grp V1
1:   B  6
2:   A  6
3:   C  4
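The data.table package is not strictly required for this count. As a sketch of one alternative, which swaps in base R's tapply() in place of the data.table approach used here, the same per-group count can be computed by passing the logical vector !is.na(...) to tapply() together with sum. The small data frame below is hypothetical. Note that group C, whose values are all NA, still appears with a count of 0.

```r
# Hypothetical data with a grouping column and some NAs
df <- data.frame(
  Grp     = c("B", "A", "A", "B", "C", "C"),
  Dep_Var = c(NA, 0.5, -1.2, 2.3, NA, NA),
  stringsAsFactors = FALSE
)

# tapply() splits !is.na(Dep_Var) by Grp and sums each piece;
# the result is named by group, with the names sorted alphabetically
non_missing <- tapply(!is.na(df$Dep_Var), df$Grp, sum)
print(non_missing)
```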

Example 2

The following snippet creates a sample data frame −

Category<-sample(c("Low","Medium","High"),20,replace=TRUE)
Val<-sample(c(NA,rpois(2,5)),20,replace=TRUE)
df2<-data.frame(Category,Val)
df2

The following data frame is created (sample() and rpois() draw random values, so your output will differ) −

   Category Val
1    Medium  20
2      High   1
3      High   8
4      High   5
5      High  NA
6    Medium  20
7      High   1
8       Low   8
9       Low   5
10   Medium  NA
11   Medium  20
12   Medium   1
13   Medium   8
14   Medium   5
15   Medium  NA
16     High  20
17   Medium   1
18   Medium   8
19      Low   5
20      Low  NA

To find the number of non-missing values in each Category of the data frame created above, add the following code to the above snippet −

library(data.table)
setDT(df2)[,sum(!is.na(Val)),by=.(Category)]

Output

If you execute all the above snippets as a single program, it generates output similar to the following −

   Category V1
1:   Medium  8
2:     High  5
3:      Low  3
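The same idea extends to several value columns at once. The sketch below uses data.table's .SD, which holds the non-grouping columns for the current group, together with lapply() to count non-missing values in every value column per Category. The columns Val and Score and their values are hypothetical.

```r
library(data.table)

# Hypothetical data with two value columns containing NAs
dt <- data.table(
  Category = c("Low", "High", "High", "Low", "Medium", "High"),
  Val      = c(5, NA, 8, 7, 1, 3),
  Score    = c(NA, NA, 2.5, 3.0, NA, 4.2)
)

# For each Category, apply the non-missing count to every column in .SD
counts <- dt[, lapply(.SD, function(x) sum(!is.na(x))), by = Category]
print(counts)
```

Groups appear in order of first appearance (Low, High, Medium), and the Medium row shows a Score count of 0 because its only Score value is NA.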

Updated on: 08-Nov-2021
