Find the number of non-missing values in each column by group in an R data frame.

R ProgrammingServer Side ProgrammingProgramming

To find the number of non-missing values in each column by group in an R data frame, we can use summarise_each function of dplyr package with negation of is.na function.

For Example, if we have a data frame called df that contains a grouping column say G and some other columns having few NAs then we can find the number of non-missing values in each column by grouping column G with the help of below command −

df%%group_by(G)%%summarise_each(funs(sum(!is.na(.))))

Example 1

Following snippet creates a sample data frame −

Group<-sample(LETTERS[1:4],20,replace=TRUE)
x1<-sample(c(NA,1,2),20,replace=TRUE)
x2<-sample(c(NA,round(rnorm(3),2)),20,replace=TRUE)
df1<-data.frame(Group,x1,x2)
df1

The following dataframe is created

Group x1   x2
1  C   2 -0.49
2  B   1  1.86
3  A  NA    NA
4  A  NA    NA
5  D   1  1.12
6  A  NA  1.12
7  B   1  1.86
8  C   1 -0.49
9  D   2    NA
10 A  NA  1.12
11 A  NA  1.12
12 C  NA  1.86
13 A   2 -0.49
14 A   1 -0.49
15 C  NA    NA
16 C  NA  1.86
17 A  NA  1.12
18 D   2 -0.49
19 C   2  1.86
20 C   2    NA

To load dplyr package and find the number of non-missing values in each column by Group column in df1 on the above created data frame, add the following code to the above snippet −

Group<-sample(LETTERS[1:4],20,replace=TRUE)
x1<-sample(c(NA,1,2),20,replace=TRUE)
x2<-sample(c(NA,round(rnorm(3),2)),20,replace=TRUE)
df1<-data.frame(Group,x1,x2)
library(dplyr)
df1%%group_by(Group)%%summarise_each(funs(sum(!is.na(.))))
# A tibble: 4 x 3

Output

If you execute all the above given snippets as a single program, it generates the following Output −

Group x1 x2
 <chr <int <int
1  A   2   6
2  B   2   2
3  C   4   5
4  D   3   2

Example 2

Following snippet creates a sample data frame −

Class<-sample(c("I","II","III"),20,replace=TRUE)
Score<-sample(c(NA,1:5),20,replace=TRUE)
Rank<-sample(c(NA,0:2),20,replace=TRUE)
df2<-data.frame(Class,Score,Rank)
df2

The following dataframe is created

  Class Score Rank
1  II    NA   NA
2  III    5    2
3  II     5    2
4  II     4   NA
5  II    NA   NA
6  II     4    2
7  II     2    1
8  III    4    2
9  II     2   NA
10 III   NA    0
11 I      1    2
12 II    NA   NA
13 I      2    1
14 I      5   NA
15 I      3    0
16 I      4    0
17 I      3    2
18 II     2   NA
19 II     2   NA
20 II     2    1

To find the number of non-missing values in each column by Class column in df2 on the above created data frame, add the following code to the above snippet −

Class<-sample(c("I","II","III"),20,replace=TRUE)
Score<-sample(c(NA,1:5),20,replace=TRUE)
Rank<-sample(c(NA,0:2),20,replace=TRUE)
df2<-data.frame(Class,Score,Rank)
df2%%group_by(Class)%%summarise_each(funs(sum(!is.na(.))))
# A tibble: 3 x 3

Output

If you execute all the above given snippets as a single program, it generates the following Output −

 Class Score Rank
<chr <int  <int
1 I    6     5
2 II   8     4
3 III  2     3
raja
Published on 03-Nov-2021 08:24:08
Advertisements