How to select rows of an R data frame that are non-NA?


To select the rows of an R data frame that contain no NA values, we can use the complete.cases function inside single square brackets. For example, if we have a data frame called df that contains some missing values (NA), the rows without any NA can be selected with the command df[complete.cases(df),].
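As a minimal sketch of how this works, the block below builds a small data frame by hand (the name df and its values are made up for illustration) and separates the two steps: complete.cases returns one logical value per row, and the square brackets keep the rows marked TRUE.

```r
# Small hand-built data frame (hypothetical values) so the result is reproducible
df <- data.frame(a = c(1, NA, 3, 4),
                 b = c("x", "y", NA, "z"))

# complete.cases() returns one logical value per row:
# TRUE when the row has no NA in any column
keep <- complete.cases(df)
keep

# Indexing with [keep, ] retains only the TRUE rows;
# the empty slot after the comma means "all columns"
complete_rows <- df[keep, ]
complete_rows
```

Here rows 2 and 3 each contain an NA, so only rows 1 and 4 are kept, and they retain their original row names.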

Example 1

Consider the following data frame −

> x1<-sample(c(1,NA),20,replace=TRUE)
> x2<-sample(c(5,NA),20,replace=TRUE)
> x3<-sample(c(3,NA),20,replace=TRUE)
> df1<-data.frame(x1,x2,x3)
> df1

Output

   x1 x2 x3
1   1 NA NA
2  NA  5  3
3   1  5 NA
4   1 NA NA
5  NA  5 NA
6  NA  5  3
7  NA  5 NA
8   1 NA  3
9  NA  5 NA
10 NA  5 NA
11 NA NA NA
12  1  5  3
13 NA  5  3
14 NA NA NA
15  1 NA NA
16 NA  5  3
17 NA NA  3
18 NA NA NA
19  1 NA  3
20 NA NA  3

Selecting rows of df1 that do not contain any NA −

> df1[complete.cases(df1),]

Output

   x1 x2 x3
12  1  5  3

Example 2

> y1<-sample(c(rnorm(2),NA),20,replace=TRUE)
> y2<-sample(c(rnorm(2),NA),20,replace=TRUE)
> df2<-data.frame(y1,y2)
> df2

Output

           y1        y2
1  0.15079115 -0.626630
2  0.15079115        NA
3          NA -0.626630
4  0.15079115 -0.626630
5  0.15079115        NA
6  0.15079115 -0.626630
7  0.15079115        NA
8  0.15079115 -1.691553
9          NA -1.691553
10         NA -0.626630
11 0.15079115 -1.691553
12 0.15079115        NA
13         NA -1.691553
14         NA -1.691553
15 0.15079115 -1.691553
16         NA -0.626630
17 0.01495388 -0.626630
18 0.01495388 -1.691553
19 0.15079115 -1.691553
20         NA        NA

Selecting rows of df2 that do not contain any NA −

> df2[complete.cases(df2),]

Output

           y1        y2
1  0.15079115 -0.626630
4  0.15079115 -0.626630
6  0.15079115 -0.626630
8  0.15079115 -1.691553
11 0.15079115 -1.691553
15 0.15079115 -1.691553
17 0.01495388 -0.626630
18 0.01495388 -1.691553
19 0.15079115 -1.691553
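The command above drops a row if any column is NA. If only some columns matter, one possible variation is sketched below on a small hand-built data frame (the values and the names y1_present and both_present are hypothetical): !is.na() tests a single column, and complete.cases can be applied to a chosen subset of columns.

```r
# Hypothetical data, written out by hand so the result does not depend on sample()
df <- data.frame(y1 = c(0.5, NA, 1.2, NA),
                 y2 = c(NA, 2.0, 3.1, NA))

# Rows where y1 is not NA, regardless of what y2 holds
y1_present <- df[!is.na(df$y1), ]
y1_present

# Rows that are complete with respect to a chosen set of columns
both_present <- df[complete.cases(df[, c("y1", "y2")]), ]
both_present
```

The first selection keeps rows 1 and 3 even though row 1 has an NA in y2; the second keeps only row 3.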

Example 3

> z1<-sample(c("A",NA),20,replace=TRUE)
> z2<-sample(c("B",NA),20,replace=TRUE)
> z3<-sample(c("C",NA),20,replace=TRUE)
> df3<-data.frame(z1,z2,z3)
> df3

Output

     z1   z2   z3
1     A <NA>    C
2  <NA>    B    C
3  <NA> <NA> <NA>
4     A    B <NA>
5  <NA> <NA>    C
6     A <NA>    C
7     A    B    C
8  <NA>    B    C
9  <NA> <NA>    C
10 <NA> <NA>    C
11    A <NA>    C
12 <NA> <NA>    C
13    A    B    C
14    A    B    C
15 <NA> <NA> <NA>
16    A    B    C
17 <NA> <NA> <NA>
18 <NA> <NA>    C
19    A    B    C
20 <NA> <NA> <NA>

Selecting rows of df3 that do not contain any NA −

> df3[complete.cases(df3),]

Output

   z1 z2 z3
7   A  B  C
13  A  B  C
14  A  B  C
16  A  B  C
19  A  B  C
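Because the examples draw their values with sample(), rerunning them will usually give different rows from the ones printed here. The sketch below shows one way to make such a run repeatable with set.seed (the seed value 123 is arbitrary), and compares the complete.cases approach with base R's na.omit, which is a related alternative rather than the method used in this article.

```r
set.seed(123)  # arbitrary seed, chosen only so reruns give the same data

z1 <- sample(c("A", NA), 20, replace = TRUE)
z2 <- sample(c("B", NA), 20, replace = TRUE)
df <- data.frame(z1, z2)

# Approach used in this article: logical indexing with complete.cases()
by_index <- df[complete.cases(df), ]

# Alternative: na.omit() also drops every row that contains an NA
by_omit <- na.omit(df)

# Both approaches keep the same rows
identical(rownames(by_index), rownames(by_omit))
```

The two results hold the same rows; na.omit may additionally attach an "na.action" attribute recording which rows were dropped.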

Updated on: 06-Mar-2021
