How to subset an R data frame by specifying columns that contain NA?


To subset an R data frame by specifying columns that contain NA, we can follow the steps below −

  • First of all, create a data frame with some columns containing NAs.

  • Then, use is.na along with the subset function to keep only the rows in which the specified columns contain NA.

Example

Create the data frame

Let’s create a data frame as shown below −

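# Draw 25 values per column with replacement; NA is one of the candidates,
# so each column is likely to contain some NAs
x <- sample(c(NA, 1, 2, 3), 25, replace = TRUE)
y <- sample(c(NA, 10, 12, 20), 25, replace = TRUE)
z <- sample(c(NA, 100, 120, 180), 25, replace = TRUE)
a <- sample(c(NA, 80, 77, 62), 25, replace = TRUE)
df <- data.frame(x, y, z, a)
df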

Output

On executing, the above script generates the below output (this output will vary on your system due to randomization) −

    x  y   z  a
1   1 NA 100 62
2   2 NA  NA 62
3   3 20 120 77
4  NA 20 180 62
5   1 NA  NA NA
6   2 20  NA 62
7   3 10 180 62
8   2 10 120 62
9   1 12  NA 77
10  3 NA 100 62
11  1 10  NA 77
12 NA 12 180 77
13 NA NA 180 77
14  2 NA 180 62
15  3 10  NA 80
16  3 NA 100 80
17  1 20 120 80
18  1 10 120 80
19  1 12 100 NA
20  1 12 100 NA
21  1 10 180 77
22  1 12  NA 80
23  1 NA  NA 80
24 NA 10 100 NA
25  3 20  NA 62
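
The output changes on every run because sample() draws at random. As a minimal sketch of how the data could be made reproducible, the draws can be preceded by set.seed. The helper name make_df and the seed value 123 are assumptions chosen for illustration and are not part of the original example.

```r
# Hypothetical helper: builds the same kind of data frame as above,
# but fixes the random number generator state first
make_df <- function(seed) {
  set.seed(seed)  # same seed, same sequence of draws
  data.frame(
    x = sample(c(NA, 1, 2, 3), 25, replace = TRUE),
    y = sample(c(NA, 10, 12, 20), 25, replace = TRUE),
    z = sample(c(NA, 100, 120, 180), 25, replace = TRUE),
    a = sample(c(NA, 80, 77, 62), 25, replace = TRUE)
  )
}

df1 <- make_df(123)  # 123 is an arbitrary seed
df2 <- make_df(123)

# Both calls start from the same seed, so the two data frames match
identical(df1, df2)
```

With a fixed seed, anyone re-running the script on the same R version should see the same rows, which makes the subset result easier to check.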

Subset data frame by specifying columns having NAs

Use is.na along with the subset function to keep the rows of the data frame df in which column x or column z contains NA, as shown below −

# Reuse the data frame df created in the previous step; calling sample()
# again would generate different data and a different result
subset(df, is.na(x) | is.na(z))

Output

    x  y   z  a
2   2 NA  NA 62
4  NA 20 180 62
5   1 NA  NA NA
6   2 20  NA 62
9   1 12  NA 77
11  1 10  NA 77
12 NA 12 180 77
13 NA NA 180 77
15  3 10  NA 80
22  1 12  NA 80
23  1 NA  NA 80
24 NA 10 100 NA
25  3 20  NA 62
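
The same filter can also be written with base R bracket indexing, and the condition can be changed from "either column" to "both columns". The sketch below uses a small hand-written data frame d (hypothetical values, not the random data above) so that the result is the same on every run. The names d, either, both, cols and any_na are chosen only for illustration.

```r
# Small fixed data frame with hypothetical values
d <- data.frame(
  x = c(1, NA, 3, NA),
  y = c(10, 12, NA, 20),
  z = c(NA, 120, 180, NA)
)

# Rows where x OR z is NA: the same filter as subset(d, is.na(x) | is.na(z))
either <- d[is.na(d$x) | is.na(d$z), ]

# Rows where x AND z are both NA
both <- d[is.na(d$x) & is.na(d$z), ]

# For a longer list of columns, count the NAs per row in those columns
cols <- c("x", "z")
any_na <- d[rowSums(is.na(d[cols])) > 0, ]

rownames(either)  # "1" "2" "4"
rownames(both)    # "4"
rownames(any_na)  # "1" "2" "4"
```

Row 3 is dropped in every case because its only NA is in column y, which is not among the specified columns.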

Updated on: 08-Nov-2021
