How to subset an R data frame by specifying columns that contain NA?

To subset an R data frame by specifying columns that contain NA, that is, to keep only the rows in which at least one of the chosen columns has a missing value, follow the steps below −

First, create a data frame in which some columns contain NAs.
Then, use is.na() inside the subset() function to keep the rows where the specified columns contain NA.
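The same steps can also be written with ordinary bracket indexing. subset() keeps the rows where its logical condition is TRUE, so building that logical vector yourself and passing it as the row index selects the same rows. A minimal sketch on a small hand-made data frame; `df_small`, `has_na`, `via_subset` and `via_bracket` are illustrative names, not part of the example below:

```r
# Small data frame with known NA positions (illustrative values)
df_small <- data.frame(
  x = c(1, NA, 3, 4),
  y = c(10, 20, NA, 40),
  z = c(NA, 200, 300, 400)
)

# Logical vector: TRUE where column x or column z is NA (the NA in y is ignored)
has_na <- is.na(df_small$x) | is.na(df_small$z)
print(has_na)

# Two equivalent ways to keep those rows
via_subset  <- subset(df_small, is.na(x) | is.na(z))
via_bracket <- df_small[has_na, ]

print(via_bracket)
```

Bracket indexing avoids subset()'s non-standard evaluation, which is usually preferred inside your own functions.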

Example

Create the data frame

Let’s create a data frame as shown below −

x<-sample(c(NA,1,2,3),25,replace=TRUE)
y<-sample(c(NA,10,12,20),25,replace=TRUE)
z<-sample(c(NA,100,120,180),25,replace=TRUE)
a<-sample(c(NA,80,77,62),25,replace=TRUE)
df<-data.frame(x,y,z,a)
df

Output

Running the above code produces output similar to the following (the values will differ on your system because sample() draws them at random) −

   x  y    z   a
1  1  NA  100  62
2  2  NA   NA  62
3  3  20  120  77
4  NA 20  180  62
5  1  NA   NA  NA
6  2  20   NA  62
7  3  10  180  62
8  2  10  120  62
9  1  12   NA  77
10 3  NA  100  62
11 1  10   NA  77
12 NA 12  180  77
13 NA NA  180  77
14 2  NA  180  62
15 3  10   NA  80
16 3  NA  100  80
17 1  20  120  80
18 1  10  120  80
19 1  12  100  NA
20 1  12  100  NA
21 1  10  180  77
22 1  12   NA  80
23 1  NA   NA  80
24 NA 10  100  NA
25 3  20   NA  62
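Before choosing which columns to name in the condition, it can help to check which columns actually contain NA. A minimal sketch using colSums() over is.na(); it uses a small fixed data frame, `df_demo` (hypothetical), rather than the random df above so that the counts are predictable:

```r
# Illustrative data frame (not the random df from this example)
df_demo <- data.frame(
  x = c(1, NA, 3, NA),
  y = c(10, 20, 30, 40),
  z = c(NA, 200, 300, 400)
)

# is.na() on a data frame returns a logical matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(df_demo))
print(na_counts)

# Names of the columns that contain at least one NA
na_cols <- names(df_demo)[na_counts > 0]
print(na_cols)
```

Running the same two lines on df instead of df_demo shows which of x, y, z and a have missing values in your own random sample.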

Subset data frame by specifying columns having NAs

Use is.na() inside subset() to keep the rows of df in which column x or column z (or both) contains NA, as shown below −

# df is the data frame created above; x and z are evaluated as columns of df
subset(df,is.na(x)|is.na(z))

Output

   x  y  z   a
2  2  NA  NA  62
4  NA 20 180  62
5  1  NA  NA  NA
6  2  20  NA  62
9  1  12  NA  77
11 1  10  NA  77
12 NA 12 180  77
13 NA NA 180  77
15 3  10  NA  80
22 1  12  NA  80
23 1  NA  NA  80
24 NA 10 100  NA
25 3  20  NA  62
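When the columns to check are held in a character vector, rowSums() over is.na() applies the same "any of these columns is NA" rule without writing one is.na() term per column, and changing the comparison selects rows where all of them are NA instead. A minimal sketch; `df_demo`, `cols`, `any_na` and `all_na` are hypothetical names chosen for illustration:

```r
# Illustrative data frame with known NA positions
df_demo <- data.frame(
  x = c(1, NA, 3, NA),
  y = c(10, 20, NA, 40),
  z = c(NA, 200, 300, NA)
)

cols <- c("x", "z")                           # columns to check for NA
na_per_row <- rowSums(is.na(df_demo[cols]))   # number of NAs among those columns in each row

# At least one listed column is NA: same rule as is.na(x) | is.na(z)
any_na <- df_demo[na_per_row > 0, ]

# Every listed column is NA: same rule as is.na(x) & is.na(z)
all_na <- df_demo[na_per_row == length(cols), ]

print(any_na)
print(all_na)
```

The "any" case can also be written as df_demo[!complete.cases(df_demo[cols]), ], since complete.cases() returns TRUE only for rows with no missing values in the given columns.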

Nizamuddin Siddiqui

Updated on: 08-Nov-2021