How to subset R data frame rows and keep the rows with NA in the output?


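To subset the rows of an R data frame and keep the rows that contain NA in the output, we can use the subset function and combine the filter condition with is.na() using the OR operator |. This is needed because a comparison against NA evaluates to NA, and subset drops every row for which the condition is NA.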

For example, if we have a data frame called df with a column C that contains some NA values, we can subset df for values of C greater than 5 and still include the NA rows by using the following command −

subset(df, C > 5 | is.na(C))
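To see why the is.na() term is needed, here is a minimal sketch on a small hand-made data frame. The id column and all the values are assumptions chosen for illustration, not data from this article.

```r
# Illustrative data; the values are assumptions chosen for this sketch
df <- data.frame(id = 1:5, C = c(3, NA, 8, 6, NA))

# Comparing NA with a number gives NA, not TRUE or FALSE
cond <- df$C > 5
print(cond)

# subset() drops the rows where the condition is NA
without_na <- subset(df, C > 5)
print(without_na)

# NA | TRUE is TRUE, so adding is.na(C) keeps the NA rows
with_na <- subset(df, C > 5 | is.na(C))
print(with_na)
```

The first call keeps only the rows where C is 8 and 6, while the second also keeps the two rows where C is NA.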

Example 1

The following snippet creates a sample data frame −

# sample() draws at random, so the values differ on every run
x1 <- sample(c(NA, 1, 2, 3), 20, replace = TRUE)
y1 <- sample(c(NA, 1, 2, 3), 20, replace = TRUE)
df1 <- data.frame(x1, y1)
df1

The following data frame is created −

    x1 y1
1   3  3
2   2  2
3   1  1
4   2  2
5   1 NA
6   2 NA
7   1  3
8   2  3
9   2  2
10  1  3
11 NA  2
12  3  1
13  3  3
14  3  1
15  1  2
16  1  1
17  2  2
18  2  1
19  1  3
20  1  2
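Because sample() draws at random, running the snippet again gives values different from the ones printed above. One way to make the draw repeatable is to call set.seed() first, as in the minimal sketch below. The helper name make_df and the seed value 123 are assumptions introduced here, and this seed is not expected to reproduce the table above.

```r
# make_df is a hypothetical helper; 123 is an arbitrary seed
make_df <- function() {
  set.seed(123)
  data.frame(
    x1 = sample(c(NA, 1, 2, 3), 20, replace = TRUE),
    y1 = sample(c(NA, 1, 2, 3), 20, replace = TRUE)
  )
}

# Resetting the same seed repeats exactly the same draws
first_run <- make_df()
second_run <- make_df()
print(identical(first_run, second_run))
```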

To subset df1 for x1 values less than 2 without keeping the NA rows, add the following code to the above snippet −

# Reuse df1 from above; calling sample() again would produce different data
subset(df1, x1 < 2)

Output

If you execute all the above given snippets as a single program, it generates the following output −

   x1 y1
3   1  1
5   1 NA
7   1  3
10  1  3
15  1  2
16  1  1
19  1  3
20  1  2

To subset df1 for x1 values less than 2 and also keep the rows where x1 is NA, add the following code to the above snippet −

# Reuse df1 from above; calling sample() again would produce different data
subset(df1, x1 < 2 | is.na(x1))

Output

If you execute all the above given snippets as a single program, it generates the following output −

    x1 y1
3   1  1
5   1 NA
7   1  3
10  1  3
11 NA  2
15  1  2
16  1  1
19  1  3
20  1  2
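The sampled values cannot be regenerated without a seed, but the two outputs above can be checked by typing in the x1 and y1 values from the printed df1 table. The sketch below does that, and df1_fixed is a hypothetical name introduced here.

```r
# Values copied by hand from the printed df1 table
df1_fixed <- data.frame(
  x1 = c(3, 2, 1, 2, 1, 2, 1, 2, 2, 1, NA, 3, 3, 3, 1, 1, 2, 2, 1, 1),
  y1 = c(3, 2, 1, 2, NA, NA, 3, 3, 2, 3, 2, 1, 3, 1, 2, 1, 2, 1, 3, 2)
)

# Row 11, where x1 is NA, is dropped by the plain condition
rows_without_na <- rownames(subset(df1_fixed, x1 < 2))
print(rows_without_na)

# Row 11 is kept once is.na(x1) is added with |
rows_with_na <- rownames(subset(df1_fixed, x1 < 2 | is.na(x1)))
print(rows_with_na)
```

Row 5 appears in both results because the filter only looks at x1, so an NA in y1 has no effect.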

Example 2

The following snippet creates a sample data frame −

# sample() and rpois() are random, so the values differ on every run
x2 <- sample(c(NA, rpois(2, 3)), 20, replace = TRUE)
y2 <- sample(c(NA, rpois(2, 5)), 20, replace = TRUE)
df2 <- data.frame(x2, y2)
df2

The following data frame is created −

    x2 y2
1   5 NA
2   1  3
3   1  6
4   5  3
5   1  6
6   5  3
7   1  6
8   5  6
9   5  6
10  5 NA
11 NA  6
12  1  3
13  5 NA
14  1  3
15 NA  6
16  1  6
17  5  3
18 NA NA
19 NA  3
20  5  6

To subset df2 for y2 values less than 6 without keeping the NA rows, add the following code to the above snippet −

# Reuse df2 from above; sampling again would produce different data
subset(df2, y2 < 6)

Output

If you execute all the above given snippets as a single program, it generates the following output −

   x2 y2
2   1  3
4   5  3
6   5  3
12  1  3
14  1  3
17  5  3
19 NA  3

To subset df2 for y2 values less than 6 and also keep the rows where y2 is NA, add the following code to the above snippet −

# Reuse df2 from above; sampling again would produce different data
subset(df2, y2 < 6 | is.na(y2))

Output

If you execute all the above given snippets as a single program, it generates the following output −

    x2 y2
1   5 NA
2   1  3
4   5  3
6   5  3
10  5 NA
12  1  3
13  5 NA
14  1  3
17  5  3
18 NA NA
19 NA  3
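The same care is needed with bracket indexing, which handles NA differently from subset. The rough sketch below uses the x2 and y2 values typed in from the printed df2 table, and df2_fixed is a hypothetical name introduced here. An NA in a logical row index produces a placeholder row filled with NA instead of dropping the row, while wrapping the condition in which() drops it.

```r
# Values copied by hand from the printed df2 table
df2_fixed <- data.frame(
  x2 = c(5, 1, 1, 5, 1, 5, 1, 5, 5, 5, NA, 1, 5, 1, NA, 1, 5, NA, NA, 5),
  y2 = c(NA, 3, 6, 3, 6, 3, 6, 6, 6, NA, 6, 3, NA, 3, 6, 6, 3, NA, 3, 6)
)

# NA in the logical index yields rows in which every column is NA
bracket_rows <- df2_fixed[df2_fixed$y2 < 6, ]
print(bracket_rows)

# which() keeps only the TRUE positions, like subset(df2_fixed, y2 < 6)
which_rows <- df2_fixed[which(df2_fixed$y2 < 6), ]
print(which_rows)

# Adding is.na() keeps the original rows where y2 is NA
keep_na_rows <- df2_fixed[df2_fixed$y2 < 6 | is.na(df2_fixed$y2), ]
print(keep_na_rows)
```

The last form returns the same rows as subset(df2, y2 < 6 | is.na(y2)) in the output above, so either style works once the is.na() term is included.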

Updated on: 12-Nov-2021
