How to remove rows that contain NA values in certain columns of an R data frame?


If a data frame has missing values, some of them can be replaced when we know enough about the characteristics of the case for which the information is missing. When that information is not available and there is no suitable way to replace the missing values, we can drop the affected rows instead. To do this, pass only the columns of interest to the complete.cases function and use the result to subset the rows of the data frame. Rows that have NA only in the other columns are kept.
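As a minimal sketch of the mechanism, the snippet below uses a small made-up data frame (the name toy and its values are illustrative and not part of this article's data). It shows that complete.cases returns one logical value per row, which is then used as a row index.

```r
# Hypothetical data frame with a few missing values (illustration only).
toy <- data.frame(a = c(1, NA, 3, 4),
                  b = c("u", "v", NA, "x"),
                  c = c(NA, 2.5, 3.5, 4.5))

# complete.cases() returns one logical value per row:
# TRUE when none of the inspected columns is NA in that row.
keep <- complete.cases(toy[c("a", "b")])
keep

# Use the logical vector as a row index. Column c is not inspected,
# so its NA in row 1 does not cause row 1 to be dropped.
toy_ab <- toy[keep, ]
toy_ab
```

Note the trailing comma in toy[keep, ]: it selects rows and keeps all columns.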

Example

Consider the following data frame:

> set.seed(19991)
> x1<-sample(c(NA,rnorm(5,2,1)),20,replace=TRUE)
> x2<-sample(c(NA,rnorm(5,40,0.87)),20,replace=TRUE)
> x3<-sample(c(NA,rnorm(5,1,0.015)),20,replace=TRUE)
> x4<-sample(c(NA,rnorm(10,5,1.27)),20,replace=TRUE)
> x5<-sample(c(NA,rnorm(8,1,0.20)),20,replace=TRUE)
> df1<-data.frame(x1,x2,x3,x4,x5)
> df1

Output

          x1       x2        x3       x4        x5
1  0.8287962 39.74094 0.9983586 6.338327 0.8692225
2  1.3167347       NA        NA 4.133738 0.8692225
3  3.9911408 38.84212 1.0047761 5.825111 0.8423061
4  0.6426335 39.74094 1.0047761 5.177329        NA
5  1.3167347       NA 0.9963252 5.073915 0.8423061
6  0.8287962 38.84212 0.9963252 5.154073 1.0566156
7         NA 40.36844 0.9927987       NA 0.8423061
8  0.1952913 40.36844 1.0047761 6.338327        NA
9  3.9911408       NA 1.0366262 5.154073 1.1936387
10 0.6426335 39.77818 0.9927987 5.177329 0.8557775
11        NA       NA 1.0047761 7.216787 0.9506370
12        NA 38.84212 0.9983586       NA 0.8423061
13 1.3167347 39.77818 0.9963252 5.825111 0.8557775
14 0.8287962 39.77818 1.0366262 5.177329        NA
15 0.1952913       NA 0.9927987 5.073915 0.8692225
16 0.1952913 38.84212 1.0366262 5.154073 0.8286973
17 0.1952913 38.84212 1.0366262       NA 0.9506370
18 1.3167347 40.36844 0.9983586       NA 1.0566156
19 0.1952913 39.80231        NA 5.073915        NA
20        NA       NA 0.9983586 5.073915 0.8557775

Removing the rows of df1 that contain NA in any of columns 3 to 5:

Example

> df1[complete.cases(df1[3:5]),]

Output

          x1       x2        x3       x4        x5
1  0.8287962 39.74094 0.9983586 6.338327 0.8692225
3  3.9911408 38.84212 1.0047761 5.825111 0.8423061
5  1.3167347       NA 0.9963252 5.073915 0.8423061
6  0.8287962 38.84212 0.9963252 5.154073 1.0566156
9  3.9911408       NA 1.0366262 5.154073 1.1936387
10 0.6426335 39.77818 0.9927987 5.177329 0.8557775
11        NA       NA 1.0047761 7.216787 0.9506370
13 1.3167347 39.77818 0.9963252 5.825111 0.8557775
15 0.1952913       NA 0.9927987 5.073915 0.8692225
16 0.1952913 38.84212 1.0366262 5.154073 0.8286973
20        NA       NA 0.9983586 5.073915 0.8557775
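The output above still shows NA in x1 and x2, because only the selected columns are checked. As a hedged illustration of that difference, the sketch below contrasts this approach with na.omit, which checks every column. The data frame d and its values are made up for the illustration.

```r
# Hypothetical three-column data frame (illustration only).
d <- data.frame(a = c(1, NA, 3),
                b = c(NA, 5, 6),
                c = c(7, 8, 9))

# na.omit() drops a row if ANY column contains NA in that row.
all_cols <- na.omit(d)

# complete.cases() on a column subset drops a row only if one of
# the selected columns (here just a) contains NA in that row.
only_a <- d[complete.cases(d["a"]), ]

all_cols
only_a
```

Here only_a keeps the first row even though b is NA there, while all_cols drops it.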

Removing the rows of df1 that contain NA in any of columns 1 to 3:

Example

> df1[complete.cases(df1[1:3]),]

Output

          x1       x2        x3       x4        x5
1  0.8287962 39.74094 0.9983586 6.338327 0.8692225
3  3.9911408 38.84212 1.0047761 5.825111 0.8423061
4  0.6426335 39.74094 1.0047761 5.177329        NA
6  0.8287962 38.84212 0.9963252 5.154073 1.0566156
8  0.1952913 40.36844 1.0047761 6.338327        NA
10 0.6426335 39.77818 0.9927987 5.177329 0.8557775
13 1.3167347 39.77818 0.9963252 5.825111 0.8557775
14 0.8287962 39.77818 1.0366262 5.177329        NA
16 0.1952913 38.84212 1.0366262 5.154073 0.8286973
17 0.1952913 38.84212 1.0366262       NA 0.9506370
18 1.3167347 40.36844 0.9983586       NA 1.0566156

Removing the rows of df1 that contain NA in any of columns 2 to 4:

Example

> df1[complete.cases(df1[2:4]),]

Output

          x1       x2        x3       x4        x5
1  0.8287962 39.74094 0.9983586 6.338327 0.8692225
3  3.9911408 38.84212 1.0047761 5.825111 0.8423061
4  0.6426335 39.74094 1.0047761 5.177329        NA
6  0.8287962 38.84212 0.9963252 5.154073 1.0566156
8  0.1952913 40.36844 1.0047761 6.338327        NA
10 0.6426335 39.77818 0.9927987 5.177329 0.8557775
13 1.3167347 39.77818 0.9963252 5.825111 0.8557775
14 0.8287962 39.77818 1.0366262 5.177329        NA
16 0.1952913 38.84212 1.0366262 5.154073 0.8286973
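The examples above select columns by position. Selecting them by name is often easier to read and does not break if the column order changes. The sketch below assumes a small made-up data frame d standing in for df1 (the values are not from this article) and also counts how many rows were dropped.

```r
# Hypothetical data frame standing in for df1 (values are made up).
d <- data.frame(x1 = c(1.2, NA, 3.1, 0.4, 2.2),
                x2 = c(40.1, 39.8, NA, 40.3, 39.5),
                x3 = c(1.01, 0.99, 1.00, NA, 1.02))

# Select the columns to check by name instead of by position.
cols <- c("x2", "x3")
cleaned <- d[complete.cases(d[cols]), ]

# Count how many rows were dropped because of NA in those columns.
n_dropped <- nrow(d) - nrow(cleaned)

cleaned
n_dropped
```

Row 2 is kept even though x1 is NA there, because x1 is not among the checked columns.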

Let’s have a look at another example:

Example

> y1<-sample(c(NA,rpois(5,2)),20,replace=TRUE)
> y2<-sample(c(NA,rpois(5,5)),20,replace=TRUE)
> y3<-sample(c(NA,rpois(5,1)),20,replace=TRUE)
> y4<-sample(c(NA,rpois(5,2)),20,replace=TRUE)
> df2<-data.frame(y1,y2,y3,y4)
> df2

Output

   y1 y2 y3 y4
1   0  2  0 NA
2   6 NA NA NA
3   0  9  1  1
4   6  4 NA  1
5   2  2  0  2
6   2  9 NA NA
7   6  2  0  1
8   2  4  1 NA
9   2  2  1  1
10  6  4  1  2
11  2  2  0 NA
12  6  2  3  1
13  0  4  1  1
14  2  4  1  0
15  2  9  0  1
16  2  2  1  1
17  2  9 NA  1
18  2  9  0  1
19  2  9  1  0
20 NA  2  3  1

Removing the rows of df2 that contain NA in any of columns 1 to 3:

Example

> df2[complete.cases(df2[1:3]),]

Output

   y1 y2 y3 y4
1   0  2  0 NA
3   0  9  1  1
5   2  2  0  2
7   6  2  0  1
8   2  4  1 NA
9   2  2  1  1
10  6  4  1  2
11  2  2  0 NA
12  6  2  3  1
13  0  4  1  1
14  2  4  1  0
15  2  9  0  1
16  2  2  1  1
18  2  9  0  1
19  2  9  1  0

Removing the rows of df2 that contain NA in any of columns 2 to 4:

Example

> df2[complete.cases(df2[2:4]),]

Output

   y1 y2 y3 y4
3   0  9  1  1
5   2  2  0  2
7   6  2  0  1
9   2  2  1  1
10  6  4  1  2
12  6  2  3  1
13  0  4  1  1
14  2  4  1  0
15  2  9  0  1
16  2  2  1  1
18  2  9  0  1
19  2  9  1  0
20 NA  2  3  1

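Removing the rows of df2 that contain NA in column 1 or column 3 (non-adjacent columns are selected with c()):

Example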

> df2[complete.cases(df2[c(1,3)]),]

Output

   y1 y2 y3 y4
1   0  2  0 NA
3   0  9  1  1
5   2  2  0  2
7   6  2  0  1
8   2  4  1 NA
9   2  2  1  1
10  6  4  1  2
11  2  2  0 NA
12  6  2  3  1
13  0  4  1  1
14  2  4  1  0
15  2  9  0  1
16  2  2  1  1
18  2  9  0  1
19  2  9  1  0
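For readers who want to see what the complete.cases call is doing, the sketch below writes the same row filter in two other base R ways, using is.na directly and using rowSums. It also shows how to renumber the rows afterwards, since the outputs above keep the original row numbers. The data frame d2 and its values are made up for the illustration.

```r
# Hypothetical data frame standing in for df2 (values are made up).
d2 <- data.frame(y1 = c(NA, 6L, 2L, 0L),
                 y2 = c(2L, NA, 4L, 9L),
                 y3 = c(0L, 1L, NA, 3L))

# Three ways to keep rows with no NA in columns 1 and 3.
by_cc      <- d2[complete.cases(d2[c(1, 3)]), ]
by_isna    <- d2[!is.na(d2$y1) & !is.na(d2$y3), ]
by_rowsums <- d2[rowSums(is.na(d2[c(1, 3)])) == 0, ]

# The filtered data frame keeps the original row numbers.
# Assigning NULL to the row names renumbers them from 1.
renumbered <- by_cc
rownames(renumbered) <- NULL

by_cc
renumbered
```

All three filters are expected to select the same rows. complete.cases is usually the most compact when several columns are involved.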

Updated on: 21-Nov-2020
