How to remove rows that contain a coded missing value in all columns of an R data frame?


Missing values are sometimes stored as a code, such as 1 or 99, rather than as NA. If we run an analysis without handling those codes, R treats them as ordinary values, and the results become difficult to interpret, especially for first-time readers.

Therefore, we might want to remove the rows in which every column contains the coded missing value. For this purpose, we can replace the coded missing values with NA and then drop the rows that contain NA in all columns, as shown in the examples below.
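The two steps can be sketched on a small hand-made data frame before moving to the random examples. The data frame `df`, its column names, and the code 99 are illustrative assumptions, not part of the examples that follow.

```r
# Hypothetical data: 99 stands for "missing" in both columns
df <- data.frame(a = c(1, 99, 3, 99), b = c(99, 99, 5, 6))

# Step 1: replace the coded missing value with NA
df[df == 99] <- NA

# Step 2: keep only the rows that have at least one non-NA value
result <- df[rowSums(is.na(df)) < ncol(df), ]
print(result)
```

Only row 2, where both columns held the code, is dropped; rows with a single coded value are kept.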

Example 1

The following snippet creates a data frame in which missing values are coded as 1:

x1<-rpois(20,1)
x2<-rpois(20,1)
df1<-data.frame(x1,x2)
df1

The following data frame is created:

   x1 x2
1   1  0
2   1  2
3   1  3
4   1  1
5   0  1
6   0  1
7   1  0
8   0  1
9   2  1
10  1  2
11  0  3
12  1  0
13  1  2
14  2  2
15  0  0
16  2  3
17  1  1
18  2  0
19  0  0
20  1  1
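Because rpois() draws random values, running the snippet again produces a different data frame from the one shown. A minimal sketch of making the draw repeatable with set.seed() follows; the seed value 42 and the names `a` and `b` are arbitrary choices for illustration.

```r
# The seed value 42 is an arbitrary choice for illustration
set.seed(42)
a <- data.frame(x1 = rpois(20, 1), x2 = rpois(20, 1))

# Resetting the same seed reproduces exactly the same draws
set.seed(42)
b <- data.frame(x1 = rpois(20, 1), x2 = rpois(20, 1))

print(identical(a, b))
```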

To replace the coded missing value 1 with NA in df1, add the following code to the above snippet:

df1[df1==1]<-NA
df1

Output

Running all of the above snippets as a single program generates the following output:

   x1  x2
1  NA   0
2  NA   2
3  NA   3
4  NA  NA
5   0  NA
6   0  NA
7  NA   0
8   0  NA
9   2  NA
10 NA   2
11  0   3
12 NA   0
13 NA   2
14  2   2
15  0   0
16  2   3
17 NA  NA
18  2   0
19  0   0
20 NA  NA

To remove the rows that contain NA in every column of df1, add the following code to the above snippet:

df1[rowSums(is.na(df1))<ncol(df1),]

Output

Running all of the above snippets as a single program generates the following output:

   x1  x2
1  NA   0
2  NA   2
3  NA   3
5   0  NA
6   0  NA
7  NA   0
8   0  NA
9   2  NA
10 NA   2
11  0   3
12 NA   0
13 NA   2
14  2   2
15  0   0
16  2   3
18  2   0
19  0   0
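The filter works by counting NA values per row and comparing that count with the number of columns. The intermediate steps can be inspected on a small example; the data frame `d` and the names `na_count` and `keep` are hypothetical, chosen only for this sketch.

```r
# Hypothetical three-row data frame that already contains NA values
d <- data.frame(x1 = c(NA, NA, 0), x2 = c(0, NA, NA))

na_count <- rowSums(is.na(d))   # number of NA values in each row
keep <- na_count < ncol(d)      # TRUE when at least one value is present

print(na_count)
print(keep)
print(d[keep, ])
```

Row 2 has as many NA values as there are columns, so it is the only row removed.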

Example 2

The following snippet creates a data frame in which missing values are coded as 99:

y1<-sample(c(1,99),20,replace=TRUE)
y2<-sample(c(5,99),20,replace=TRUE)
df2<-data.frame(y1,y2)
df2

The following data frame is created:

   y1  y2
1  99   5
2  99   5
3  99   5
4   1  99
5   1  99
6   1   5
7   1  99
8  99  99
9  99  99
10 99  99
11 99  99
12 99   5
13  1  99
14 99   5
15 99   5
16 99  99
17 99   5
18 99  99
19 99  99
20 99   5

To replace the coded missing value 99 with NA in df2, add the following code to the above snippet:

df2[df2==99]<-NA
df2

Output

Running all of the above snippets as a single program generates the following output:

   y1  y2
1  NA   5
2  NA   5
3  NA   5
4   1  NA
5   1  NA
6   1   5
7   1  NA
8  NA  NA
9  NA  NA
10 NA  NA
11 NA  NA
12 NA   5
13  1  NA
14 NA   5
15 NA   5
16 NA  NA
17 NA   5
18 NA  NA
19 NA  NA
20 NA   5

To remove the rows that contain NA in every column of df2, add the following code to the above snippet:

df2[rowSums(is.na(df2))<ncol(df2),]

Output

Running all of the above snippets as a single program generates the following output:

   y1  y2
1  NA   5
2  NA   5
3  NA   5
4   1  NA
5   1  NA
6   1   5
7   1  NA
12 NA   5
13  1  NA
14 NA   5
15 NA   5
17 NA   5
20 NA   5
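If the remaining coded values should stay as 99 rather than become NA, the rows can also be filtered without the replacement step. This is a variation on the approach above, not the method used in the examples; the data frame `d2` and the names `coded` and `kept` are hypothetical.

```r
# Hypothetical data in which 99 codes a missing value
d2 <- data.frame(y1 = c(99, 1, 99, 1), y2 = c(5, 99, 99, 5))

# Count how many columns equal the code in each row;
# na.rm = TRUE ignores any genuine NA values when counting
coded <- rowSums(d2 == 99, na.rm = TRUE)

# Drop only the rows in which every column holds the code
kept <- d2[coded < ncol(d2), ]
print(kept)
```

Row 3 is removed because both columns hold 99, while rows 1 and 2 keep their single 99 unchanged.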

Updated on: 11-Nov-2021
