How to find rows in an R data frame that do not have missing values?

Dealing with missing values is one of the most critical tasks in data analysis. If we have a large amount of data, it is often acceptable to remove the rows that contain missing values. To find the rows without missing values, we can use the complete.cases function, which returns TRUE for every row that has no NA in any column.

For example, if we have a data frame called df that contains some missing values, we can keep only the rows without missing values using the following command:

df[complete.cases(df),]
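
To see what the command above does step by step, here is a minimal sketch on a small hand-made data frame. The data frame and the names keep, complete_rows and df_complete are hypothetical and used only for illustration.

```r
# Hypothetical data frame for illustration (not one of the examples in this article)
df <- data.frame(a = c(1, NA, 3, 4),
                 b = c("u", "v", NA, "x"))

# complete.cases() returns one logical value per row:
# TRUE when the row has no NA in any column
keep <- complete.cases(df)

# Row numbers of the rows that have no missing values
complete_rows <- which(keep)

# Subsetting with the logical vector keeps only those rows
df_complete <- df[keep, ]
df_complete
```

Here rows 2 and 3 each contain an NA, so only rows 1 and 4 are kept. Using which() is handy when the row numbers themselves are needed rather than the filtered data frame.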

Example 1

The following snippet creates a sample data frame with randomly placed missing values:

x1<-sample(c(NA,rpois(2,5)),20,replace=TRUE)
x2<-sample(c(NA,rpois(2,5)),20,replace=TRUE)
x3<-sample(c(NA,rpois(2,5)),20,replace=TRUE)
df1<-data.frame(x1,x2,x3)
df1

The following data frame is created (the values are drawn at random, so yours will differ):

   x1 x2 x3
1  NA  7  3
2   4 NA  3
3   4  7 NA
4   2  4 NA
5   2 NA  4
6   2  7 NA
7  NA  4  4
8  NA NA  4
9   2 NA NA
10 NA NA  4
11  4  7  3
12  4 NA  4
13 NA  7  3
14 NA  7  4
15 NA  7 NA
16  2 NA  4
17  2  4  3
18  4  7  3
19  2 NA  3
20  4  4 NA

To remove the rows of df1 with missing values, run the following command on the data frame created above. Do not re-run the sample() lines first, because they would draw a new random data frame:

df1[complete.cases(df1),]

Output

For the data frame df1 shown above, only the rows with no missing values remain:

   x1 x2 x3
11  4  7  3
17  2  4  3
18  4  7  3

Example 2

The following snippet creates a sample data frame with randomly placed missing values:

y1<-sample(c(NA,rnorm(2)),20,replace=TRUE)
y2<-sample(c(NA,rnorm(2)),20,replace=TRUE)
y3<-sample(c(NA,rnorm(2)),20,replace=TRUE)
df2<-data.frame(y1,y2,y3)
df2

The following data frame is created:

           y1          y2          y3
1  -0.2619255 -0.80309246 -0.76031065
2  -0.2619255 -0.04079919 -0.76031065
3   1.7217166          NA -0.76031065
4  -0.2619255          NA          NA
5          NA -0.04079919 -0.76031065
6   1.7217166          NA  0.01337776
7          NA -0.80309246          NA
8          NA          NA -0.76031065
9   1.7217166 -0.04079919          NA
10         NA -0.04079919  0.01337776
11  1.7217166 -0.80309246  0.01337776
12 -0.2619255          NA -0.76031065
13         NA -0.04079919  0.01337776
14 -0.2619255          NA  0.01337776
15 -0.2619255 -0.04079919          NA
16         NA -0.04079919          NA
17 -0.2619255          NA -0.76031065
18  1.7217166 -0.80309246  0.01337776
19         NA -0.80309246 -0.76031065
20         NA -0.04079919          NA

To remove the rows of df2 with missing values, run the following command on the data frame created above. Do not re-run the sample() lines first, because they would draw a new random data frame:

df2[complete.cases(df2),]

Output

For the data frame df2 shown above, this produces the following output:

           y1          y2          y3
1  -0.2619255 -0.80309246 -0.76031065
2  -0.2619255 -0.04079919 -0.76031065
11  1.7217166 -0.80309246  0.01337776
18  1.7217166 -0.80309246  0.01337776
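
Because sample(), rpois() and rnorm() draw random values, re-running the examples gives a different data frame each time. A minimal sketch of a reproducible variant follows, assuming an arbitrary seed of 42; the names z1, z2, z3, df3 and df3_complete are hypothetical and not part of the examples above.

```r
# Fix the random seed so the same data frame is produced on every run
# (the seed value 42 is an arbitrary choice)
set.seed(42)

z1 <- sample(c(NA, rnorm(2)), 20, replace = TRUE)
z2 <- sample(c(NA, rnorm(2)), 20, replace = TRUE)
z3 <- sample(c(NA, rnorm(2)), 20, replace = TRUE)
df3 <- data.frame(z1, z2, z3)

# Keep only the rows without missing values
df3_complete <- df3[complete.cases(df3), ]

# Report how many rows survived, which helps judge how much data is lost
cat(nrow(df3_complete), "of", nrow(df3), "rows are complete\n")
```

Checking the row count before and after is a quick way to decide whether dropping incomplete rows is acceptable for the data at hand.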

Nizamuddin Siddiqui

Updated on: 12-Nov-2021
