
How to remove rows that contain NA values in certain columns of an R data frame?
When a data frame has missing values, some of them can be replaced if we know enough about the cases they belong to. When that information is not available and there is no suitable replacement, the affected rows can be dropped instead. To drop rows only when specific columns are missing, pass just those columns to the complete.cases function and use the resulting logical vector to subset the rows of the full data frame.
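The idea can be sketched on a small hand-built data frame before moving to the random data below. The frame `df` and its columns `a`, `b` and `c` are made up for illustration and are not part of the original example.

```r
# Hypothetical data frame with NA values scattered across three columns
df <- data.frame(
  a = c(1, NA, 3, 4),
  b = c("u", "v", NA, "x"),
  c = c(NA, 10, 20, 30)
)

# complete.cases() returns one logical value per row:
# TRUE when none of the supplied columns is NA in that row
keep <- complete.cases(df[, c("a", "b")])
keep
# [1]  TRUE FALSE FALSE  TRUE

# Subset the rows of the full data frame with that vector.
# Column c was not checked, so row 1 is kept even though c is NA there.
result <- df[keep, ]
result
```

Only the columns passed to complete.cases decide which rows are dropped; all columns of the surviving rows are returned.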
Example
Consider the below data frame:
> set.seed(19991)
> x1 <- sample(c(NA, rnorm(5, 2, 1)), 20, replace = TRUE)
> x2 <- sample(c(NA, rnorm(5, 40, 0.87)), 20, replace = TRUE)
> x3 <- sample(c(NA, rnorm(5, 1, 0.015)), 20, replace = TRUE)
> x4 <- sample(c(NA, rnorm(10, 5, 1.27)), 20, replace = TRUE)
> x5 <- sample(c(NA, rnorm(8, 1, 0.20)), 20, replace = TRUE)
> df1 <- data.frame(x1, x2, x3, x4, x5)
> df1
Output
          x1       x2        x3       x4        x5
1  0.8287962 39.74094 0.9983586 6.338327 0.8692225
2  1.3167347       NA        NA 4.133738 0.8692225
3  3.9911408 38.84212 1.0047761 5.825111 0.8423061
4  0.6426335 39.74094 1.0047761 5.177329        NA
5  1.3167347       NA 0.9963252 5.073915 0.8423061
6  0.8287962 38.84212 0.9963252 5.154073 1.0566156
7         NA 40.36844 0.9927987       NA 0.8423061
8  0.1952913 40.36844 1.0047761 6.338327        NA
9  3.9911408       NA 1.0366262 5.154073 1.1936387
10 0.6426335 39.77818 0.9927987 5.177329 0.8557775
11        NA       NA 1.0047761 7.216787 0.9506370
12        NA 38.84212 0.9983586       NA 0.8423061
13 1.3167347 39.77818 0.9963252 5.825111 0.8557775
14 0.8287962 39.77818 1.0366262 5.177329        NA
15 0.1952913       NA 0.9927987 5.073915 0.8692225
16 0.1952913 38.84212 1.0366262 5.154073 0.8286973
17 0.1952913 38.84212 1.0366262       NA 0.9506370
18 1.3167347 40.36844 0.9983586       NA 1.0566156
19 0.1952913 39.80231        NA 5.073915        NA
20        NA       NA 0.9983586 5.073915 0.8557775
Removing the rows of df1 in which any of columns 3 to 5 contains NA:
Example
> df1[complete.cases(df1[3:5]),]
Output
          x1       x2        x3       x4        x5
1  0.8287962 39.74094 0.9983586 6.338327 0.8692225
3  3.9911408 38.84212 1.0047761 5.825111 0.8423061
5  1.3167347       NA 0.9963252 5.073915 0.8423061
6  0.8287962 38.84212 0.9963252 5.154073 1.0566156
9  3.9911408       NA 1.0366262 5.154073 1.1936387
10 0.6426335 39.77818 0.9927987 5.177329 0.8557775
11        NA       NA 1.0047761 7.216787 0.9506370
13 1.3167347 39.77818 0.9963252 5.825111 0.8557775
15 0.1952913       NA 0.9927987 5.073915 0.8692225
16 0.1952913 38.84212 1.0366262 5.154073 0.8286973
20        NA       NA 0.9983586 5.073915 0.8557775
Removing the rows of df1 in which any of columns 1 to 3 contains NA:
Example
> df1[complete.cases(df1[1:3]),]
Output
          x1       x2        x3       x4        x5
1  0.8287962 39.74094 0.9983586 6.338327 0.8692225
3  3.9911408 38.84212 1.0047761 5.825111 0.8423061
4  0.6426335 39.74094 1.0047761 5.177329        NA
6  0.8287962 38.84212 0.9963252 5.154073 1.0566156
8  0.1952913 40.36844 1.0047761 6.338327        NA
10 0.6426335 39.77818 0.9927987 5.177329 0.8557775
13 1.3167347 39.77818 0.9963252 5.825111 0.8557775
14 0.8287962 39.77818 1.0366262 5.177329        NA
16 0.1952913 38.84212 1.0366262 5.154073 0.8286973
17 0.1952913 38.84212 1.0366262       NA 0.9506370
18 1.3167347 40.36844 0.9983586       NA 1.0566156
Removing the rows of df1 in which any of columns 2 to 4 contains NA:
Example
> df1[complete.cases(df1[2:4]),]
Output
          x1       x2        x3       x4        x5
1  0.8287962 39.74094 0.9983586 6.338327 0.8692225
3  3.9911408 38.84212 1.0047761 5.825111 0.8423061
4  0.6426335 39.74094 1.0047761 5.177329        NA
6  0.8287962 38.84212 0.9963252 5.154073 1.0566156
8  0.1952913 40.36844 1.0047761 6.338327        NA
10 0.6426335 39.77818 0.9927987 5.177329 0.8557775
13 1.3167347 39.77818 0.9963252 5.825111 0.8557775
14 0.8287962 39.77818 1.0366262 5.177329        NA
16 0.1952913 38.84212 1.0366262 5.154073 0.8286973
Let’s have a look at another example:
Example
> y1 <- sample(c(NA, rpois(5, 2)), 20, replace = TRUE)
> y2 <- sample(c(NA, rpois(5, 5)), 20, replace = TRUE)
> y3 <- sample(c(NA, rpois(5, 1)), 20, replace = TRUE)
> y4 <- sample(c(NA, rpois(5, 2)), 20, replace = TRUE)
> df2 <- data.frame(y1, y2, y3, y4)
> df2
Output
   y1 y2 y3 y4
1   0  2  0 NA
2   6 NA NA NA
3   0  9  1  1
4   6  4 NA  1
5   2  2  0  2
6   2  9 NA NA
7   6  2  0  1
8   2  4  1 NA
9   2  2  1  1
10  6  4  1  2
11  2  2  0 NA
12  6  2  3  1
13  0  4  1  1
14  2  4  1  0
15  2  9  0  1
16  2  2  1  1
17  2  9 NA  1
18  2  9  0  1
19  2  9  1  0
20 NA  2  3  1
Removing the rows of df2 in which any of columns 1 to 3 contains NA:
Example
> df2[complete.cases(df2[1:3]),]
Output
   y1 y2 y3 y4
1   0  2  0 NA
3   0  9  1  1
5   2  2  0  2
7   6  2  0  1
8   2  4  1 NA
9   2  2  1  1
10  6  4  1  2
11  2  2  0 NA
12  6  2  3  1
13  0  4  1  1
14  2  4  1  0
15  2  9  0  1
16  2  2  1  1
18  2  9  0  1
19  2  9  1  0
Removing the rows of df2 in which any of columns 2 to 4 contains NA:
Example
> df2[complete.cases(df2[2:4]),]
Output
   y1 y2 y3 y4
3   0  9  1  1
5   2  2  0  2
7   6  2  0  1
9   2  2  1  1
10  6  4  1  2
12  6  2  3  1
13  0  4  1  1
14  2  4  1  0
15  2  9  0  1
16  2  2  1  1
18  2  9  0  1
19  2  9  1  0
20 NA  2  3  1
Removing the rows of df2 in which column 1 or column 3 contains NA:
Example
> df2[complete.cases(df2[c(1,3)]),]
Output
   y1 y2 y3 y4
1   0  2  0 NA
3   0  9  1  1
5   2  2  0  2
7   6  2  0  1
8   2  4  1 NA
9   2  2  1  1
10  6  4  1  2
11  2  2  0 NA
12  6  2  3  1
13  0  4  1  1
14  2  4  1  0
15  2  9  0  1
16  2  2  1  1
18  2  9  0  1
19  2  9  1  0
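Non-adjacent columns can also be selected by name rather than by position, which may be easier to read and survives a change in column order. The sketch below uses a small made-up data frame `d` (not the df2 above) and also shows an equivalent filter built from is.na and rowSums, as one possible alternative rather than the approach used in this article.

```r
# Hypothetical data frame; values chosen by hand for illustration
d <- data.frame(
  y1 = c(0, 6, NA, 2),
  y2 = c(2, NA, 9, 4),
  y3 = c(0, 1, 1, NA),
  y4 = c(NA, 1, 1, 1)
)

# Columns to check, given by name instead of c(1, 3)
cols <- c("y1", "y3")

# Keep rows where neither y1 nor y3 is NA
by_name <- d[complete.cases(d[cols]), ]

# Alternative: count the NAs per row in the chosen columns and keep rows with zero
by_count <- d[rowSums(is.na(d[cols])) == 0, ]

by_name
```

Rows 3 and 4 are dropped because y1 and y3 are missing there, while the NA values in y2 and y4 of the remaining rows are left untouched.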