How to remove rows based on blanks in a column from a data frame in R?
Data are sometimes entered incorrectly, which is why careful cleaning should come before analysis. A data collector or respondent might leave an answer blank if no option fits the question. This also happens when a questionnaire is poorly designed or a blank is entered by mistake. For a categorical variable, a control category may be recorded as blank, or a blank category may be kept as a placeholder for a new one at a later stage. Whatever the reason, analysts run into this problem regularly. Such blanks are usually typed with the space key, so they are stored as a single space character (" ") rather than as a missing value. If a column of a data frame contains these blank values, the corresponding rows can be removed by subsetting with single square brackets.
Example 1
Consider the following data frame:
> set.seed(24)
> x1<-sample(c(" ",1:5),20,replace=TRUE)
> x2<-rnorm(20,4,1.25)
> df1<-data.frame(x1,x2)
> df1
Output
   x1       x2
1   2 3.413674
2   1 3.581267
3   2 5.920315
4   4 4.762493
5   1 4.645420
6   5 3.907114
7   1 3.243554
8     1.862944
9   3 3.664134
10    3.189261
11    3.882362
12  4 3.893074
13  4 4.149414
14    3.854630
15  4 2.820216
16  4 3.957828
17  3 3.268216
18  4 4.766064
19  1 5.896403
20    4.821726
Removing rows with blanks:
Example
> df1[df1$x1 != " ", ]
Output
   x1       x2
1   2 3.413674
2   1 3.581267
3   2 5.920315
4   4 4.762493
5   1 4.645420
6   5 3.907114
7   1 3.243554
9   3 3.664134
12  4 3.893074
13  4 4.149414
15  4 2.820216
16  4 3.957828
17  3 3.268216
18  4 4.766064
19  1 5.896403
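To see why the subsetting works, it can help to look at the logical vector that goes inside the square brackets. The sketch below uses a small hand-built data frame; the names toy, grp, val, keep and cleaned are illustrative and not part of the example above. The comparison toy$grp != " " is equivalent to !toy$grp == " ", because R evaluates == before !.

```r
# Hypothetical toy data frame (names are illustrative only)
toy <- data.frame(
  grp = c("a", " ", "b", " "),
  val = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE where grp is NOT a single space
keep <- toy$grp != " "
print(keep)

# Rows where keep is TRUE are retained; leaving the position
# after the comma empty keeps every column
cleaned <- toy[keep, ]
print(cleaned)
```

The original row names are preserved after subsetting, which is why the output above skips the numbers of the removed rows.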
Example 2
> y1<-sample(c(" ",rpois(5,1)),20,replace=TRUE)
> y2<-rpois(20,5)
> df2<-data.frame(y1,y2)
> df2
Output
   y1 y2
1   1  2
2   0  4
3      3
4     10
5   0  6
6   0  5
7   0  7
8   0  3
9   1  1
10  1  6
11  2  7
12  2  5
13  0  5
14     3
15  0  5
16  0  3
17  1  4
18  0  4
19  2  2
20    14
Removing rows with blanks:
Example
> df2[df2$y1 != " ", ]
Output
   y1 y2
1   1  2
2   0  4
5   0  6
6   0  5
7   0  7
8   0  3
9   1  1
10  1  6
11  2  7
12  2  5
13  0  5
15  0  5
16  0  3
17  1  4
18  0  4
19  2  2
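The comparison with " " matches only a blank that is exactly one space. Real data may also contain empty strings or runs of several spaces. A possible variation, sketched here under assumptions, trims whitespace with base R's trimws() before comparing. The data frame toy2 and its columns are hypothetical, and treating NA as blank is an assumption added for illustration, not something the examples above require.

```r
# Hypothetical data with several kinds of "blank" (names are illustrative)
toy2 <- data.frame(
  ans   = c("yes", " ", "", "   ", NA, "no"),
  score = 1:6,
  stringsAsFactors = FALSE
)

# A value counts as blank if it is NA (an assumption made here) or if
# nothing is left once leading and trailing whitespace is trimmed
is_blank <- is.na(toy2$ans) | trimws(toy2$ans) == ""

# Keep only the rows that are not blank
cleaned2 <- toy2[!is_blank, ]
print(cleaned2)
```

Drop the is.na() part if missing values should be kept and handled separately.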