How to move missing values to the end of a data frame's columns in R?
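Data science projects often involve missing values, and these can occur at any position in a column. Sometimes we want to move the missing values to the end of each column of a data frame. This can be done by applying a function to every column with lapply: for each column, the non-missing values, x[!is.na(x)], are concatenated with the missing ones, x[is.na(x)]. Each column is reordered independently, so after the operation the values in a row do not necessarily come from the same original row. The examples below show the approach.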
Example 1
Consider the below data frame −
> x1<-sample(c(NA,rpois(2,1)),20,replace=TRUE)
> x2<-sample(c(NA,rpois(2,1)),20,replace=TRUE)
> x3<-sample(c(NA,rpois(2,1)),20,replace=TRUE)
> df1<-data.frame(x1,x2,x3)
> df1
Output
   x1 x2 x3
1   0  0  2
2   1  1 NA
3   1 NA  0
4   0 NA  2
5   1 NA  2
6  NA NA NA
7   0  1  2
8   0  1 NA
9   1  0  2
10  0  1  0
11  0  1  2
12  0  1  2
13  0  1 NA
14  1  1  2
15  0  1 NA
16 NA  1  2
17  1  0  0
18 NA  1 NA
19 NA  0 NA
20  0  0  2
Sending the missing values to the end of df1 −
> df1[]<-lapply(df1,function(x) c(x[!is.na(x)], x[is.na(x)]))
> df1
Output
   x1 x2 x3
1   0  0  2
2   1  1  0
3   1  1  2
4   0  1  2
5   1  0  2
6   0  1  2
7   0  1  0
8   1  1  2
9   0  1  2
10  0  1  2
11  0  1  2
12  0  1  0
13  1  0  2
14  0  1 NA
15  1  0 NA
16  0  0 NA
17 NA NA NA
18 NA NA NA
19 NA NA NA
20 NA NA NA
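Because the data in the example above is generated randomly, the exact values differ on every run. The following is a minimal sketch of the same idiom on a small fixed data frame, so the result can be checked by hand. The helper name na_last and the columns a and b are made up for illustration and are not part of the original example.

```r
# Hypothetical helper (name chosen for this sketch):
# keep the non-missing values in their original order, then append the NAs
na_last <- function(x) c(x[!is.na(x)], x[is.na(x)])

# Small fixed data frame so the result is reproducible
df <- data.frame(a = c(NA, 3, NA, 1, 2),
                 b = c(5, NA, 6, NA, NA))

# Assigning into df[] replaces every column but keeps the data frame structure;
# a plain df <- lapply(...) would return a list instead
df[] <- lapply(df, na_last)
df
#    a  b
# 1  3  5
# 2  1  6
# 3  2 NA
# 4 NA NA
# 5 NA NA
```

The non-missing values keep their relative order within each column, but the columns are shifted by different amounts, so a row of the result no longer describes a single original observation.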
Example 2
> y1<-sample(c(NA,rnorm(3)),20,replace=TRUE)
> y2<-sample(c(NA,rnorm(3)),20,replace=TRUE)
> y3<-sample(c(NA,rnorm(3)),20,replace=TRUE)
> df2<-data.frame(y1,y2,y3)
> df2
Output
          y1         y2         y3
1         NA         NA         NA
2  1.3030960         NA  0.6250597
3         NA -0.2795437         NA
4         NA -0.2795437  0.6250597
5         NA  1.2997792  0.6250597
6  1.3030960  1.2997792         NA
7  0.5949615 -0.2795437         NA
8  0.1149380         NA  0.6250597
9  0.5949615 -0.2795437         NA
10        NA         NA -0.6412672
11        NA  1.2997792 -0.6412672
12 1.3030960 -0.2795437 -0.6412672
13 0.1149380  1.2997792         NA
14 1.3030960 -0.2795437         NA
15 0.1149380 -0.2795437  0.6250597
16 0.1149380         NA  1.3271716
17        NA -0.2795437 -0.6412672
18        NA  1.2481138         NA
19 0.5949615  1.2481138         NA
20        NA -0.2795437  0.6250597
Sending the missing values to the end of df2 −
> df2[]<-lapply(df2,function(x) c(x[!is.na(x)], x[is.na(x)]))
> df2
Output
          y1         y2         y3
1  1.3030960 -0.2795437  0.6250597
2  1.3030960 -0.2795437  0.6250597
3  0.5949615  1.2997792  0.6250597
4  0.1149380  1.2997792  0.6250597
5  0.5949615 -0.2795437 -0.6412672
6  1.3030960 -0.2795437 -0.6412672
7  0.1149380  1.2997792 -0.6412672
8  1.3030960 -0.2795437  0.6250597
9  0.1149380  1.2997792  1.3271716
10 0.1149380 -0.2795437 -0.6412672
11 0.5949615 -0.2795437  0.6250597
12        NA -0.2795437         NA
13        NA  1.2481138         NA
14        NA  1.2481138         NA
15        NA -0.2795437         NA
16        NA         NA         NA
17        NA         NA         NA
18        NA         NA         NA
19        NA         NA         NA
20        NA         NA         NA
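As an alternative that is not used in the examples above, the same reordering can be written as a single subsetting step with order(is.na(x)). Since order leaves ties in their original order and FALSE sorts before TRUE, the non-missing values come first in their original sequence, followed by the NAs. Because this only subsets the vector, it also keeps the column type, for instance a factor stays a factor. The sketch below assumes a small made-up data frame with columns n and f.

```r
# Made-up data frame with a numeric and a factor column
df <- data.frame(n = c(NA, 2.5, NA, 1.5),
                 f = factor(c(NA, "lo", "hi", NA)))

# is.na(x) is FALSE for present values and TRUE for NAs;
# ordering on it puts the FALSE positions first without reshuffling ties
df[] <- lapply(df, function(x) x[order(is.na(x))])
df
#     n    f
# 1 2.5   lo
# 2 1.5   hi
# 3  NA <NA>
# 4  NA <NA>
```

As with the concatenation approach, each column is reordered on its own, so rows of the result do not correspond to rows of the input.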