How to fill the missing values of an R data frame from the mean of columns?
Dealing with missing values is one of the first steps in data analysis, and also one of the most difficult: if the missing values are not filled with an appropriate method, the results of the whole analysis can become meaningless. We must therefore be careful about how we handle them. For learning purposes, people mostly use the mean to fill missing values, but other values, such as the median, can be used depending on the characteristics of the data. To fill missing values with the column means, we can use the na.aggregate function of the zoo package, which by default replaces each NA with the mean of the non-missing values in its column.
Example
Consider the below data frame −
x1 <- c(1:5, NA, 17:30)
x2 <- c(1:2, NA, 4:20)
x3 <- sample(c(1, 5, 8, NA, 6, 3), 20, replace = TRUE)
x4 <- sample(c(45, 75, 68, NA, 36, 43), 20, replace = TRUE)
x5 <- rep(c(23, 45, 55, 78, NA), times = 4)
df <- data.frame(x1, x2, x3, x4, x5)
df
Output
   x1 x2 x3 x4 x5
1   1  1  6 36 23
2   2  2  5 36 45
3   3 NA  1 68 55
4   4  4  1 43 78
5   5  5  3 45 NA
6  NA  6  3 75 23
7  17  7  5 68 45
8  18  8  6 43 55
9  19  9 NA 75 78
10 20 10  8 75 NA
11 21 11  3 43 23
12 22 12  1 68 45
13 23 13  8 45 55
14 24 14  5 36 78
15 25 15  5 36 NA
16 26 16  5 75 23
17 27 17  5 75 45
18 28 18  6 43 55
19 29 19  8 NA 78
20 30 20  6 75 NA
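Because x3 and x4 are drawn with sample(), these columns will differ from run to run. A minimal sketch, assuming you want the same values every time, is to call set.seed() before sampling. The seed value 123 is an arbitrary choice and not part of the original example.

```r
# Fix the random number generator so sample() returns the same draws each run.
# The seed value (123) is arbitrary and is an assumption of this sketch.
set.seed(123)
x3 <- sample(c(1, 5, 8, NA, 6, 3), 20, replace = TRUE)

# Resetting the same seed reproduces exactly the same vector.
set.seed(123)
x3_again <- sample(c(1, 5, 8, NA, 6, 3), 20, replace = TRUE)

identical(x3, x3_again)  # TRUE
```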
Example
library(zoo)
na.aggregate(df)
Output
         x1       x2       x3       x4    x5
1   1.00000  1.00000 6.000000 36.00000 23.00
2   2.00000  2.00000 5.000000 36.00000 45.00
3   3.00000 10.89474 1.000000 68.00000 55.00
4   4.00000  4.00000 1.000000 43.00000 78.00
5   5.00000  5.00000 3.000000 45.00000 50.25
6  18.10526  6.00000 3.000000 75.00000 23.00
7  17.00000  7.00000 5.000000 68.00000 45.00
8  18.00000  8.00000 6.000000 43.00000 55.00
9  19.00000  9.00000 4.736842 75.00000 78.00
10 20.00000 10.00000 8.000000 75.00000 50.25
11 21.00000 11.00000 3.000000 43.00000 23.00
12 22.00000 12.00000 1.000000 68.00000 45.00
13 23.00000 13.00000 8.000000 45.00000 55.00
14 24.00000 14.00000 5.000000 36.00000 78.00
15 25.00000 15.00000 5.000000 36.00000 50.25
16 26.00000 16.00000 5.000000 75.00000 23.00
17 27.00000 17.00000 5.000000 75.00000 45.00
18 28.00000 18.00000 6.000000 43.00000 55.00
19 29.00000 19.00000 8.000000 55.78947 78.00
20 30.00000 20.00000 6.000000 75.00000 50.25
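To see what na.aggregate is doing here, the same column-mean fill can be sketched in base R. The helper name fill_with_mean and the data frame name df_fixed are hypothetical, and only the deterministic columns x1, x2 and x5 are rebuilt so that the result does not depend on sample().

```r
# Rebuild the columns of df that do not depend on sample().
x1 <- c(1:5, NA, 17:30)
x2 <- c(1:2, NA, 4:20)
x5 <- rep(c(23, 45, 55, 78, NA), times = 4)
df_fixed <- data.frame(x1, x2, x5)

# Hypothetical helper: replace each NA with the mean of the
# non-missing values in the same column.
fill_with_mean <- function(column) {
  column[is.na(column)] <- mean(column, na.rm = TRUE)
  column
}

# Apply the helper to every column while keeping the data frame structure.
df_filled <- df_fixed
df_filled[] <- lapply(df_fixed, fill_with_mean)

# Show the rows that originally contained an NA in x2, x5 and x1.
df_filled[c(3, 5, 6), ]
```

Rows 3, 5 and 6 should show the filled values 10.89474 for x2, 50.25 for x5 and 18.10526 for x1, matching the na.aggregate output above.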
Let’s have a look at another example −
Example
var1 <- sample(c(1, 2, NA), 20, replace = TRUE)
var2 <- sample(c(2, NA), 20, replace = TRUE)
var3 <- c(rnorm(10), rep(NA, 10))
var_data <- data.frame(var1, var2, var3)
var_data
Output
   var1 var2        var3
1     1   NA  0.15883062
2    NA    2  0.65976414
3    NA    2  2.22051966
4    NA   NA -1.18394507
5     1   NA -0.07395583
6    NA    2 -0.41635467
7    NA   NA -0.19148234
8    NA   NA  0.06954478
9     1    2  1.15534832
10    2    2  0.59495735
11    2   NA          NA
12    1    2          NA
13   NA    2          NA
14   NA   NA          NA
15    1   NA          NA
16    1   NA          NA
17    1    2          NA
18   NA    2          NA
19    1    2          NA
20    2   NA          NA
Example
na.aggregate(var_data)
Output
       var1 var2        var3
1  1.000000    2  0.15883062
2  1.272727    2  0.65976414
3  1.272727    2  2.22051966
4  1.272727    2 -1.18394507
5  1.000000    2 -0.07395583
6  1.272727    2 -0.41635467
7  1.272727    2 -0.19148234
8  1.272727    2  0.06954478
9  1.000000    2  1.15534832
10 2.000000    2  0.59495735
11 2.000000    2  0.29932269
12 1.000000    2  0.29932269
13 1.272727    2  0.29932269
14 1.272727    2  0.29932269
15 1.000000    2  0.29932269
16 1.000000    2  0.29932269
17 1.000000    2  0.29932269
18 1.272727    2  0.29932269
19 1.000000    2  0.29932269
20 2.000000    2  0.29932269
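The mean is only one possible fill value. na.aggregate takes the aggregation function through its FUN argument, so a median fill can be sketched as below. The data frame scores and its values are made up for illustration, with one large value in column a to show why the median may be preferable for skewed data.

```r
library(zoo)

# Illustrative data (not from the original example): column a has an outlier.
scores <- data.frame(a = c(1, 2, NA, 100),
                     b = c(NA, 10, 20, 30))

# Default behaviour: each NA is replaced by the column mean.
mean_filled <- na.aggregate(scores)

# Pass FUN = median to use the column median instead.
median_filled <- na.aggregate(scores, FUN = median)

mean_filled
median_filled
```

In column a the mean of the observed values is pulled up by the outlier, while the median stays at 2, so the two fills differ noticeably.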