How to fill the missing values of an R data frame from the mean of columns?
Dealing with missing values is one of the first steps in data analysis, and also one of the most difficult: if we do not fill the missing values with an appropriate method, the results of the whole analysis can become meaningless. Therefore, we must be very careful when handling them. The mean is often used to fill missing values, mostly for learning purposes, but other values such as the median may suit the data better depending on its characteristics. To fill the missing values with the mean of each column, we can use the na.aggregate function of the zoo package.
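To make the idea concrete before turning to zoo, here is a minimal base R sketch of column-mean filling. The data frame `toy` and its columns `a` and `b` are hypothetical and not part of the article's example; the loop only illustrates the technique and is not how na.aggregate is implemented.

```r
# Hypothetical toy data frame (an assumption for illustration only)
toy <- data.frame(a = c(1, NA, 3), b = c(10, 20, NA))

# Replace each NA with the mean of the non-missing values in its column
toy_filled <- toy
for (col in names(toy_filled)) {
  missing <- is.na(toy_filled[[col]])
  toy_filled[[col]][missing] <- mean(toy_filled[[col]], na.rm = TRUE)
}

toy_filled
```

The key detail is `na.rm = TRUE`: without it, `mean()` returns NA for any column that contains a missing value.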
Example
Consider the below data frame −
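# x1, x2 and x5 are fixed; x3 and x4 are drawn at random, so their values differ on every run
x1 <- c(1:5, NA, 17:30)
x2 <- c(1:2, NA, 4:20)
x3 <- sample(c(1, 5, 8, NA, 6, 3), 20, replace = TRUE)
x4 <- sample(c(45, 75, 68, NA, 36, 43), 20, replace = TRUE)
x5 <- rep(c(23, 45, 55, 78, NA), times = 4)
df <- data.frame(x1, x2, x3, x4, x5)
df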
Output
   x1 x2 x3 x4 x5
1   1  1  6 36 23
2   2  2  5 36 45
3   3 NA  1 68 55
4   4  4  1 43 78
5   5  5  3 45 NA
6  NA  6  3 75 23
7  17  7  5 68 45
8  18  8  6 43 55
9  19  9 NA 75 78
10 20 10  8 75 NA
11 21 11  3 43 23
12 22 12  1 68 45
13 23 13  8 45 55
14 24 14  5 36 78
15 25 15  5 36 NA
16 26 16  5 75 23
17 27 17  5 75 45
18 28 18  6 43 55
19 29 19  8 NA 78
20 30 20  6 75 NA
Example
library(zoo)
na.aggregate(df)
Output
         x1       x2       x3       x4    x5
1   1.00000  1.00000 6.000000 36.00000 23.00
2   2.00000  2.00000 5.000000 36.00000 45.00
3   3.00000 10.89474 1.000000 68.00000 55.00
4   4.00000  4.00000 1.000000 43.00000 78.00
5   5.00000  5.00000 3.000000 45.00000 50.25
6  18.10526  6.00000 3.000000 75.00000 23.00
7  17.00000  7.00000 5.000000 68.00000 45.00
8  18.00000  8.00000 6.000000 43.00000 55.00
9  19.00000  9.00000 4.736842 75.00000 78.00
10 20.00000 10.00000 8.000000 75.00000 50.25
11 21.00000 11.00000 3.000000 43.00000 23.00
12 22.00000 12.00000 1.000000 68.00000 45.00
13 23.00000 13.00000 8.000000 45.00000 55.00
14 24.00000 14.00000 5.000000 36.00000 78.00
15 25.00000 15.00000 5.000000 36.00000 50.25
16 26.00000 16.00000 5.000000 75.00000 23.00
17 27.00000 17.00000 5.000000 75.00000 45.00
18 28.00000 18.00000 6.000000 43.00000 55.00
19 29.00000 19.00000 8.000000 55.78947 78.00
20 30.00000 20.00000 6.000000 75.00000 50.25
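As a quick sanity check, the filled-in values for the non-random columns can be reproduced with base R alone. The sketch below rebuilds only x1, x2 and x5 (x3 and x4 come from sample() and cannot be reproduced without a seed); the names `fixed` and `col_means` are introduced here for illustration.

```r
# Rebuild the deterministic columns from the example above
x1 <- c(1:5, NA, 17:30)
x2 <- c(1:2, NA, 4:20)
x5 <- rep(c(23, 45, 55, 78, NA), times = 4)
fixed <- data.frame(x1, x2, x5)

# Column means computed over the non-missing values only
col_means <- colMeans(fixed, na.rm = TRUE)
round(col_means, 5)
```

The rounded means match the values that replaced the NA entries in the x1, x2 and x5 columns of the output above (18.10526, 10.89474 and 50.25).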
Let’s have a look at another example −
Example
var1 <- sample(c(1, 2, NA), 20, replace = TRUE)
var2 <- sample(c(2, NA), 20, replace = TRUE)
var3 <- c(rnorm(10), rep(NA, 10))
var_data <- data.frame(var1, var2, var3)
var_data
Output
   var1 var2        var3
1     1   NA  0.15883062
2    NA    2  0.65976414
3    NA    2  2.22051966
4    NA   NA -1.18394507
5     1   NA -0.07395583
6    NA    2 -0.41635467
7    NA   NA -0.19148234
8    NA   NA  0.06954478
9     1    2  1.15534832
10    2    2  0.59495735
11    2   NA          NA
12    1    2          NA
13   NA    2          NA
14   NA   NA          NA
15    1   NA          NA
16    1   NA          NA
17    1    2          NA
18   NA    2          NA
19    1    2          NA
20    2   NA          NA
Example
na.aggregate(var_data)
Output
       var1 var2        var3
1  1.000000    2  0.15883062
2  1.272727    2  0.65976414
3  1.272727    2  2.22051966
4  1.272727    2 -1.18394507
5  1.000000    2 -0.07395583
6  1.272727    2 -0.41635467
7  1.272727    2 -0.19148234
8  1.272727    2  0.06954478
9  1.000000    2  1.15534832
10 2.000000    2  0.59495735
11 2.000000    2  0.29932269
12 1.000000    2  0.29932269
13 1.272727    2  0.29932269
14 1.272727    2  0.29932269
15 1.000000    2  0.29932269
16 1.000000    2  0.29932269
17 1.000000    2  0.29932269
18 1.272727    2  0.29932269
19 1.000000    2  0.29932269
20 2.000000    2  0.29932269
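Since the mean is not always the right fill value, the same call can be pointed at a different summary. The sketch below assumes the FUN argument of na.aggregate (which defaults to mean) and uses a hypothetical data frame `skewed`, whose `income` column has one extreme value that would pull the mean far from the typical observation.

```r
library(zoo)

# Hypothetical data (an assumption for illustration): income has an outlier
skewed <- data.frame(
  income = c(20, 22, 25, NA, 400),
  age    = c(30, NA, 40, 50, 60)
)

# Fill each NA with its column median instead of the column mean
filled_median <- na.aggregate(skewed, FUN = median)
filled_median
```

Here the missing income is filled with the median of the observed values rather than with their mean, which the single outlier would inflate.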