How to replace NA values in columns of an R data frame with the mean of that column?
A common first approach to imputing missing values is to replace them with the mean of the observed values. That is, if a column has some missing values, each NA is replaced with the mean of the remaining values in that column. In R, we can do this by selecting the missing positions of the column with is.na() and assigning to them the mean of that column, computed with the na.rm = TRUE argument so that the NA values are excluded from the calculation.
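As a minimal sketch of the idea on a single vector (the vector `v` and its values are made up for illustration and are not part of the data used in the rest of this article):

```r
# Hypothetical vector, not taken from the article's data frame
v <- c(4, NA, 6, NA, 8)

mean(v)                 # NA, because the missing values propagate
mean(v, na.rm = TRUE)   # 6, the mean of the observed values 4, 6 and 8

# Logical index marking the missing positions
missing <- is.na(v)

# Assign the mean of the observed values to the missing positions only
v[missing] <- mean(v, na.rm = TRUE)
v                       # 4 6 6 6 8
```

Without na.rm = TRUE the mean itself would be NA, so the replacement would leave the missing values unchanged.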
Consider the following data frame −
Example
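# Fix the random seed so the sampled values are reproducible
set.seed(121)
# Three columns of 20 values each, sampled with replacement and including NA
x <- sample(c(0:2, NA), 20, replace = TRUE)
y <- sample(c(0:10, NA), 20, replace = TRUE)
z <- sample(c(rnorm(2, 1, 0.40), NA), 20, replace = TRUE)
df <- data.frame(x, y, z)
df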
Output
    x  y        z
1  NA  1 1.525471
2  NA 10 1.525471
3  NA  0       NA
4   2  1       NA
5  NA  3       NA
6   0  4 1.525471
7   2  9       NA
8   0  5       NA
9   2  7       NA
10  2  6 1.296308
11  2  1 1.296308
12  0 NA 1.525471
13 NA  8 1.296308
14  0  5       NA
15  1  7 1.296308
16 NA  1 1.525471
17  0  1       NA
18 NA  5 1.525471
19  0  8 1.296308
20  1  1 1.296308
Replacing the NA values in column x with the mean of the remaining values −
Example
df$x[is.na(df$x)] <- mean(df$x, na.rm = TRUE)
df
Output
           x  y        z
1  0.9230769  1 1.525471
2  0.9230769 10 1.525471
3  0.9230769  0       NA
4  2.0000000  1       NA
5  0.9230769  3       NA
6  0.0000000  4 1.525471
7  2.0000000  9       NA
8  0.0000000  5       NA
9  2.0000000  7       NA
10 2.0000000  6 1.296308
11 2.0000000  1 1.296308
12 0.0000000 NA 1.525471
13 0.9230769  8 1.296308
14 0.0000000  5       NA
15 1.0000000  7 1.296308
16 0.9230769  1 1.525471
17 0.0000000  1       NA
18 0.9230769  5 1.525471
19 0.0000000  8 1.296308
20 1.0000000  1 1.296308
Replacing the NA values in column y with the mean of the remaining values −
Example
df$y[is.na(df$y)] <- mean(df$y, na.rm = TRUE)
df
Output
           x         y        z
1  0.9230769  1.000000 1.525471
2  0.9230769 10.000000 1.525471
3  0.9230769  0.000000       NA
4  2.0000000  1.000000       NA
5  0.9230769  3.000000       NA
6  0.0000000  4.000000 1.525471
7  2.0000000  9.000000       NA
8  0.0000000  5.000000       NA
9  2.0000000  7.000000       NA
10 2.0000000  6.000000 1.296308
11 2.0000000  1.000000 1.296308
12 0.0000000  4.368421 1.525471
13 0.9230769  8.000000 1.296308
14 0.0000000  5.000000       NA
15 1.0000000  7.000000 1.296308
16 0.9230769  1.000000 1.525471
17 0.0000000  1.000000       NA
18 0.9230769  5.000000 1.525471
19 0.0000000  8.000000 1.296308
20 1.0000000  1.000000 1.296308
Replacing the NA values in column z with the mean of the remaining values −
Example
df$z[is.na(df$z)] <- mean(df$z, na.rm = TRUE)
df
Output
           x         y        z
1  0.9230769  1.000000 1.525471
2  0.9230769 10.000000 1.525471
3  0.9230769  0.000000 1.410890
4  2.0000000  1.000000 1.410890
5  0.9230769  3.000000 1.410890
6  0.0000000  4.000000 1.525471
7  2.0000000  9.000000 1.410890
8  0.0000000  5.000000 1.410890
9  2.0000000  7.000000 1.410890
10 2.0000000  6.000000 1.296308
11 2.0000000  1.000000 1.296308
12 0.0000000  4.368421 1.525471
13 0.9230769  8.000000 1.296308
14 0.0000000  5.000000 1.410890
15 1.0000000  7.000000 1.296308
16 0.9230769  1.000000 1.525471
17 0.0000000  1.000000 1.410890
18 0.9230769  5.000000 1.525471
19 0.0000000  8.000000 1.296308
20 1.0000000  1.000000 1.296308
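The column-by-column replacement shown above could also be applied to every numeric column in one step. The following is only a sketch under stated assumptions: the helper name `impute_mean` and the small data frame `df2` (with columns `a`, `b` and `g`) are hypothetical and introduced purely for illustration.

```r
# Hypothetical helper: replace the NAs of one vector with the mean of its observed values
impute_mean <- function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
}

# Small illustrative data frame (values are made up for this sketch)
df2 <- data.frame(a = c(1, NA, 3), b = c(NA, 10, 20), g = c("u", NA, "w"))

# Identify the numeric columns, then apply the helper to those columns only
num_cols <- vapply(df2, is.numeric, logical(1))
df2[num_cols] <- lapply(df2[num_cols], impute_mean)
df2
```

Here the NA in column a becomes 2 and the NA in column b becomes 15, while the non-numeric column g keeps its NA. Note that a column consisting entirely of NA values would be filled with NaN, because the mean of an empty set of values is NaN in R.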