How to replace NA values in columns of an R data frame with the mean of that column?


Mean imputation is usually the first technique taught for handling missing values: each missing value in a column is replaced with the mean of that column's non-missing values. In R, we can do this by selecting the missing positions with is.na() and assigning them the column mean computed with mean(). The na.rm = TRUE argument tells mean() to ignore the NA values; without it, the mean of a column that contains NA is itself NA.
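
As a minimal sketch of why na.rm = TRUE matters, the snippet below uses a small toy vector (v, m_default and m_clean are illustrative names, not part of the article's data) and applies the same is.na() replacement used in the examples that follow:

```r
# Toy vector used only for illustration (not from the article's data frame)
v <- c(1, NA, 3)

# Without na.rm = TRUE the single NA makes the whole mean NA
m_default <- mean(v)

# With na.rm = TRUE the NA is dropped before averaging: (1 + 3) / 2 = 2
m_clean <- mean(v, na.rm = TRUE)

# Replace only the missing positions with the mean of the observed values
v[is.na(v)] <- m_clean
v
```

After the replacement, v holds 1, 2 and 3.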

Consider the below data frame −

Example

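# Fix the random seed so the sampled values are reproducible
set.seed(121)
# Sample each column with replacement from a pool that includes NA
x<-sample(c(0:2,NA),20,replace=TRUE)
y<-sample(c(0:10,NA),20,replace=TRUE)
z<-sample(c(rnorm(2,1,0.40),NA),20,replace=TRUE)
# Combine the three columns into a data frame and print it
df<-data.frame(x,y,z)
df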

Output

    x  y        z
1  NA  1 1.525471
2  NA 10 1.525471
3  NA  0       NA
4   2  1       NA
5  NA  3       NA
6   0  4 1.525471
7   2  9       NA
8   0  5       NA
9   2  7       NA
10  2  6 1.296308
11  2  1 1.296308
12  0 NA 1.525471
13 NA  8 1.296308
14  0  5       NA
15  1  7 1.296308
16 NA  1 1.525471
17  0  1       NA
18 NA  5 1.525471
19  0  8 1.296308
20  1  1 1.296308

Replacing the NA values in column x with the mean of the remaining values −

Example

df$x[is.na(df$x)]<-mean(df$x,na.rm=TRUE)
df

Output

           x  y        z
1  0.9230769  1 1.525471
2  0.9230769 10 1.525471
3  0.9230769  0       NA
4  2.0000000  1       NA
5  0.9230769  3       NA
6  0.0000000  4 1.525471
7  2.0000000  9       NA
8  0.0000000  5       NA
9  2.0000000  7       NA
10 2.0000000  6 1.296308
11 2.0000000  1 1.296308
12 0.0000000 NA 1.525471
13 0.9230769  8 1.296308
14 0.0000000  5       NA
15 1.0000000  7 1.296308
16 0.9230769  1 1.525471
17 0.0000000  1       NA
18 0.9230769  5 1.525471
19 0.0000000  8 1.296308
20 1.0000000  1 1.296308

Replacing the NA values in column y with the mean of the remaining values −

Example

df$y[is.na(df$y)]<-mean(df$y,na.rm=TRUE)
df

Output

           x         y        z
1  0.9230769  1.000000 1.525471
2  0.9230769 10.000000 1.525471
3  0.9230769  0.000000       NA
4  2.0000000  1.000000       NA
5  0.9230769  3.000000       NA
6  0.0000000  4.000000 1.525471
7  2.0000000  9.000000       NA
8  0.0000000  5.000000       NA
9  2.0000000  7.000000       NA
10 2.0000000  6.000000 1.296308
11 2.0000000  1.000000 1.296308
12 0.0000000  4.368421 1.525471
13 0.9230769  8.000000 1.296308
14 0.0000000  5.000000       NA
15 1.0000000  7.000000 1.296308
16 0.9230769  1.000000 1.525471
17 0.0000000  1.000000       NA
18 0.9230769  5.000000 1.525471
19 0.0000000  8.000000 1.296308
20 1.0000000  1.000000 1.296308

Replacing the NA values in column z with the mean of the remaining values −

Example

df$z[is.na(df$z)]<-mean(df$z,na.rm=TRUE)
df

Output

           x         y        z
1  0.9230769  1.000000 1.525471
2  0.9230769 10.000000 1.525471
3  0.9230769  0.000000 1.410890
4  2.0000000  1.000000 1.410890
5  0.9230769  3.000000 1.410890
6  0.0000000  4.000000 1.525471
7  2.0000000  9.000000 1.410890
8  0.0000000  5.000000 1.410890
9  2.0000000  7.000000 1.410890
10 2.0000000  6.000000 1.296308
11 2.0000000  1.000000 1.296308
12 0.0000000  4.368421 1.525471
13 0.9230769  8.000000 1.296308
14 0.0000000  5.000000 1.410890
15 1.0000000  7.000000 1.296308
16 0.9230769  1.000000 1.525471
17 0.0000000  1.000000 1.410890
18 0.9230769  5.000000 1.525471
19 0.0000000  8.000000 1.296308
20 1.0000000  1.000000 1.296308
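
The per-column replacement can also be applied to every numeric column in one step. The following is only a sketch under stated assumptions: impute_mean is a hypothetical helper (not a base R function), and df2 is a small toy data frame used so the result is easy to check by hand.

```r
# Hypothetical helper: replace the NA values in one vector with the
# mean of its non-missing values
impute_mean <- function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
}

# Toy data frame for illustration only
df2 <- data.frame(x = c(1, NA, 3), y = c(NA, 4, 6))

# Restrict the operation to numeric columns, then impute each one
num_cols <- vapply(df2, is.numeric, logical(1))
df2[num_cols] <- lapply(df2[num_cols], impute_mean)
df2
```

Here the NA in x becomes 2 (the mean of 1 and 3) and the NA in y becomes 5 (the mean of 4 and 6). Restricting the operation to numeric columns avoids calling mean() on character or factor columns, where it is not meaningful.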

Updated on: 18-Oct-2020
