How to replace missing values with median in an R data frame column?


To replace missing values with the median, we can use the same approach that is used to replace missing values with the mean. For example, if we have a data frame df that contains columns x and y, both of which contain some missing values, then the missing values in x can be replaced with the median using df$x[is.na(df$x)]<-median(df$x,na.rm=TRUE), and the same can be done for y using df$y[is.na(df$y)]<-median(df$y,na.rm=TRUE). The na.rm=TRUE argument is required because median() returns NA when the vector contains missing values.
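The pattern described above can be sketched on a small data frame. The data frame df and its values here are made up purely for illustration; only is.na() and median() from base R are used:

```r
# Hypothetical data frame with missing values in both columns
df <- data.frame(x = c(1, NA, 3, 4, NA),
                 y = c(10, 20, NA, 40, 50))

# Without na.rm = TRUE the median of a vector containing NA is NA
median(df$x)
median(df$x, na.rm = TRUE)

# Replace only the NA positions with the median of the observed values
df$x[is.na(df$x)] <- median(df$x, na.rm = TRUE)
df$y[is.na(df$y)] <- median(df$y, na.rm = TRUE)
df
```

The logical index is.na(df$x) selects only the missing positions, so the observed values are left unchanged.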

Example


Consider the below data frame −

set.seed(1112)
x1<-LETTERS[1:20]
x2<-sample(c(NA,rpois(19,8)),20,replace=TRUE)
df1<-data.frame(x1,x2)
df1

Output

 x1 x2
1 A 10
2 B 11
3 C 8
4 D 6
5 E 6
6 F NA
7 G 10
8 H 8
9 I 8
10 J 7
11 K NA
12 L 12
13 M 7
14 N 6
15 O 10
16 P 7
17 Q 7
18 R 8
19 S 11
20 T 4
median(df1$x2,na.rm=TRUE)
[1] 8

Replacing missing values in x2 with median of the remaining values −

df1$x2[is.na(df1$x2)]<-median(df1$x2,na.rm=TRUE)
df1

Output

x1 x2
1 A 10
2 B 11
3 C 8
4 D 6
5 E 6
6 F 8
7 G 10
8 H 8
9 I 8
10 J 7
11 K 8
12 L 12
13 M 7
14 N 6
15 O 10
16 P 7
17 Q 7
18 R 8
19 S 11
20 T 4

Let’s have a look at another example −

Example

ID<-1:20
Ratings<-sample(c(NA,1,2,3,4,5),20,replace=TRUE)
df2<-data.frame(ID,Ratings)
df2

Output

ID Ratings
1 1 3
2 2 1
3 3 1
4 4 4
5 5 1
6 6 4
7 7 2
8 8 3
9 9 2
10 10 2
11 11 3
12 12 5
13 13 5
14 14 1
15 15 4
16 16 1
17 17 4
18 18 NA
19 19 1
20 20 NA
median(df2$Ratings,na.rm=TRUE)
[1] 2.5

Replacing missing values in Ratings with median of the remaining values −

Example

df2$Ratings[is.na(df2$Ratings)]<-median(df2$Ratings,na.rm=TRUE)
df2

Output

ID Ratings
1 1 3.0
2 2 1.0
3 3 1.0
4 4 4.0
5 5 1.0
6 6 4.0
7 7 2.0
8 8 3.0
9 9 2.0
10 10 2.0
11 11 3.0
12 12 5.0
13 13 5.0
14 14 1.0
15 15 4.0
16 16 1.0
17 17 4.0
18 18 2.5
19 19 1.0
20 20 2.5
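When several columns need the same treatment, the replacement can be wrapped in a small function and applied to every numeric column. This is only a sketch: the function name impute_median and the data frame df3 are hypothetical and chosen for illustration, not part of base R:

```r
# Hypothetical helper: replace NA values in a vector with its median
impute_median <- function(v) {
  v[is.na(v)] <- median(v, na.rm = TRUE)
  v
}

# Made-up data frame with one character column and two numeric columns
df3 <- data.frame(id = letters[1:5],
                  a = c(2, NA, 6, 8, NA),
                  b = c(NA, 1, 1, 5, 9))

# Apply the helper to the numeric columns only
num_cols <- sapply(df3, is.numeric)
df3[num_cols] <- lapply(df3[num_cols], impute_median)
df3
```

Restricting the call to numeric columns avoids calling median() on character or factor columns, for which a median is not defined.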
raja
Published on 17-Oct-2020 17:24:14