How to replace missing values in a column with corresponding values in other column of an R data frame?


Often, we get missing data for analysis and we need to replace those missing values with something. Sometimes, we might want to replace them with the corresponding values in other column especially in situations when both the columns depict similar characteristics. This type of replacement can be easily done with the help of mutate function of dplyr package as shown in the below examples.

Example1

Consider the below data frame:

Live Demo

> set.seed(214)
> x1<-rpois(20,5)
> x2<-sample(c(NA,24,38,75),20,replace=TRUE)
> df1<-data.frame(x1,x2)
> df1

Output

 x1 x2
1 4 75
2 8 24
3 5 38
4 4 38
5 7 NA
6 6 24
7 10 75
8 4 75
9 6 38
10 4 75
11 6 NA
12 6 75
13 7 38
14 3 24
15 6 NA
16 4 NA
17 1 NA
18 4 NA
19 10 NA
20 4 24

Loading dplyr package then replacing NAs in x2 with the corresponding values in x1 and creating a new column to store them:

Example

> library(dplyr)
> df1 %>% mutate(x3=coalesce(x2,x1))

Output

x1 x2 x3
1 4 75 75
2 8 24 24
3 5 38 38
4 4 38 38
5 7 NA 7
6 6 24 24
7 10 75 75
8 4 75 75
9 6 38 38
10 4 75 75
11 6 NA 6
12 6 75 75
13 7 38 38
14 3 24 24
15 6 NA 6
16 4 NA 4
17 1 NA 1
18 4 NA 4
19 10 NA 10
20 4 24 24

Example2

Live Demo

> y1<-LETTERS[1:20]
> y2<-sample(c(NA,"E","G","J"),20,replace=TRUE)
> df2<-data.frame(y1,y2)
> df2

Output

y1 y2
1 A E
2 B J
3 C E
4 D G
5 E E
6 F E
7 G J
8 H
9 I J
10 J E
11 K G
12 L J
13 M
14 N J
15 O G
16 P E
17 Q
18 R J
19 S J
20 T E

Replacing NAs in y2 with the corresponding values in y1 and creating a new column to store them:

Example

> df2 %>% mutate(y3=coalesce(y2,y1))

Output

y1 y2 y3
1 A E E
2 B J J
3 C E E
4 D G G
5 E E E
6 F E E
7 G J J
8 H H
9 I J J
10 J E E
11 K G G
12 L J J
13 M M
14 N J J
15 O G G
16 P E E
17 Q Q
18 R J J
19 S J J
20 T E E

Example3

Live Demo

> z1<-rnorm(20)
> z2<-sample(c(NA,rnorm(5,2,1)),20,replace=TRUE)
> df3<-data.frame(z1,z2)
> df3

Output

z1 z2
1 -0.17901883 2.472282
2 -1.08901004 3.952225
3 0.42542338 NA
4 -0.60317450 NA
5 -0.03915205 1.437245
6 -0.16003909 3.952225
7 -0.46318505 2.472282
8 -1.42080605 1.372504
9 0.15067345 NA
10 0.19228563 2.472282
11 0.51799822 2.472282
12 0.50620920 2.472282
13 -0.21986645 1.437245
14 -0.17156659 1.372504
15 -1.28965823 2.472282
16 0.94734362 2.520788
17 1.78005478 NA
18 0.62910182 1.372504
19 -0.37591062 2.472282
20 -0.80137780 2.520788

Example

> df3 %>% mutate(z3=coalesce(z2,z1))

Output

z1 z2 z3
1 -0.17901883 2.472282 2.4722816
2 -1.08901004 3.952225 3.9522252
3 0.42542338 NA 0.4254234
4 -0.60317450 NA -0.6031745
5 -0.03915205 1.437245 1.4372447
6 -0.16003909 3.952225 3.9522252
7 -0.46318505 2.472282 2.4722816
8 -1.42080605 1.372504 1.3725039
9 0.15067345 NA 0.1506735
10 0.19228563 2.472282 2.4722816
11 0.51799822 2.472282 2.4722816
12 0.50620920 2.472282 2.4722816
13 -0.21986645 1.437245 1.4372447
14 -0.17156659 1.372504 1.3725039
15 -1.28965823 2.472282 2.4722816
16 0.94734362 2.520788 2.5207880
17 1.78005478 NA 1.7800548
18 0.62910182 1.372504 1.3725039
19 -0.37591062 2.472282 2.4722816
20 -0.80137780 2.520788 2.5207880

Updated on: 19-Nov-2020

339 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements