How to replace missing values in a column with corresponding values in other column of an R data frame?
Data sets often arrive with missing values that must be filled in before analysis. When two columns describe similar characteristics, one option is to replace each missing value in one column with the value from the same row of the other column. In R, this can be done by combining the mutate function of the dplyr package with its coalesce function, which returns the first non-missing value at each position, as the examples below show.
Example1
Consider the following data frame:
> set.seed(214)
> x1<-rpois(20,5)
> x2<-sample(c(NA,24,38,75),20,replace=TRUE)
> df1<-data.frame(x1,x2)
> df1
Output
   x1 x2
1   4 75
2   8 24
3   5 38
4   4 38
5   7 NA
6   6 24
7  10 75
8   4 75
9   6 38
10  4 75
11  6 NA
12  6 75
13  7 38
14  3 24
15  6 NA
16  4 NA
17  1 NA
18  4 NA
19 10 NA
20  4 24
Load the dplyr package, then replace the NAs in x2 with the corresponding values in x1 and store the result in a new column, x3:
Example
> library(dplyr)
> df1 %>% mutate(x3=coalesce(x2,x1))
Output
   x1 x2 x3
1   4 75 75
2   8 24 24
3   5 38 38
4   4 38 38
5   7 NA  7
6   6 24 24
7  10 75 75
8   4 75 75
9   6 38 38
10  4 75 75
11  6 NA  6
12  6 75 75
13  7 38 38
14  3 24 24
15  6 NA  6
16  4 NA  4
17  1 NA  1
18  4 NA  4
19 10 NA 10
20  4 24 24
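For readers who prefer not to load a package, the same row-wise replacement can be sketched in base R with ifelse and is.na. This is an alternative to the coalesce approach used in this article, not a description of how coalesce works internally, and the small data frame below is hypothetical rather than the article's df1.

```r
# Hypothetical four-row data frame (not the article's df1)
df <- data.frame(
  x1 = c(4, 8, 5, 7),
  x2 = c(75, NA, 38, NA)
)

# Where x2 is missing take x1, otherwise keep x2
df$x3 <- ifelse(is.na(df$x2), df$x1, df$x2)

print(df)
```

Here x3 holds 75, 8, 38 and 7: rows 2 and 4 fall back to x1 because x2 is NA there.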
Example2
> y1<-LETTERS[1:20]
> y2<-sample(c(NA,"E","G","J"),20,replace=TRUE)
> df2<-data.frame(y1,y2)
> df2
Output
   y1   y2
1   A    E
2   B    J
3   C    E
4   D    G
5   E    E
6   F    E
7   G    J
8   H <NA>
9   I    J
10  J    E
11  K    G
12  L    J
13  M <NA>
14  N    J
15  O    G
16  P    E
17  Q <NA>
18  R    J
19  S    J
20  T    E
Replace the NAs in y2 with the corresponding values in y1 and store the result in a new column, y3:
Example
> df2 %>% mutate(y3=coalesce(y2,y1))
Output
   y1   y2 y3
1   A    E  E
2   B    J  J
3   C    E  E
4   D    G  G
5   E    E  E
6   F    E  E
7   G    J  J
8   H <NA>  H
9   I    J  J
10  J    E  E
11  K    G  G
12  L    J  J
13  M <NA>  M
14  N    J  J
15  O    G  G
16  P    E  E
17  Q <NA>  Q
18  R    J  J
19  S    J  J
20  T    E  E
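The examples in this article store the filled values in a new column. If the goal is to overwrite the column that contains the NAs, one possible variation is to reuse that column's name on the left side of mutate. The sketch below assumes dplyr is installed and uses a small hypothetical data frame with made-up names (df, df_filled), not the article's df2.

```r
library(dplyr)

# Hypothetical data frame (not the article's df2); strings kept as character
df <- data.frame(
  y1 = c("A", "B", "C", "D"),
  y2 = c("E", NA, "J", NA),
  stringsAsFactors = FALSE
)

# Assigning to y2 replaces the existing column instead of adding a new one
df_filled <- df %>% mutate(y2 = coalesce(y2, y1))

print(df_filled)
```

Because mutate returns a modified copy, the original df is left unchanged unless the result is assigned back to it.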
Example3
> z1<-rnorm(20)
> z2<-sample(c(NA,rnorm(5,2,1)),20,replace=TRUE)
> df3<-data.frame(z1,z2)
> df3
Output
            z1       z2
1  -0.17901883 2.472282
2  -1.08901004 3.952225
3   0.42542338       NA
4  -0.60317450       NA
5  -0.03915205 1.437245
6  -0.16003909 3.952225
7  -0.46318505 2.472282
8  -1.42080605 1.372504
9   0.15067345       NA
10  0.19228563 2.472282
11  0.51799822 2.472282
12  0.50620920 2.472282
13 -0.21986645 1.437245
14 -0.17156659 1.372504
15 -1.28965823 2.472282
16  0.94734362 2.520788
17  1.78005478       NA
18  0.62910182 1.372504
19 -0.37591062 2.472282
20 -0.80137780 2.520788
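Replace the NAs in z2 with the corresponding values in z1 and store the result in a new column, z3:
Example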
> df3 %>% mutate(z3=coalesce(z2,z1))
Output
            z1       z2         z3
1  -0.17901883 2.472282  2.4722816
2  -1.08901004 3.952225  3.9522252
3   0.42542338       NA  0.4254234
4  -0.60317450       NA -0.6031745
5  -0.03915205 1.437245  1.4372447
6  -0.16003909 3.952225  3.9522252
7  -0.46318505 2.472282  2.4722816
8  -1.42080605 1.372504  1.3725039
9   0.15067345       NA  0.1506735
10  0.19228563 2.472282  2.4722816
11  0.51799822 2.472282  2.4722816
12  0.50620920 2.472282  2.4722816
13 -0.21986645 1.437245  1.4372447
14 -0.17156659 1.372504  1.3725039
15 -1.28965823 2.472282  2.4722816
16  0.94734362 2.520788  2.5207880
17  1.78005478       NA  1.7800548
18  0.62910182 1.372504  1.3725039
19 -0.37591062 2.472282  2.4722816
20 -0.80137780 2.520788  2.5207880
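coalesce accepts more than two vectors, so a chain of fallback columns can be expressed in a single call. The sketch below assumes dplyr is installed and uses a hypothetical data frame with a third column that the article's df3 does not have. It also shows that a row stays NA when every candidate column is missing.

```r
library(dplyr)

# Hypothetical data frame with two fallback columns (not the article's df3)
df <- data.frame(
  z1 = c(0.5, NA, -1.2, NA),
  z2 = c(NA, 2.0, 3.1, NA),
  z3 = c(9.9, 8.8, NA, NA)
)

# For each row take z2 if present, otherwise z1, otherwise z3
df_filled <- df %>% mutate(z4 = coalesce(z2, z1, z3))

print(df_filled)
```

In this sketch z4 is 0.5, 2.0, 3.1 and NA: row 1 falls back to z1, rows 2 and 3 keep z2, and row 4 has no non-missing value to use.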