How to fill NA values with the value from the row above in an R data frame?
Sometimes missing values should be replaced with the value in the row above. This often happens when data is recorded manually: the person entering it writes each value only once, in the first row of a group, because they know the rows below share it. Anyone else reusing the data cannot see this convention, so we have to check with the person responsible. If they confirm that the first value in each group should fill every NA in that column, we can use the match function. match(x, x) returns the position of the first occurrence of each element of x, so indexing a column with it copies the first value of each group to every row of that group. This only works when the first row of each group actually holds the value.
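To see why this works, it helps to look at what match returns when a vector is matched against itself. A minimal sketch with a made-up grouping vector g and values vector v (both illustrative only, not from the data frames below):

```r
# Illustrative grouping vector: three rows of group 1, two rows of group 2
g <- c(1, 1, 1, 2, 2)

# match(g, g) gives, for each element, the index of its first occurrence in g
first_pos <- match(g, g)
print(first_pos)
#> [1] 1 1 1 4 4

# Illustrative values: only the first row of each group is filled in
v <- c(10, NA, NA, 20, NA)

# Indexing v by those positions copies each group's first value to every row
filled <- v[first_pos]
print(filled)
#> [1] 10 10 10 20 20
```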
Example
Consider the following data frame −
x1 <- c(rep(1,3), rep(2,5), rep(3,8), rep(4,4))
x2 <- c(12,NA,NA,15,NA,NA,NA,NA,14,NA,NA,NA,NA,NA,NA,NA,16,NA,NA,NA)
df1 <- data.frame(x1, x2)
df1
Output
   x1 x2
1   1 12
2   1 NA
3   1 NA
4   2 15
5   2 NA
6   2 NA
7   2 NA
8   2 NA
9   3 14
10  3 NA
11  3 NA
12  3 NA
13  3 NA
14  3 NA
15  3 NA
16  3 NA
17  4 16
18  4 NA
19  4 NA
20  4 NA
Replacing the NAs in column x2 with the value from the first row of each x1 group −
# match() gives the first row of each x1 value; that row's x2 is used for the whole group
df1$x2 <- df1$x2[match(df1$x1, df1$x1)]
df1
Output
   x1 x2
1   1 12
2   1 12
3   1 12
4   2 15
5   2 15
6   2 15
7   2 15
8   2 15
9   3 14
10  3 14
11  3 14
12  3 14
13  3 14
14  3 14
15  3 14
16  3 14
17  4 16
18  4 16
19  4 16
20  4 16
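Note that df1$x2[match(df1$x1, df1$x1)] rewrites every value in x2, not only the NAs: if a later row in a group already held a different value, it would be overwritten by the group's first value. A minimal sketch that fills only the missing entries, assuming the same df1 layout; the extra value 13 in row 3 is made up purely to show that existing values survive:

```r
x1 <- c(rep(1, 3), rep(2, 5), rep(3, 8), rep(4, 4))
# Same as df1$x2 above, except row 3 holds an illustrative value of 13
x2 <- c(12, NA, 13, 15, NA, NA, NA, NA, 14, NA, NA, NA, NA, NA, NA, NA, 16, NA, NA, NA)
df1 <- data.frame(x1, x2)

# Value from the first row of each x1 group
group_first <- df1$x2[match(df1$x1, df1$x1)]

# Keep existing values; use the group's first value only where x2 is NA
df1$x2 <- ifelse(is.na(df1$x2), group_first, df1$x2)
print(df1$x2[1:4])
#> [1] 12 12 13 15
```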
Let’s have a look at another example −
y1 <- c(rep("A",4), rep("B",4), rep("C",4), rep("D",4), rep("E",4))
y2 <- 1:20
y3 <- c(123,NA,NA,NA,140,NA,NA,NA,142,NA,NA,NA,137,NA,NA,NA,16,NA,NA,NA)
df2 <- data.frame(y1, y2, y3)
df2
Output
   y1 y2  y3
1   A  1 123
2   A  2  NA
3   A  3  NA
4   A  4  NA
5   B  5 140
6   B  6  NA
7   B  7  NA
8   B  8  NA
9   C  9 142
10  C 10  NA
11  C 11  NA
12  C 12  NA
13  D 13 137
14  D 14  NA
15  D 15  NA
16  D 16  NA
17  E 17  16
18  E 18  NA
19  E 19  NA
20  E 20  NA
Replacing the NAs in column y3 with the value from the first row of each y1 group −
# match() gives the first row of each y1 value; that row's y3 is used for the whole group
df2$y3 <- df2$y3[match(df2$y1, df2$y1)]
df2
Output
   y1 y2  y3
1   A  1 123
2   A  2 123
3   A  3 123
4   A  4 123
5   B  5 140
6   B  6 140
7   B  7 140
8   B  8 140
9   C  9 142
10  C 10 142
11  C 11 142
12  C 12 142
13  D 13 137
14  D 14 137
15  D 15 137
16  D 16 137
17  E 17  16
18  E 18  16
19  E 19  16
20  E 20  16
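The match approach depends on the first row of each group holding the value. If the rule is instead literally "copy the nearest non-missing value from the row above" (last observation carried forward), a small base R helper can do it. fill_down below is a hypothetical helper written for this sketch, not a built-in function; packages such as tidyr (fill()) and zoo (na.locf()) provide tested alternatives. The sample vectors are illustrative only.

```r
# Hypothetical helper: replace each NA with the closest non-NA value above it.
# Leading NAs (with nothing above them) stay NA.
fill_down <- function(x) {
  keep <- !is.na(x)
  # idx[i] = how many non-NA values appear up to row i, i.e. which one to copy
  idx <- cumsum(keep)
  # Rows before the first non-NA value have nothing to copy
  idx[idx == 0] <- NA
  x[keep][idx]
}

y3 <- c(NA, 123, NA, NA, 140, NA)
print(fill_down(y3))
#> [1]  NA 123 123 123 140 140

# Filling within groups: apply the helper separately to each y1 group with ave()
y1 <- c("A", "A", "A", "B", "B", "B")
y4 <- c(5, NA, NA, NA, 7, NA)
print(ave(y4, y1, FUN = fill_down))
#> [1]  5  5  5 NA  7  7
```

Unlike the match version, this never overwrites an existing value, and with ave() the fill does not cross from one group into the next.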