How to fill NA values with the last observation in a column of an R data frame?


There are several ways to fill missing values during data analysis. One of them is to replace each NA with the previous non-missing value in the same column of the data frame, a technique known as last observation carried forward (LOCF). For example, if column x of data frame df contains NA values, each NA can be filled with the nearest non-missing value from the rows above it. This can be done with the na.locf function of the zoo package.
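
To see what carrying the last observation forward means without any package, here is a minimal base-R sketch. The helper name locf_sketch is hypothetical and is not part of zoo; it only illustrates the idea on a single vector.

```r
# Hypothetical helper (not from zoo): carry the last non-NA value forward
locf_sketch <- function(v) {
  # Index of the most recent non-NA element at each position (0 if none yet)
  idx <- cummax(seq_along(v) * !is.na(v))
  idx[idx == 0] <- NA   # leading NAs have nothing to copy from
  v[idx]
}

v <- c(NA, 1, NA, NA, 3, NA)
locf_sketch(v)
# [1] NA  1  1  1  3  3
```

A leading NA stays NA because there is no earlier observation to copy.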

Consider the following data frame:

Example

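# Fix the seed so the random samples are reproducible
set.seed(477)
# Each column is sampled with replacement from a small set that includes NA
x <- sample(c(0, 1, NA), 20, replace = TRUE)
y <- sample(c(0:2, NA), 20, replace = TRUE)
z <- sample(c(0:5, NA), 20, replace = TRUE)
a <- sample(c(7, 11, 13, NA), 20, replace = TRUE)
b <- sample(c(51, NA), 20, replace = TRUE)
c <- sample(c(rnorm(2, 1, 0.05), NA), 20, replace = TRUE)
df <- data.frame(x, y, z, a, b, c)
df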

Output

    x  y  z  a  b        c
1   1  1  3 13 51 1.011752
2  NA  1  1 NA NA 1.011752
3   0  0  2 11 NA 1.092852
4  NA  0  4  7 51 1.011752
5  NA  1  5 11 51       NA
6   1 NA  0  7 51       NA
7   0  1  5 11 51 1.092852
8   0  0  5  7 NA 1.011752
9   1  1  3 NA NA 1.092852
10  1 NA  2  7 51 1.011752
11  0  1  3 NA 51 1.011752
12 NA  1  4 11 51       NA
13  1 NA  3 NA NA 1.011752
14 NA  0  5 11 51 1.011752
15  0 NA  0 NA NA       NA
16 NA  0  3  7 NA 1.092852
17 NA NA NA NA 51       NA
18 NA  1  3 11 51 1.011752
19 NA NA  0 11 51       NA
20 NA NA  0 11 NA       NA

Load the zoo package and replace the NA values with the last observation in the same column:

Example

library(zoo)
na.locf(df)

Output

   x y z  a  b        c
1  1 1 3 13 51 1.011752
2  1 1 1 13 51 1.011752
3  0 0 2 11 51 1.092852
4  0 0 4  7 51 1.011752
5  0 1 5 11 51 1.011752
6  1 1 0  7 51 1.011752
7  0 1 5 11 51 1.092852
8  0 0 5  7 51 1.011752
9  1 1 3  7 51 1.092852
10 1 1 2  7 51 1.011752
11 0 1 3  7 51 1.011752
12 0 1 4 11 51 1.011752
13 1 1 3 11 51 1.011752
14 1 0 5 11 51 1.011752
15 0 0 0 11 51 1.011752
16 0 0 3  7 51 1.092852
17 0 0 3  7 51 1.092852
18 0 1 3 11 51 1.011752
19 0 1 0 11 51 1.011752
20 0 1 0 11 51 1.011752
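
When a column starts with NA there is no earlier value to carry forward. The sketch below, on a small hypothetical data frame d, shows how the na.rm and fromLast arguments of na.locf affect that case. The variable names are illustrative only.

```r
library(zoo)

# Hypothetical data frame: column p starts with NA, column q ends with NA
d <- data.frame(p = c(NA, 2, NA, 4), q = c(1, NA, 3, NA))

# Default (na.rm = TRUE): the leading row that cannot be filled is dropped
filled_default <- na.locf(d)

# na.rm = FALSE keeps every row; the leading NA in p stays NA
filled_keep <- na.locf(d, na.rm = FALSE)

# fromLast = TRUE carries the next observation backward instead
filled_back <- na.locf(d, fromLast = TRUE, na.rm = FALSE)

nrow(filled_default)
nrow(filled_keep)
filled_back$p
```

Using na.rm = FALSE is usually the safer choice when the filled result must keep the same number of rows as the original data frame.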

Let’s have a look at another example:

Example

v1 <- sample(c(rexp(5, 1), NA), 20, replace = TRUE)
v2 <- sample(c(runif(5, 1, 2), NA), 20, replace = TRUE)
v3 <- sample(c(rnorm(4, 0.95, 0.04), NA), 20, replace = TRUE)
df_v <- data.frame(v1, v2, v3)
df_v

Output

    v1        v2         v3
1  0.3197994  1.664430  0.9608500
2  0.7260356  1.951135  0.9741401
3  0.2851354  1.951135  0.9741401
4  NA         1.354400  0.9155426
5  0.4840855  1.951135  0.9155426
6  0.7260356  1.927019  0.9155426
7  0.3197994  1.602498  0.9608500
8  0.3197994  1.602498  NA
9  0.7260356  1.951135  NA
10 0.4840855  1.354400  NA
11 NA         1.664430  NA
12 0.7260356  1.927019  NA
13 0.3197994  1.951135  0.9741401
14 0.2851354  1.354400  0.9155426
15 2.3741214  1.602498  0.9290660
16 0.3197994  1.354400  0.9290660
17 0.7260356  1.951135  0.9155426
18 0.3197994  1.354400  0.9608500
19 0.7260356  1.664430  0.9290660
20 NA         1.602498  NA

Replace the NA values with the last observation in the same column:

Example

na.locf(df_v)

Output

          v1       v2        v3
1  0.3197994 1.664430 0.9608500
2  0.7260356 1.951135 0.9741401
3  0.2851354 1.951135 0.9741401
4  0.2851354 1.354400 0.9155426
5  0.4840855 1.951135 0.9155426
6  0.7260356 1.927019 0.9155426
7  0.3197994 1.602498 0.9608500
8  0.3197994 1.602498 0.9608500
9  0.7260356 1.951135 0.9608500
10 0.4840855 1.354400 0.9608500
11 0.4840855 1.664430 0.9608500
12 0.7260356 1.927019 0.9608500
13 0.3197994 1.951135 0.9741401
14 0.2851354 1.354400 0.9155426
15 2.3741214 1.602498 0.9290660
16 0.3197994 1.354400 0.9290660
17 0.7260356 1.951135 0.9155426
18 0.3197994 1.354400 0.9608500
19 0.7260356 1.664430 0.9290660
20 0.7260356 1.602498 0.9290660
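
na.locf returns a new object and does not modify the original data frame, so the result has to be assigned if it is to be kept. The following sketch fills a single column in place and then counts the remaining NA values. The data frame sales and its columns are hypothetical and used only for illustration.

```r
library(zoo)

# Hypothetical data frame for illustration
sales <- data.frame(day = 1:5, units = c(10, NA, NA, 7, NA))

# Fill one column and assign the result back;
# na.rm = FALSE keeps the vector the same length as the column
sales$units <- na.locf(sales$units, na.rm = FALSE)

sales$units
# Count the NA values left in each column
colSums(is.na(sales))
```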

Updated on: 14-Oct-2020
