How to replace missing values recorded with blank spaces in R with NA or any other value?


Sometimes when we read data into R, missing values are recorded as empty strings or as blank spaces, which makes them awkward to replace. A comparison with == matches only the exact string, so we need to know how many spaces were used in place of each missing value. Once we know that, assigning NA or any other value is easy.

Example

Consider the data frame below, built from the vectors x and y. Because each vector mixes numbers with strings, c() stores every element as a string.

> x<-c("", 3,2,1,2,3,2,1," ", 43, "")
> y<-c(1,2,"", 43,2,"     ", 3,2,3,"", 7)
> df<-data.frame(x,y)
> df
    x     y
1         1
2   3     2
3   2
4   1    43
5   2     2
6   3
7   2     3
8   1     2
9         3
10 43
11        7

Here, some missing values are recorded as blank spaces and others as empty strings (a pair of double quotes with nothing between them). Let's first replace the empty strings with NA:

> df[df==""]<-NA
> df
      x     y
1  <NA>     1
2     3     2
3     2  <NA>
4     1    43
5     2     2
6     3
7     2     3
8     1     2
9           3
10   43  <NA>
11 <NA>     7
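
The same assignment accepts any replacement value, not only NA. The following is a minimal sketch that assumes the placeholder "0" and a separate data frame named df0, both chosen here for illustration. It sets stringsAsFactors = FALSE so that the columns are plain character vectors in every R version; with factor columns, assigning a value that is not an existing level would produce NA with a warning.

```r
x <- c("", 3, 2, 1, 2, 3, 2, 1, " ", 43, "")
y <- c(1, 2, "", 43, 2, "     ", 3, 2, 3, "", 7)

# df0 is a hypothetical name; stringsAsFactors = FALSE keeps the columns as character
df0 <- data.frame(x, y, stringsAsFactors = FALSE)

# Replace every empty string with the placeholder "0" instead of NA
df0[df0 == ""] <- "0"

# Values made only of spaces are still untouched at this point
df0$x
df0$y
```

As with NA, only exact matches of "" are replaced, so the values that contain spaces remain.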

Here, the ninth value in x and the sixth value in y are not replaced, because they contain blank spaces rather than an empty string. To match them, we need to specify the exact number of spaces. First, find the number of spaces by printing the vectors:

> x
 [1] ""   "3"  "2"  "1"  "2"  "3"  "2"  "1"  " "  "43" ""
> y
 [1] "1"     "2"     ""      "43"    "2"     "     " "3"     "2"
 [9] "3"     ""      "7"
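
Counting spaces by eye is error-prone, so the count can also be taken with nchar(). This is a small sketch using the same vectors; the name blank_y is introduced here for illustration only.

```r
x <- c("", 3, 2, 1, 2, 3, 2, 1, " ", 43, "")
y <- c(1, 2, "", 43, 2, "     ", 3, 2, 3, "", 7)

# nchar() returns the number of characters in each string
nchar(x[9])
nchar(y[6])

# Positions of values that consist only of whitespace (empty strings do not match)
blank_y <- which(grepl("^[[:space:]]+$", y))
blank_y

# Length of each whitespace-only value in y
nchar(y[blank_y])
```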

There is one blank space in the ninth value of x and five blank spaces in the sixth value of y. Now let's update df for x as follows:

> df[df==" "]<-NA
> df
      x     y
1  <NA>     1
2     3     2
3     2  <NA>
4     1    43
5     2     2
6     3
7     2     3
8     1     2
9  <NA>     3
10   43  <NA>
11 <NA>     7

Now we will update df for y, this time matching five spaces:

> df[df=="     "]<-NA
> df
      x    y
1  <NA>    1
2     3    2
3     2 <NA>
4     1   43
5     2    2
6     3 <NA>
7     2    3
8     1    2
9  <NA>    3
10   43 <NA>
11 <NA>    7

Now every blank value in the data frame has been replaced with NA, and the remaining values are unchanged.
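
When the number of spaces varies or is not known in advance, one alternative to matching each length separately is to strip whitespace before comparing. The sketch below is not the method used above; it assumes base R's trimws() (available since R 3.2.0) and uses the illustrative names df2 and is_blank.

```r
x <- c("", 3, 2, 1, 2, 3, 2, 1, " ", 43, "")
y <- c(1, 2, "", 43, 2, "     ", 3, 2, 3, "", 7)
df2 <- data.frame(x, y, stringsAsFactors = FALSE)

# TRUE wherever a value is empty or made only of whitespace, whatever its length
is_blank <- sapply(df2, function(col) trimws(col) == "")

# A logical matrix with the same shape can index the data frame directly
df2[is_blank] <- NA

# Optional: only numbers and NA remain, so the columns can be converted to numeric
df2[] <- lapply(df2, as.numeric)

# Number of NA values per column
colSums(is.na(df2))
```

This handles "", " " and "     " in a single pass, at the cost of also treating any other whitespace-only value as missing.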

Updated on: 10-Aug-2020
