How to replace missing values recorded with blank spaces in R with NA or any other value?


Sometimes when we read data into R, missing values are recorded as blank strings, which makes them awkward to replace. The difficulty is that a comparison such as df==" " matches only that exact string, so we need to know how many spaces were used in place of each missing value. Once we know that, assigning NA or any other value is easy.
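When the data comes from a delimited file, the blanks can often be handled at read time instead. The following is a minimal sketch, assuming a small comma-separated input; the csv_text string and its column names are made up for illustration. With strip.white = TRUE, read.csv() trims whitespace around unquoted fields, and na.strings = "" then treats the emptied fields as NA.

```r
# Hypothetical input: blanks of different widths stand in for missing values
csv_text <- "id,x,y\n1,,a\n2,3,   \n3, ,b\n"

# strip.white trims unquoted fields; na.strings = "" marks empty fields as NA
df_read <- read.csv(text = csv_text, strip.white = TRUE, na.strings = "")

# TRUE marks the cells that were blank in the input
is.na(df_read)
```

This only helps when we control how the data is read; for a data frame that already exists, the blanks have to be replaced afterwards.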

Example

Consider the data frame below, built from the vectors x and y.

> x<-c("", 3,2,1,2,3,2,1," ", 43, "")
> y<-c(1,2,"", 43,2,"     ", 3,2,3,"", 7)
> df<-data.frame(x,y)
> df
    x     y
1         1
2   3     2
3   2
4   1    43
5   2     2
6   3
7   2     3
8   1     2
9         3
10 43
11        7

Here, missing values are recorded in two ways: as empty strings (a pair of double quotes with nothing between them) and as strings of blank spaces. Let's first replace the empty strings with NA as shown below:

> df[df==""]<-NA
> df
      x     y
1  <NA>     1
2     3     2
3     2  <NA>
4     1    43
5     2     2
6     3
7     2     3
8     1     2
9           3
10   43  <NA>
11 <NA>     7

Here, the ninth value in x and the sixth value in y are not replaced because they contain blank spaces rather than an empty string, so we need to specify the exact number of spaces. First, find the number of spaces by printing the vectors as follows:

> x
[1] "" "3" "2" "1" "2" "3" "2" "1" " " "43" ""
> y
[1] "1"     "2"     ""      "43"    "2"     "     " "3"     "2"
[9] "3"     ""      "7"
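Counting spaces by eye in printed output is error-prone, so it can help to let R count them. A small sketch, assuming the same x and y as above (with five spaces in the sixth element of y, as the text describes); blank_x and blank_y are names introduced here for illustration. trimws() finds the whitespace-only entries and nchar() reports how many characters each one holds.

```r
x <- c("", 3, 2, 1, 2, 3, 2, 1, " ", 43, "")
y <- c(1, 2, "", 43, 2, "     ", 3, 2, 3, "", 7)

# Positions that are empty or contain only whitespace
blank_x <- which(trimws(x) == "")
blank_y <- which(trimws(y) == "")

# Number of characters stored at each of those positions
nchar(x[blank_x])
nchar(y[blank_y])
```

A count of 0 is an empty string, and any other count is the number of spaces to put between the quotes in the comparison.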

There is one blank space in the ninth value of x and there are five blank spaces in the sixth value of y. Now let's replace the single-space value in x as follows:

> df[df==" "]<-NA
> df
      x     y
1  <NA>     1
2     3     2
3     2  <NA>
4     1    43
5     2     2
6     3
7     2     3
8     1     2
9  <NA>     3
10   43  <NA>
11 <NA>     7

Now we will do the same for the five-space value in y as shown below:

> df[df=="     "]<-NA
> df
      x    y
1  <NA>    1
2     3    2
3     2 <NA>
4     1   43
5     2    2
6     3 <NA>
7     2    3
8     1    2
9  <NA>    3
10   43 <NA>
11 <NA>    7

Now we have our complete data frame, with every blank entry replaced by NA.
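If the number of spaces varies or is unknown, a pattern match avoids counting them at all. The sketch below is one possible alternative, not the method used above: it assumes character columns (hence stringsAsFactors = FALSE) and uses the regular expression "^\\s*$" to match entries that are empty or whitespace-only. It also shows how to substitute another value, here 0, in place of NA; df_zero is a name introduced for illustration.

```r
x <- c("", 3, 2, 1, 2, 3, 2, 1, " ", 43, "")
y <- c(1, 2, "", 43, 2, "     ", 3, 2, 3, "", 7)
df <- data.frame(x, y, stringsAsFactors = FALSE)

# Mark every entry that is empty or whitespace-only as NA, column by column
df[] <- lapply(df, function(col) replace(col, grepl("^\\s*$", col), NA))

# With the blanks gone, the columns can be converted to numbers
df[] <- lapply(df, as.numeric)

# To use another value instead of NA, assign it to the NA positions
df_zero <- df
df_zero[is.na(df_zero)] <- 0
```

Converting to numeric after the replacement is optional, but it keeps later arithmetic from failing on character columns.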

raja
Published on 10-Aug-2020 12:26:31