How to set a level of a factor column in an R data frame to NA?


In data analysis, we often encounter inappropriately coded data, which makes the analysis harder. A common example is missing values that were recorded with a label such as Missing or Not Available instead of NA. If that label is a level of a factor column, we can turn it into NA by assigning NA to the level itself, using the syntax below −

Syntax

levels(data_frame_name$Column_name)[levels(data_frame_name$Column_name)=="Missing"]<-NA
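The level assignment can be tried on a small factor before touching a data frame. A minimal sketch, using a hypothetical vector x: assigning NA to a level removes that level from levels() and turns every element that carried it into NA.

```r
# Hypothetical factor containing the placeholder level "Missing"
x <- factor(c("First", "Missing", "Second", "Missing"))
levels(x)  # "First" "Missing" "Second"

# Replace the placeholder level with NA: the level is dropped and
# the elements that had it become NA
levels(x)[levels(x) == "Missing"] <- NA
x
levels(x)  # "First" "Second"
```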

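Consider the data frame below. It is generated with sample(), so the values change on every run; this is also why the rows before and after each conversion in this article do not match −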

Example


Class<-as.factor(sample(c("First","Second","Missing"),20,replace=TRUE))
Score<-sample(1:10,20,replace=TRUE)
df1<-data.frame(Class,Score)
df1

Output

     Class Score
1  Missing     2
2    First     2
3   Second     5
4    First     2
5  Missing     9
6   Second     3
7  Missing     7
8  Missing     3
9    First     3
10  Second     5
11   First     5
12  Second     5
13 Missing     1
14   First     2
15  Second     2
16  Second     3
17  Second     3
18  Second     9
19  Second    10
20 Missing     1
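If you want the example data to come out the same on every run, one option is to fix the random seed with set.seed() before sampling. A sketch under stated assumptions: the seed value 1 is an arbitrary choice, the exact draw can differ between R versions, and the explicit levels argument to factor() is an addition that keeps Missing as a level even in a draw where it happens not to occur.

```r
# Arbitrary seed so that sample() returns the same draw on every run
set.seed(1)

# Listing the levels explicitly keeps "Missing" as a level
# even if it is not drawn
Class <- factor(sample(c("First", "Second", "Missing"), 20, replace = TRUE),
                levels = c("First", "Second", "Missing"))
Score <- sample(1:10, 20, replace = TRUE)
df1 <- data.frame(Class, Score)

# Count the rows per level before converting the placeholder to NA
table(df1$Class)
```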

Changing the Missing values of Class in data frame df1 to NA −

Example

levels(df1$Class)[levels(df1$Class)=="Missing"]<-NA
df1

Output

    Class Score
1    <NA>     4
2  Second     4
3    <NA>     4
4    <NA>     8
5   First     3
6  Second     1
7    <NA>     5
8   First    10
9  Second     8
10   <NA>     5
11  First     4
12 Second     5
13  First     2
14   <NA>     4
15   <NA>     3
16  First     4
17   <NA>     9
18  First     4
19  First     8
20 Second     7
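A common alternative is to assign NA to the matching elements instead of to the level. The sketch below, using a hypothetical vector x, contrasts the two approaches: element-wise assignment leaves Missing behind as an unused level, which droplevels() then removes, while the level assignment used above does both in one step.

```r
x <- factor(c("First", "Missing", "Second", "Missing"))

# Element-wise assignment: the values become NA, but the level survives
y <- x
y[y == "Missing"] <- NA
levels(y)  # "First" "Missing" "Second"

# droplevels() removes levels that no longer occur in the data
y <- droplevels(y)
levels(y)  # "First" "Second"

# Level assignment reaches the same result in a single step
z <- x
levels(z)[levels(z) == "Missing"] <- NA
levels(z)  # "First" "Second"
```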

Let’s have a look at another example −

Example


Grp<-as.factor(sample(c("A","B","Not Available"),20,replace=TRUE))
Age<-sample(21:50,20)
df2<-data.frame(Grp,Age)
df2

Output

             Grp Age
1              A  37
2              B  49
3  Not Available  31
4              B  34
5              B  46
6  Not Available  26
7              A  27
8              A  25
9              A  32
10             B  28
11             A  47
12             A  30
13 Not Available  38
14             B  39
15             A  33
16 Not Available  42
17             B  35
18 Not Available  21
19 Not Available  36
20             A  24
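When the data are read from a file, the placeholder can also be turned into NA at import time. read.csv() passes its extra arguments to read.table(), whose na.strings argument lists the strings to treat as NA. The sketch below feeds a small, hypothetical CSV string through the text argument; stringsAsFactors = TRUE is needed for Grp to become a factor, because the default changed to FALSE in R 4.0.0.

```r
# Hypothetical CSV contents that use "Not Available" as a placeholder
csv_text <- "Grp,Age\nA,37\nNot Available,31\nB,34\n"

# Keep the default "NA" and add the placeholder label to na.strings
df3 <- read.csv(text = csv_text,
                na.strings = c("NA", "Not Available"),
                stringsAsFactors = TRUE)
df3
levels(df3$Grp)  # "A" "B"
```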

Changing the Not Available values of Grp in data frame df2 to NA −

Example

levels(df2$Grp)[levels(df2$Grp)=="Not Available"]<-NA
df2

Output

    Grp Age
1  <NA>  32
2  <NA>  35
3     A  23
4     B  26
5     A  44
6     B  47
7     A  37
8     A  36
9     A  41
10    B  25
11 <NA>  45
12    A  42
13 <NA>  28
14    A  21
15 <NA>  48
16    A  38
17 <NA>  50
18    A  33
19    A  27
20 <NA>  22
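If a data frame uses several placeholder labels, or several factor columns contain them, the same level assignment can be applied in a loop with %in% instead of ==. A minimal sketch: labels_to_na() and the survey data frame are hypothetical names introduced for illustration, not part of base R, and label matching is exact (case-sensitive).

```r
# Hypothetical helper (not part of base R): replaces any of the given
# placeholder labels with NA in every factor column of a data frame
labels_to_na <- function(df, labels) {
  for (col in names(df)) {
    if (is.factor(df[[col]])) {
      # Exact matching, so "missing" would not match "Missing"
      is_placeholder <- levels(df[[col]]) %in% labels
      levels(df[[col]])[is_placeholder] <- NA
    }
  }
  df
}

# Usage with a small hypothetical data frame; Score is numeric and untouched
survey <- data.frame(
  Class = factor(c("First", "Missing", "Second")),
  Grp   = factor(c("A", "B", "Not Available")),
  Score = c(2, 5, 9)
)
survey <- labels_to_na(survey, c("Missing", "Not Available"))
survey
levels(survey$Class)  # "First" "Second"
levels(survey$Grp)    # "A" "B"
```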
raja
Published on 09-Oct-2020 15:26:11