How to collapse factor levels in an R data frame?

Sometimes the levels of a factor are not recorded consistently. For example, male may be recorded as M in some rows and as Mal in others, so a single category ends up split across two or more levels. Every inconsistent spelling adds an extra level, and any analysis that uses the factor (counts, group summaries, models) will treat these spellings as separate groups, so the problem must be fixed first. To collapse the incorrect levels into the correct ones, we can assign a named list to levels(): each name in the list becomes a new level, and each element is a character vector of the old levels to merge into it. This works only when the column is a factor, not a character vector.
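A minimal sketch of the mechanism on a short, made-up vector (the values are illustrative and not taken from the examples below). It also shows a pitfall worth knowing: an old level that does not appear in any element of the list becomes NA instead of being kept.

```r
# Illustrative factor with inconsistent spellings (made-up values)
x <- factor(c("M", "Male", "Fem", "Female", "Mal", "F"))

# Each list name becomes a new level; its vector names the old levels merged into it
levels(x) <- list(Male   = c("M", "Mal", "Male"),
                  Female = c("F", "Fem", "Female"))
print(x)          # only two levels remain, ordered as the list names

# Pitfall: the misspelling "Mle" is not listed, so it turns into NA
y <- factor(c("M", "Male", "Mle"))
levels(y) <- list(Male = c("M", "Male"))
print(y)
```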

Example 1

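# Gender recorded inconsistently: "M", "Ma", "Mal" all mean Male; "F", "Fe", "Fem", etc. mean Female
# (naming a vector F masks R's built-in shorthand F for FALSE)
F<-c("Male","Ma","Fem","Female","M","Male","Mal","Male","Fe","Female","M","Fema","Ma","Femal","F","Fem","Male","Ma","Male","Female")
Rate<-rep(c(25,30,37,56),times=5)
# stringsAsFactors=TRUE stores F as a factor; since R 4.0.0 the default is FALSE (character)
df1<-data.frame(F,Rate,stringsAsFactors=TRUE)
df1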

Output

        F Rate
1    Male   25
2      Ma   30
3     Fem   37
4  Female   56
5       M   25
6    Male   30
7     Mal   37
8    Male   56
9      Fe   25
10 Female   30
11      M   37
12   Fema   56
13     Ma   25
14  Femal   30
15      F   37
16    Fem   56
17   Male   25
18     Ma   30
19   Male   37
20 Female   56
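Before writing the list it helps to see every spelling that actually occurs, because any level left out of the list turns into NA. A small sketch that rebuilds df1 with the same values (using stringsAsFactors = TRUE so that F is a factor) and lists the levels it currently has:

```r
# Rebuild df1 so this snippet runs on its own; F must be a factor
df1 <- data.frame(
  F = c("Male","Ma","Fem","Female","M","Male","Mal","Male","Fe","Female",
        "M","Fema","Ma","Femal","F","Fem","Male","Ma","Male","Female"),
  Rate = rep(c(25, 30, 37, 56), times = 5),
  stringsAsFactors = TRUE
)

# Every distinct spelling stored as a level (10 here, for what should be 2 groups)
print(levels(df1$F))
print(nlevels(df1$F))
```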
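# Assign a named list: each name is a new level, each vector lists the old spellings merged into it
levels(df1$F)<-list("Male"=c("Male","Ma","Mal","M"),
                    "Female"=c("Female","Fe","Fem","Fema","Femal","F"))
df1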

Output

        F Rate
1    Male   25
2    Male   30
3  Female   37
4  Female   56
5    Male   25
6    Male   30
7    Male   37
8    Male   56
9  Female   25
10 Female   30
11   Male   37
12 Female   56
13   Male   25
14 Female   30
15 Female   37
16 Female   56
17   Male   25
18   Male   30
19   Male   37
20 Female   56
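One way to confirm the collapse worked is to count the rows per level with table() and check that no value became NA. The sketch below repeats the Example 1 steps so it is self-contained; useNA = "ifany" adds an <NA> column to the count only if some spelling was missed.

```r
df1 <- data.frame(
  F = c("Male","Ma","Fem","Female","M","Male","Mal","Male","Fe","Female",
        "M","Fema","Ma","Femal","F","Fem","Male","Ma","Male","Female"),
  Rate = rep(c(25, 30, 37, 56), times = 5),
  stringsAsFactors = TRUE
)
levels(df1$F) <- list(Male   = c("Male","Ma","Mal","M"),
                      Female = c("Female","Fe","Fem","Fema","Femal","F"))

# Count rows per level; an <NA> column would signal an unmapped spelling
print(table(df1$F, useNA = "ifany"))

# Stop with an error if any value was not covered by the list
stopifnot(!anyNA(df1$F))
```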

Example 2

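# Motorcycle types recorded with abbreviations and truncated spellings
MotorCycleTypes<-c("Cru","Sp","Sport","Tour","Endu","Cruiser","Touri","Enduro","Spo","Cruise","Touring","To","Sp","End","Cruis","Cruiser","Sport","End","Tour","Enduro")
# sample() draws random counts, so your Frequency column will differ from the output below
Frequency<-sample(1:30,20,replace=TRUE)
# stringsAsFactors=TRUE is needed in R >= 4.0.0 so that the levels can be collapsed
df2<-data.frame(MotorCycleTypes,Frequency,stringsAsFactors=TRUE)
df2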

Output

   MotorCycleTypes Frequency
1              Cru         5
2               Sp        15
3            Sport        10
4             Tour         2
5             Endu        25
6          Cruiser         6
7            Touri        17
8           Enduro         5
9              Spo        15
10          Cruise        25
11         Touring        12
12              To        11
13              Sp        20
14             End         6
15           Cruis         1
16         Cruiser        12
17           Sport        21
18             End         5
19            Tour        23
20          Enduro         2
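Because Frequency comes from sample(), each run gives different numbers. If a reproducible example is wanted, calling set.seed() before sampling fixes the draws. A sketch, assuming an arbitrary seed of 2020 (any integer works); the numbers it produces will not match the output shown above, which came from an unseeded run.

```r
types <- c("Cru","Sp","Sport","Tour","Endu","Cruiser","Touri","Enduro","Spo","Cruise",
           "Touring","To","Sp","End","Cruis","Cruiser","Sport","End","Tour","Enduro")

# Resetting the same seed makes sample() return identical draws
set.seed(2020)
freq_a <- sample(1:30, 20, replace = TRUE)
set.seed(2020)
freq_b <- sample(1:30, 20, replace = TRUE)
stopifnot(identical(freq_a, freq_b))

df2 <- data.frame(MotorCycleTypes = types, Frequency = freq_a, stringsAsFactors = TRUE)
print(df2)
```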
# Each list name becomes a new level; its vector lists the old spellings merged into it
levels(df2$MotorCycleTypes)<-list("Cruise"=c("Cruiser","Cru","Cruis","Cruise"),"Sport"=c("Sport","Sp","Spo"),
                                  "Enduro"=c("Enduro","Endu","End"),"Touring"=c("Touring","Tour","To","Touri"))
df2

Output

   MotorCycleTypes Frequency
1           Cruise         5
2            Sport        15
3            Sport        10
4          Touring         2
5           Enduro        25
6           Cruise         6
7          Touring        17
8           Enduro         5
9            Sport        15
10          Cruise        25
11         Touring        12
12         Touring        11
13           Sport        20
14          Enduro         6
15          Cruise         1
16          Cruise        12
17           Sport        21
18          Enduro         5
19         Touring        23
20          Enduro         2
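A base-R alternative, not the method used above: since R 3.4.0, factor() accepts duplicated labels, which maps several old values to the same level in a single call. The vectors old and new below are illustrative names; they are paired position by position, and any value not found in levels becomes NA, just as with the list approach.

```r
types <- c("Cru","Sp","Sport","Tour","Endu","Cruiser","Touri","Enduro","Spo","Cruise",
           "Touring","To","Sp","End","Cruis","Cruiser","Sport","End","Tour","Enduro")

# Old spellings and, position by position, the level each one should become
old <- c("Cruiser","Cru","Cruis","Cruise",
         "Sport","Sp","Spo",
         "Enduro","Endu","End",
         "Touring","Tour","To","Touri")
new <- rep(c("Cruise","Sport","Enduro","Touring"), times = c(4, 3, 3, 4))

# Duplicated labels merge several old values into one level (R >= 3.4.0)
collapsed <- factor(types, levels = old, labels = new)
print(levels(collapsed))
print(table(collapsed))
```

The list form keeps each new level next to its old spellings, while the factor() form builds the factor and collapses it in one step; both give the same grouping for this data.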
raja
Published on 21-Aug-2020 10:32:29