How to collapse factor levels in an R data frame?
Sometimes the levels of a factor are recorded inconsistently, for example male entered as M in some rows and as Mal in others, so that a single category ends up split across several levels. The factor then has more levels than it should, and any analysis that groups by those levels will be wrong. To collapse the incorrect levels into the appropriate ones, we can assign a named list to levels(): each name is a corrected level, and its value is the vector of recorded spellings that should be merged into it. The column must be a factor for this to work; since R 4.0.0, data.frame() no longer converts strings to factors by default.
Example 1
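# Note: F is also R's shorthand for FALSE; the name is kept to match the output
F <- c("Male","Ma","Fem","Female","M","Male","Mal","Male","Fe","Female",
       "M","Fema","Ma","Femal","F","Fem","Male","Ma","Male","Female")
Rate <- rep(c(25,30,37,56), times=5)
# stringsAsFactors=TRUE makes F a factor column (not the default since R 4.0.0)
df1 <- data.frame(F, Rate, stringsAsFactors=TRUE)
df1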
Output
        F Rate
1    Male   25
2      Ma   30
3     Fem   37
4  Female   56
5       M   25
6    Male   30
7     Mal   37
8    Male   56
9      Fe   25
10 Female   30
11      M   37
12   Fema   56
13     Ma   25
14  Femal   30
15      F   37
16    Fem   56
17   Male   25
18     Ma   30
19   Male   37
20 Female   56
levels(df1$F) <- list("Male"=c("Male","Ma","Mal","M"),
                      "Female"=c("Female","Fe","Fem","Fema","Femal","F"))
df1
Output
        F Rate
1    Male   25
2    Male   30
3  Female   37
4  Female   56
5    Male   25
6    Male   30
7    Male   37
8    Male   56
9  Female   25
10 Female   30
11   Male   37
12 Female   56
13   Male   25
14 Female   30
15 Female   37
16 Female   56
17   Male   25
18   Male   30
19   Male   37
20 Female   56
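After collapsing, it can help to confirm that only the intended levels remain. The following is a minimal sketch on a small stand-in factor; the names sex and mapping are illustrative and do not appear in the example above.

```r
# Small stand-in for the F column of df1 (values are illustrative)
sex <- factor(c("Male", "Ma", "Fem", "Female", "M", "F"))

# Number of levels before collapsing: one per distinct spelling
levels_before <- nlevels(sex)

# Same named-list shape as in Example 1
mapping <- list(
  "Male"   = c("Male", "Ma", "Mal", "M"),
  "Female" = c("Female", "Fe", "Fem", "Fema", "Femal", "F")
)
levels(sex) <- mapping

levels_before   # 6
levels(sex)     # "Male" "Female", in the order given in the list
table(sex)      # counts per collapsed level
```

Comparing nlevels() before and after, and looking at table(), is a quick way to spot a spelling that was missed.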
Example 2
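MotorCycleTypes <- c("Cru","Sp","Sport","Tour","Endu","Cruiser","Touri","Enduro","Spo","Cruise",
                     "Touring","To","Sp","End","Cruis","Cruiser","Sport","End","Tour","Enduro")
# No seed is set, so the Frequency values differ between runs
Frequency <- sample(1:30, 20, replace=TRUE)
# stringsAsFactors=TRUE makes MotorCycleTypes a factor column (not the default since R 4.0.0)
df2 <- data.frame(MotorCycleTypes, Frequency, stringsAsFactors=TRUE)
df2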
Output
   MotorCycleTypes Frequency
1              Cru         5
2               Sp        15
3            Sport        10
4             Tour         2
5             Endu        25
6          Cruiser         6
7            Touri        17
8           Enduro         5
9              Spo        15
10          Cruise        25
11         Touring        12
12              To        11
13              Sp        20
14             End         6
15           Cruis         1
16         Cruiser        12
17           Sport        21
18             End         5
19            Tour        23
20          Enduro         2
levels(df2$MotorCycleTypes) <- list("Cruise"=c("Cruiser","Cru","Cruis","Cruise"),
                                    "Sport"=c("Sport","Sp","Spo"),
                                    "Enduro"=c("Enduro","Endu","End"),
                                    "Touring"=c("Touring","Tour","To","Touri"))
df2
Output
   MotorCycleTypes Frequency
1           Cruise         5
2            Sport        15
3            Sport        10
4          Touring         2
5           Enduro        25
6           Cruise         6
7          Touring        17
8           Enduro         5
9            Sport        15
10          Cruise        25
11         Touring        12
12         Touring        11
13           Sport        20
14          Enduro         6
15          Cruise         1
16          Cruise        12
17           Sport        21
18          Enduro         5
19         Touring        23
20          Enduro         2
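With many spellings it is easy to leave one out of the list, and a level that the list does not mention generally ends up as NA after the assignment. The sketch below wraps the same technique in a small helper that stops when a recorded level is missing from the mapping. The function collapse_levels is hypothetical, not part of base R, and the input vector is a shortened stand-in for the MotorCycleTypes column.

```r
# Hypothetical helper (not a base R function): collapse levels with a safety check
collapse_levels <- function(x, mapping) {
  x <- as.factor(x)  # character input is converted to a factor first
  # Levels present in the data that the mapping does not mention
  unmapped <- setdiff(levels(x), unlist(mapping))
  if (length(unmapped) > 0) {
    stop("Unmapped levels: ", paste(unmapped, collapse = ", "))
  }
  levels(x) <- mapping
  x
}

# Shortened stand-in for the MotorCycleTypes column
types <- c("Cru", "Sp", "Sport", "Tour", "Endu", "Cruiser")

collapsed <- collapse_levels(types, list(
  "Cruise"  = c("Cruiser", "Cru", "Cruis", "Cruise"),
  "Sport"   = c("Sport", "Sp", "Spo"),
  "Enduro"  = c("Enduro", "Endu", "End"),
  "Touring" = c("Touring", "Tour", "To", "Touri")
))

collapsed
```

Because the helper calls as.factor() first, it also accepts a plain character vector, which is what data.frame() produces by default in R 4.0.0 and later.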