How to recode factors in R?
Sometimes a factor has levels that can be combined, or we want to group several levels into a single level. This is usually done when a level has very few observations (sometimes only one), or when some theoretical reasoning justifies merging the levels. For example, if a data frame called df contains a factor column x with four categories A, B, C, and D, we can group A and B into a level "A" and C and D into a level "B" as −
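df$x[df$x %in% c("A","B")]<-"A"
df$x[df$x %in% c("C","D")]<-"B"

The order of these two statements matters. If C and D were recoded to "B" first, the second statement would then turn those new "B" values into "A" as well, leaving a single level.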
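The assignments above treat x as a character vector. If x is stored as a genuine factor, one alternative is the `levels<-` replacement function with a named list, as documented in R's `?levels` help page: each list name becomes a new level and each element lists the old levels it absorbs. Unlike indexed assignment on a factor, this drops the old levels instead of leaving C and D behind as unused levels. The vector below is a hypothetical stand-in for df$x; this is a minimal sketch, not the article's own method.

```r
# Hypothetical stand-in for df$x, stored as a factor with levels A, B, C, D
x <- factor(c("A", "C", "D", "B", "A"))

# Named list: new level = character vector of old levels it replaces.
# Every existing level should appear in the list; unlisted levels become NA.
levels(x) <- list(A = c("A", "B"), B = c("C", "D"))

print(x)
# [1] A B B A A
# Levels: A B
```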
Example 1
Consider the data frame below (sample() and rpois() draw random values, so your output will differ) −
factor<-sample(LETTERS[1:4],20,replace=TRUE)
response<-rpois(20,5)
df1<-data.frame(factor,response)
df1
Output
   factor response
1       A        5
2       C        7
3       D        5
4       C       13
5       C        5
6       C        4
7       B        4
8       B       10
9       C        4
10      D        6
11      B        5
12      B        3
13      A        7
14      A        2
15      A        2
16      D        3
17      B        1
18      C        5
19      D        6
20      D        4
Recoding the levels of the factor column in df1, so that A and B become A and C and D become B −
df1$factor[df1$factor %in% c("A","B")]<-"A"
df1$factor[df1$factor %in% c("C","D")]<-"B"
df1
Output
   factor response
1       A        5
2       B        7
3       B        5
4       B       13
5       B        5
6       B        4
7       A        4
8       A       10
9       B        4
10      B        6
11      A        5
12      A        3
13      A        7
14      A        2
15      A        2
16      B        3
17      A        1
18      B        5
19      B        6
20      B        4
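Because each assignment in the two-step recode sees the result of the previous one, the outcome depends on statement order. A minimal sketch of an order-independent alternative is a named lookup vector that maps every old label to its new label in one step. The name recode_map and the small fixed data frame are hypothetical, used only because the article's df1 is random.

```r
# Small, fixed stand-in for df1 (the article's df1 is randomly generated)
df1 <- data.frame(factor = c("A", "C", "D", "B", "A"),
                  response = c(5, 7, 5, 4, 7))

# Lookup table: names are old labels, values are new labels
recode_map <- c(A = "A", B = "A", C = "B", D = "B")

# as.character() guards against a factor column, which would otherwise
# index recode_map by integer codes instead of by name;
# unname() drops the names carried over from recode_map
df1$factor <- unname(recode_map[as.character(df1$factor)])
df1
```

All old labels are translated at once, so no row is recoded twice regardless of how the entries in recode_map are ordered.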
Example 2
grp<-sample(c("G1","G2","G3"),20,replace=TRUE)
Y<-rnorm(20)
df2<-data.frame(grp,Y)
df2
Output
   grp           Y
1   G3 -0.39900138
2   G3  1.04085657
3   G1  1.46432790
4   G3 -0.90843955
5   G1 -0.15202516
6   G2  1.15456629
7   G2  1.24002828
8   G2 -1.10731484
9   G2  0.27423208
10  G3  1.06444903
11  G2 -0.21824650
12  G1  0.25843090
13  G1  0.07686889
14  G3 -0.21955611
15  G3 -0.05359245
16  G2  0.54630987
17  G3 -0.09808820
18  G1 -0.65171471
19  G2 -0.62371231
20  G2 -0.03319190
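Recoding levels G1 and G2 of the grp column in df2 into a single new level called Control, leaving G3 unchanged. This works because grp is a character column, which data.frame() creates by default since R 4.0.0; if grp were stored as a factor, assigning a label that is not already a level (such as Control) would produce NA with a warning −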
df2$grp[df2$grp %in% c("G1","G2")]<-"Control"
df2
Output
       grp           Y
1       G3 -0.39900138
2       G3  1.04085657
3  Control  1.46432790
4       G3 -0.90843955
5  Control -0.15202516
6  Control  1.15456629
7  Control  1.24002828
8  Control -1.10731484
9  Control  0.27423208
10      G3  1.06444903
11 Control -0.21824650
12 Control  0.25843090
13 Control  0.07686889
14      G3 -0.21955611
15      G3 -0.05359245
16 Control  0.54630987
17      G3 -0.09808820
18 Control -0.65171471
19 Control -0.62371231
20 Control -0.03319190
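When grp is stored as a factor, a sketch of the same recoding merges the levels through `levels<-` with a named list rather than assigning a new label row by row. The five-row data frame is a hypothetical stand-in for df2. G3 is listed explicitly because, in the list form, any level left out of the list becomes NA.

```r
# Hypothetical stand-in for df2, with grp stored as a factor
df2 <- data.frame(
  grp = factor(c("G3", "G1", "G2", "G3", "G2")),
  Y = c(-0.40, 1.46, 1.15, 1.06, -0.22)
)

# Row-wise assignment of "Control" would give NA because it is not a level,
# so merge G1 and G2 into a new level and keep G3 as it is
levels(df2$grp) <- list(Control = c("G1", "G2"), G3 = "G3")

df2
levels(df2$grp)
```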