- Data Structure
- Networking
- RDBMS
- Operating System
- Java
- MS Excel
- iOS
- HTML
- CSS
- Android
- Python
- C Programming
- C++
- C#
- MongoDB
- MySQL
- Javascript
- PHP
- Physics
- Chemistry
- Biology
- Mathematics
- English
- Economics
- Psychology
- Social Studies
- Fashion Studies
- Legal Studies
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
How to subset columns that has less than four categories in an R data frame?
If column is categorical then there can be at least two categories and there is no limit for the total number of categories but it will also depend on the total number of cases. If we have a data frame that contain some categorical columns having more or less categories than 4 then we might want to subset columns having less than four categories. This could be required in situations when we want to subset the data biasedly or have some predefined data characteristics that allows this change. The subset of such columns can be done with the help of sapply function as shown in the below examples.
Example1
Consider the below data frame −
> x1<-sample(c("Hot","Cold","Warm"),20,replace=TRUE) > x2<-sample(c("Male","Female"),20,replace=TRUE) > x3<-sample(letters[1:4],20,replace=TRUE) > df1<-data.frame(x1,x2,x3) > df1
Output
x1 x2 x3 1 Warm Male b 2 Cold Female c 3 Cold Male a 4 Hot Male d 5 Hot Male d 6 Hot Female a 7 Hot Male a 8 Cold Female d 9 Warm Male d 10 Warm Female d 11 Cold Male a 12 Cold Female c 13 Hot Male b 14 Warm Male c 15 Cold Male b 16 Warm Male a 17 Hot Male b 18 Cold Male b 19 Hot Female c 20 Warm Female d
Finding the subset of columns that have less than 4 categories in df1 −
> df1[,sapply(df1, function(col) length(unique(col)))<4]
Output
x1 x2 1 Warm Male 2 Cold Female 3 Cold Male 4 Hot Male 5 Hot Male 6 Hot Female 7 Hot Male 8 Cold Female 9 Warm Male 10 Warm Female 11 Cold Male 12 Cold Female 13 Hot Male 14 Warm Male 15 Cold Male 16 Warm Male 17 Hot Male 18 Cold Male 19 Hot Female 20 Warm Female
Example2
> y1<-sample(c("Male","Female"),20,replace=TRUE) > y2<-sample(letters[1:5],20,replace=TRUE) > y3<-sample(c("Asian","American","Chinese"),20,replace=TRUE) > df2<-data.frame(y1,y2,y3) > df2
Output
y1 y2 y3 1 Male b Chinese 2 Female b American 3 Female d Asian 4 Female e American 5 Female e Asian 6 Female c Chinese 7 Female a Chinese 8 Female a Chinese 9 Male d American 10 Female d Chinese 11 Female d Chinese 12 Female c American 13 Female b American 14 Male d Chinese 15 Male a American 16 Male e Asian 17 Male b Asian 18 Female d Chinese 19 Female d Chinese 20 Female c Asian
Finding the subset of columns that have less than 4 categories in df2 −
> df2[,sapply(df2, function(col) length(unique(col)))<4]
Output
y1 y3 1 Male Chinese 2 Female American 3 Female Asian 4 Female American 5 Female Asian 6 Female Chinese 7 Female Chinese 8 Female Chinese 9 Male American 10 Female Chinese 11 Female Chinese 12 Female American 13 Female American 14 Male Chinese 15 Male American 16 Male Asian 17 Male Asian 18 Female Chinese 19 Female Chinese 20 Female Asian
To Continue Learning Please Login
Login with Google