How to subset columns that have fewer than four categories in an R data frame?
A categorical column typically has at least two categories. There is no fixed upper limit on the number of categories, although it cannot exceed the number of cases. If a data frame contains categorical columns with varying numbers of categories, we might want to keep only the columns that have fewer than four categories. This can be useful when we want a deliberately selective subset of the data, or when the data must meet some predefined characteristics. We can do this with the sapply function: count the unique values in each column, compare each count with 4, and use the resulting logical vector to select columns, as shown in the below examples.
Example 1
Consider the below data frame −
> x1<-sample(c("Hot","Cold","Warm"),20,replace=TRUE)
> x2<-sample(c("Male","Female"),20,replace=TRUE)
> x3<-sample(letters[1:4],20,replace=TRUE)
> df1<-data.frame(x1,x2,x3)
> df1
Output
     x1     x2 x3
1  Warm   Male  b
2  Cold Female  c
3  Cold   Male  a
4   Hot   Male  d
5   Hot   Male  d
6   Hot Female  a
7   Hot   Male  a
8  Cold Female  d
9  Warm   Male  d
10 Warm Female  d
11 Cold   Male  a
12 Cold Female  c
13  Hot   Male  b
14 Warm   Male  c
15 Cold   Male  b
16 Warm   Male  a
17  Hot   Male  b
18 Cold   Male  b
19  Hot Female  c
20 Warm Female  d
Finding the subset of columns that have fewer than 4 categories in df1 −
> df1[,sapply(df1, function(col) length(unique(col)))<4]
Output
     x1     x2
1  Warm   Male
2  Cold Female
3  Cold   Male
4   Hot   Male
5   Hot   Male
6   Hot Female
7   Hot   Male
8  Cold Female
9  Warm   Male
10 Warm Female
11 Cold   Male
12 Cold Female
13  Hot   Male
14 Warm   Male
15 Cold   Male
16 Warm   Male
17  Hot   Male
18 Cold   Male
19  Hot Female
20 Warm Female
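To see why the one-line subsetting command works, it can help to split it into its three steps. The sketch below is only an illustration: the data frame `demo` and its column names are hypothetical and hand-written (not the randomly sampled df1), so that the result is reproducible. It also passes `drop = FALSE`, which is an addition not used in the command above.

```r
# Hypothetical, hand-written data frame so that the result does not depend on sample()
demo <- data.frame(
  temp   = c("Hot", "Cold", "Warm", "Hot", "Cold"),
  gender = c("Male", "Female", "Male", "Male", "Female"),
  grade  = c("a", "b", "c", "d", "a")
)

# Step 1: count the distinct values in each column (a named integer vector)
n_categories <- sapply(demo, function(col) length(unique(col)))

# Step 2: turn the counts into a logical mask, TRUE where a column has fewer than 4 categories
keep <- n_categories < 4

# Step 3: use the mask to select columns; drop = FALSE keeps the result a data frame
result <- demo[, keep, drop = FALSE]
print(result)
```

Here `temp` has 3 categories, `gender` has 2 and `grade` has 4, so only `temp` and `gender` are kept.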
Example 2
> y1<-sample(c("Male","Female"),20,replace=TRUE)
> y2<-sample(letters[1:5],20,replace=TRUE)
> y3<-sample(c("Asian","American","Chinese"),20,replace=TRUE)
> df2<-data.frame(y1,y2,y3)
> df2
Output
       y1 y2       y3
1    Male  b  Chinese
2  Female  b American
3  Female  d    Asian
4  Female  e American
5  Female  e    Asian
6  Female  c  Chinese
7  Female  a  Chinese
8  Female  a  Chinese
9    Male  d American
10 Female  d  Chinese
11 Female  d  Chinese
12 Female  c American
13 Female  b American
14   Male  d  Chinese
15   Male  a American
16   Male  e    Asian
17   Male  b    Asian
18 Female  d  Chinese
19 Female  d  Chinese
20 Female  c    Asian
Finding the subset of columns that have fewer than 4 categories in df2 −
> df2[,sapply(df2, function(col) length(unique(col)))<4]
Output
       y1       y3
1    Male  Chinese
2  Female American
3  Female    Asian
4  Female American
5  Female    Asian
6  Female  Chinese
7  Female  Chinese
8  Female  Chinese
9    Male American
10 Female  Chinese
11 Female  Chinese
12 Female American
13 Female American
14   Male  Chinese
15   Male American
16   Male    Asian
17   Male    Asian
18 Female  Chinese
19 Female  Chinese
20 Female    Asian
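Two edge cases are worth keeping in mind with this approach. First, unique() counts NA as a value of its own, so a column with missing entries can appear to have one more category than it really has. Second, when only one column qualifies, indexing with `[, ...]` returns a vector instead of a data frame unless `drop = FALSE` is given. The helper below is a minimal sketch that handles both. The function name `subset_low_cardinality` and the parameter `max_categories` are hypothetical names chosen for this illustration, and ignoring NA is an assumption about what is wanted.

```r
# Hypothetical helper: keep the columns with fewer than max_categories distinct non-NA values
subset_low_cardinality <- function(df, max_categories = 4) {
  # vapply is like sapply, but guarantees one integer per column
  n_categories <- vapply(
    df,
    function(col) length(unique(col[!is.na(col)])),  # assumption: NA is not a category
    integer(1)
  )
  # drop = FALSE keeps a data frame even if a single column is selected
  df[, n_categories < max_categories, drop = FALSE]
}

# Hand-written example data (not df2): y1 has 2 non-NA categories, y2 has 4
demo <- data.frame(
  y1 = c("Male", "Female", NA, "Male"),
  y2 = c("a", "b", "c", "d")
)

out <- subset_low_cardinality(demo)
print(out)
```

Only `y1` is returned, and `out` is still a data frame with four rows rather than a plain vector.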
