

How to subset columns that have fewer than four categories in an R data frame?
A categorical column normally has at least two categories. There is no fixed upper limit on how many it can have, but the number of distinct categories can never exceed the number of rows. If a data frame contains some categorical columns with fewer than four categories and others with four or more, we might want to keep only the columns with fewer than four. This can be useful when we want to subset the data deliberately, or when the analysis requires predefined data characteristics that justify the change. Such a subset can be created with the sapply function: count the distinct values in each column, compare each count with 4, and use the resulting logical vector to select columns, as shown in the examples below.
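The same idea can be wrapped in a small reusable function. The sketch below is only an illustration: the name subset_few_categories and its max_levels argument are hypothetical, not part of base R. It adds drop = FALSE so that the result stays a data frame even when exactly one column qualifies; without it, df[, logical_vector] returns a plain vector in that case.

```r
# Hypothetical helper (not part of base R): keep only the columns whose
# number of distinct values is below max_levels (default 4).
subset_few_categories <- function(df, max_levels = 4) {
  # Count distinct values per column; note that NA counts as one value
  counts <- sapply(df, function(col) length(unique(col)))
  # drop = FALSE keeps a data frame even if only one column survives
  df[, counts < max_levels, drop = FALSE]
}

# Small fixed data frame so the result is reproducible
df <- data.frame(
  temp   = c("Hot", "Cold", "Warm", "Hot", "Cold"),
  letter = c("a", "b", "c", "d", "e"),
  stringsAsFactors = FALSE
)

res <- subset_few_categories(df)
names(res)
# [1] "temp"
```

Here temp has three distinct values and is kept, while letter has five and is dropped; res is still a one-column data frame.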
Example 1
Consider the following data frame. Its values are drawn at random with sample() and no seed is set, so your output will differ −
> x1<-sample(c("Hot","Cold","Warm"),20,replace=TRUE)
> x2<-sample(c("Male","Female"),20,replace=TRUE)
> x3<-sample(letters[1:4],20,replace=TRUE)
> df1<-data.frame(x1,x2,x3)
> df1
Output
     x1     x2 x3
1  Warm   Male  b
2  Cold Female  c
3  Cold   Male  a
4   Hot   Male  d
5   Hot   Male  d
6   Hot Female  a
7   Hot   Male  a
8  Cold Female  d
9  Warm   Male  d
10 Warm Female  d
11 Cold   Male  a
12 Cold Female  c
13  Hot   Male  b
14 Warm   Male  c
15 Cold   Male  b
16 Warm   Male  a
17  Hot   Male  b
18 Cold   Male  b
19  Hot Female  c
20 Warm Female  d
Finding the subset of columns that have fewer than 4 categories in df1 −
> df1[,sapply(df1, function(col) length(unique(col)))<4]
Output
     x1     x2
1  Warm   Male
2  Cold Female
3  Cold   Male
4   Hot   Male
5   Hot   Male
6   Hot Female
7   Hot   Male
8  Cold Female
9  Warm   Male
10 Warm Female
11 Cold   Male
12 Cold Female
13  Hot   Male
14 Warm   Male
15 Cold   Male
16 Warm   Male
17  Hot   Male
18 Cold   Male
19  Hot Female
20 Warm Female
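To see why the one-line subset works, it can help to run its two steps separately. The sketch below uses a small fixed data frame (its values are chosen for illustration and are not the random ones above): sapply() first returns a named vector of distinct-value counts, and comparing that vector with 4 gives the logical vector used as the column index.

```r
# Fixed illustrative data, so the printed results are reproducible
df <- data.frame(
  x1 = c("Hot", "Cold", "Warm", "Hot"),
  x2 = c("Male", "Female", "Male", "Male"),
  x3 = c("a", "b", "c", "d")
)

# Step 1: number of distinct values in each column
counts <- sapply(df, function(col) length(unique(col)))
counts
# x1 x2 x3
#  3  2  4

# Step 2: TRUE for columns with fewer than 4 categories
keep <- counts < 4
keep
#    x1    x2    x3
#  TRUE  TRUE FALSE

# Step 3: use the logical vector to select columns
df[, keep]
```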
Example 2
Consider another data frame, again generated at random −
> y1<-sample(c("Male","Female"),20,replace=TRUE)
> y2<-sample(letters[1:5],20,replace=TRUE)
> y3<-sample(c("Asian","American","Chinese"),20,replace=TRUE)
> df2<-data.frame(y1,y2,y3)
> df2
Output
       y1 y2       y3
1    Male  b  Chinese
2  Female  b American
3  Female  d    Asian
4  Female  e American
5  Female  e    Asian
6  Female  c  Chinese
7  Female  a  Chinese
8  Female  a  Chinese
9    Male  d American
10 Female  d  Chinese
11 Female  d  Chinese
12 Female  c American
13 Female  b American
14   Male  d  Chinese
15   Male  a American
16   Male  e    Asian
17   Male  b    Asian
18 Female  d  Chinese
19 Female  d  Chinese
20 Female  c    Asian
Finding the subset of columns that have fewer than 4 categories in df2 −
> df2[,sapply(df2, function(col) length(unique(col)))<4]
Output
       y1       y3
1    Male  Chinese
2  Female American
3  Female    Asian
4  Female American
5  Female    Asian
6  Female  Chinese
7  Female  Chinese
8  Female  Chinese
9    Male American
10 Female  Chinese
11 Female  Chinese
12 Female American
13 Female American
14   Male  Chinese
15   Male American
16   Male    Asian
17   Male    Asian
18 Female  Chinese
19 Female  Chinese
20 Female    Asian
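As an alternative to indexing with the result of sapply(), base R's Filter() can keep the qualifying columns directly. Because Filter() subsets the data frame list-style, the result remains a data frame even when only one column qualifies. A minimal sketch on a small fixed data frame (df_demo is a hypothetical name used only for illustration):

```r
# Fixed illustrative data with 2, 5 and 3 distinct values per column
df_demo <- data.frame(
  y1 = c("Male", "Female", "Female", "Male", "Female"),
  y2 = c("a", "b", "c", "d", "e"),
  y3 = c("Asian", "American", "Chinese", "Asian", "Chinese")
)

# Keep every column whose number of distinct values is below 4
few <- Filter(function(col) length(unique(col)) < 4, df_demo)
names(few)
# [1] "y1" "y3"
```

If the columns are factors, keep in mind that nlevels() counts all declared levels, including unused ones, whereas length(unique(col)) counts only the values actually present, so the two can disagree after rows have been removed.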