How to subset factor columns in an R data frame?


To subset the factor columns of a data frame, first use sapply with is.factor to build a logical vector that marks which columns are factors, then pass that vector to the single square bracket subsetting operator. For example, if a data frame df contains three columns x, y and z, of which x and y are factors, then Factors<-sapply(df,is.factor) followed by df[,Factors] returns only the factor columns of df.
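The approach described above can be sketched as follows. This is a minimal illustration, assuming a small hypothetical data frame df whose values are made up here; only the column names x, y and z come from the description.

```r
# Hypothetical data frame: x and y are factors, z is numeric
df <- data.frame(
  x = factor(c("a", "b", "a")),
  y = factor(c("low", "high", "low")),
  z = c(1.5, 2.5, 3.5)
)

# Logical vector with one entry per column: TRUE where the column is a factor
Factors <- sapply(df, is.factor)

# Keep only the columns flagged TRUE
df_factors <- df[, Factors]
names(df_factors)
```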

Example

Consider the following data frame (sample draws random values and no seed is set, so your output will differ) −

x1<-as.factor(sample(LETTERS[1:3],20,replace=TRUE))
x2<-as.factor(sample(c("GRP1","GRP2","GRP3","GRP4","GRP5"),20,replace=TRUE))
x3<-sample(1:10,20,replace=TRUE)
df1<-data.frame(x1,x2,x3)
df1

Output

   x1   x2 x3
1   A GRP1  2
2   B GRP1  7
3   B GRP3  1
4   A GRP4  8
5   B GRP2  8
6   A GRP3  6
7   C GRP1  8
8   B GRP3  9
9   B GRP5  1
10  C GRP3  8
11  A GRP3  1
12  C GRP1  1
13  B GRP1 10
14  C GRP1  7
15  C GRP3 10
16  C GRP2  4
17  C GRP2  1
18  B GRP1  2
19  C GRP3 10
20  A GRP2  3

Creating a logical vector with sapply that flags which columns are factors −

Example

Factors<-sapply(df1,is.factor)
Factors

Output

   x1    x2    x3
 TRUE  TRUE FALSE

Extracting factor columns −

Example

Factors_df1<-df1[,Factors]
Factors_df1

Output

   x1   x2
1   A GRP1
2   B GRP1
3   B GRP3
4   A GRP4
5   B GRP2
6   A GRP3
7   C GRP1
8   B GRP3
9   B GRP5
10  C GRP3
11  A GRP3
12  C GRP1
13  B GRP1
14  C GRP1
15  C GRP3
16  C GRP2
17  C GRP2
18  B GRP1
19  C GRP3
20  A GRP2
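One case worth checking is a data frame with only one factor column. With a single selected column, the [,] form simplifies the result to a plain factor vector unless drop=FALSE is passed. The sketch below uses a hypothetical data frame df_one with made-up values to show the difference.

```r
# Hypothetical data frame with a single factor column
df_one <- data.frame(
  grp = factor(c("A", "B", "A")),
  val = c(10, 20, 30)
)

is_fac <- sapply(df_one, is.factor)

# Only one column is selected, so the result is simplified to a factor vector
as_vector <- df_one[, is_fac]
class(as_vector)

# drop = FALSE keeps the result as a one-column data frame
as_df <- df_one[, is_fac, drop = FALSE]
class(as_df)
```

If later code expects a data frame regardless of how many factor columns exist, passing drop=FALSE is the safer choice.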

Let’s have a look at another example −

Example

Salary_Grp<-as.factor(sample(c("20-30","31-40","41-50"),20,replace=TRUE))
Gender<-as.factor(sample(c("Male","Female"),20,replace=TRUE))
Rating<-sample(0:10,20,replace=TRUE)
df2<-data.frame(Salary_Grp,Gender,Rating)
df2

Output

   Salary_Grp Gender Rating
1       20-30   Male      7
2       20-30 Female      8
3       31-40   Male      5
4       41-50   Male      7
5       41-50   Male      6
6       20-30   Male      7
7       20-30 Female      0
8       20-30   Male      5
9       31-40 Female      2
10      20-30   Male      7
11      31-40   Male      8
12      31-40 Female      4
13      20-30   Male      9
14      20-30 Female      5
15      31-40   Male      0
16      20-30 Female      9
17      41-50 Female     10
18      31-40 Female      1
19      31-40   Male      5
20      20-30 Female      3

Example

Factors_df2<-sapply(df2,is.factor)
Factors_df2

Output

Salary_Grp     Gender     Rating
      TRUE       TRUE      FALSE

Example

Factors_df2<-df2[,Factors_df2]
Factors_df2

Output

   Salary_Grp Gender
1       20-30   Male
2       20-30 Female
3       31-40   Male
4       41-50   Male
5       41-50   Male
6       20-30   Male
7       20-30 Female
8       20-30   Male
9       31-40 Female
10      20-30   Male
11      31-40   Male
12      31-40 Female
13      20-30   Male
14      20-30 Female
15      31-40   Male
16      20-30 Female
17      41-50 Female
18      31-40 Female
19      31-40   Male
20      20-30 Female
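Since the same two steps are repeated for every data frame, they could be wrapped in a small function. The sketch below is one possible way to do that. The helper name select_factors and the data frame people are hypothetical, and vapply is swapped in for sapply so that the result is always a logical vector.

```r
# Hypothetical helper: return only the factor columns of a data frame
select_factors <- function(data) {
  stopifnot(is.data.frame(data))
  # vapply returns exactly one logical value per column
  keep <- vapply(data, is.factor, logical(1))
  # drop = FALSE keeps a data frame even when a single column is selected
  data[, keep, drop = FALSE]
}

# Made-up data for illustration, with the same column names as df2
people <- data.frame(
  Salary_Grp = factor(c("20-30", "31-40", "41-50")),
  Gender = factor(c("Male", "Female", "Male")),
  Rating = c(7, 8, 5)
)

factor_cols <- select_factors(people)
names(factor_cols)

# Base R's Filter() selects the same columns in one call
same_cols <- Filter(is.factor, people)
names(same_cols)
```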

Updated on: 17-Oct-2020
