In R, a data frame column that holds string values is usually stored either as a character vector or as a factor. For example, a column Group with the four unique values A, B, C, and D can be a character column or a factor with four levels. To take the subset of rows whose value in such a column belongs to a given set, we can use the subset function together with the %in% operator, and the same call works whether the column is character or factor. Check out the examples below.
Consider the following data frame −
set.seed(888)
# 20 random group labels and 20 distinct ages between 21 and 50
Grp <- sample(c("A","B","C"), 20, replace = TRUE)
Age <- sample(21:50, 20)
df1 <- data.frame(Grp, Age)
df1
   Grp Age
1    A  35
2    C  40
3    C  48
4    C  46
5    C  36
6    C  33
7    B  47
8    A  45
9    B  43
10   B  37
11   B  30
12   A  24
13   C  39
14   C  50
15   C  25
16   A  34
17   B  49
18   A  44
19   C  38
20   B  26
str(df1)
'data.frame': 20 obs. of 2 variables:
 $ Grp: chr "A" "C" "C" "C" ...
 $ Age: int 35 40 48 46 36 33 47 45 43 37 ...
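The str() output shows that Grp is stored as a character column (since R 4.0.0, data.frame() no longer converts strings to factors by default). The sketch below rebuilds df1 the same way, converts Grp to a factor, and checks that the same subset() call selects the same rows in both cases. The names df1_chr, df1_fac, sub_chr, and sub_fac are only illustrative, and the exact sampled values depend on your R version's random number generator.

```r
# Rebuild df1 as in the example above
set.seed(888)
Grp <- sample(c("A", "B", "C"), 20, replace = TRUE)
Age <- sample(21:50, 20)
df1_chr <- data.frame(Grp, Age, stringsAsFactors = FALSE)

# Same data, but with Grp stored as a factor
df1_fac <- df1_chr
df1_fac$Grp <- factor(df1_fac$Grp)
str(df1_fac)

# %in% matches factor values by their labels, so both subsets keep the same rows
sub_chr <- subset(df1_chr, Grp %in% c("A", "C"))
sub_fac <- subset(df1_fac, Grp %in% c("A", "C"))
identical(rownames(sub_chr), rownames(sub_fac))

# The factor subset still carries every original level;
# droplevels() removes the levels that no longer occur
levels(sub_fac$Grp)
levels(droplevels(sub_fac)$Grp)
```

Keeping unused levels is the default behaviour of factor subsetting, which matters if you later tabulate or plot the subset; droplevels() is one way to discard them.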
Taking the subset of df1 whose Grp value is A or C −
subset(df1, Grp %in% c("A","C"))
   Grp Age
1    A  35
2    C  40
3    C  48
4    C  46
5    C  36
6    C  33
8    A  45
12   A  24
13   C  39
14   C  50
15   C  25
16   A  34
18   A  44
19   C  38
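The condition uses %in% because it tests every value of Grp against the whole set c("A","C"). Writing Grp == c("A","C") instead is a common mistake: == compares element by element and recycles the shorter vector, so row 1 is compared with "A", row 2 with "C", row 3 with "A", and so on, and matching rows can be silently dropped. The sketch below rebuilds df1 and compares the two conditions; by_in and by_eq are illustrative names.

```r
set.seed(888)
Grp <- sample(c("A", "B", "C"), 20, replace = TRUE)
Age <- sample(21:50, 20)
df1 <- data.frame(Grp, Age)

# Correct: keep every row whose Grp is any of the listed values
by_in <- subset(df1, Grp %in% c("A", "C"))

# Pitfall: == recycles c("A", "C") along the rows (A, C, A, C, ...),
# so a row is kept only if its Grp equals the value at its position
by_eq <- subset(df1, Grp == c("A", "C"))

nrow(by_in)
nrow(by_eq)  # typically fewer rows than by_in
```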
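Let’s have a look at another example. No seed is set immediately before sampling here, so the values you get may differ from those shown −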
Class <- sample(c("First","Second","Third","Fourth"), 20, replace = TRUE)
Score <- sample(1:10, 20, replace = TRUE)
df2 <- data.frame(Class, Score)
df2
    Class Score
1   First    10
2   First     3
3   First     1
4   First     7
5   First     1
6   Third     4
7   First     3
8   First     3
9  Second     2
10  First     8
11 Fourth     1
12  Third     6
13  First     6
14 Second     1
15  First     8
16 Fourth     4
17  Third     7
18 Fourth     4
19  Third     7
20 Fourth     1
str(df2)
'data.frame': 20 obs. of 2 variables:
 $ Class: chr "First" "Third" "Second" "First" ...
 $ Score: int 1 4 9 8 9 10 2 8 5 8 ...
Taking the subset of df2 whose Class value is First or Fourth −
subset(df2, Class %in% c("First","Fourth"))
    Class Score
1   First     1
4   First     8
5   First     9
6  Fourth    10
7  Fourth     2
9  Fourth     5
10 Fourth     8
11 Fourth     8
13 Fourth     7
14 Fourth    10
15  First     7
16 Fourth    10
17 Fourth     4
19  First     2
20  First    10
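subset() is a convenience function meant mainly for interactive use; R's help page for subset() recommends the standard [ ] indexing inside your own functions. The sketch below shows that bracket form selects the same rows, and that negating the condition with ! keeps every row except the listed values. The seed and the names keep, with_subset, with_brackets, and others are assumptions for illustration only, so the sampled values will not match the output above.

```r
set.seed(888)  # assumed seed; the example above does not fix one for df2
Class <- sample(c("First", "Second", "Third", "Fourth"), 20, replace = TRUE)
Score <- sample(1:10, 20, replace = TRUE)
df2 <- data.frame(Class, Score)

keep <- c("First", "Fourth")

# subset() version, as in the example above
with_subset <- subset(df2, Class %in% keep)

# Equivalent base-R indexing: logical row index, all columns
with_brackets <- df2[df2$Class %in% keep, ]

# Negation: every row whose Class is NOT First or Fourth
others <- subset(df2, !(Class %in% keep))

identical(rownames(with_subset), rownames(with_brackets))
nrow(with_subset) + nrow(others) == nrow(df2)
```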