How to create a subset using a character column with multiple matches in R?


Subsetting is one of the most common tasks in data analysis. One such case is subsetting a data frame on a character column that should match any of several values. For example, if a character column of an R data frame has 5 categories, we might want to keep only the rows that belong to 2, 3, or 4 of them. This can be done with the filter function of the dplyr package combined with the str_detect function of the stringr package, with the wanted values joined by | in a single pattern.
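As a minimal sketch of how the pattern works, assuming a hypothetical vector named groups that is not part of the examples in this article: str_detect returns one TRUE or FALSE per element, and filter keeps the rows where that result is TRUE.

```r
library(stringr)

# Hypothetical vector standing in for a character column
groups <- c("A", "B", "C", "D", "E")

# "A|C|D" is a regular expression meaning "A" or "C" or "D"
keep <- str_detect(groups, "A|C|D")
keep
# [1]  TRUE FALSE  TRUE  TRUE FALSE

# Indexing with the logical vector keeps only the matching values
groups[keep]
# [1] "A" "C" "D"
```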

Consider the below data frame −

Example

# Group and Response are drawn at random, so the values differ on each run
Group <- sample(LETTERS[1:6], 25, replace = TRUE)
Response <- rnorm(25, 3, 0.24)
df1 <- data.frame(Group, Response)
df1

Output

   Group Response
1  A    3.040870
2  F    2.921251
3  E    2.911820
4  E    3.188297
5  B    3.054424
6  D    2.691892
7  F    2.714302
8  F    3.154340
9  F    3.058324
10 C    2.814400
11 B    3.040255
12 D    3.270639
13 A    3.197537
14 E    2.646717
15 D    2.671441
16 C    3.233093
17 F    2.555055
18 E    2.670018
19 E    2.607526
20 F    2.952952
21 C    3.257484
22 B    3.009312
23 C    3.142553
24 B    3.355754
25 B    3.262376

Loading the dplyr and stringr packages and filtering df1 to the rows where Group is A, C, or D −

Example

library(dplyr)
library(stringr)
# Keep the rows whose Group matches A, C, or D
df1 %>% filter(str_detect(Group, "A|C|D"))

Output

  Group  Response
1   A   3.040870
2   D   2.691892
3   C   2.814400
4   D   3.270639
5   A   3.197537
6   D   2.671441
7   C   3.233093
8   C   3.257484
9   C   3.142553
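Note that str_detect matches the pattern anywhere in the string, which makes no difference for single-letter groups like those above but can matter for longer labels. The sketch below uses a hypothetical data frame named df, not one from this article, to compare it with the %in% operator, which compares whole values. It is one possible alternative, not a replacement for the approach shown here.

```r
library(dplyr)
library(stringr)

# Hypothetical data frame with multi-character labels
df <- data.frame(Group = c("A", "AB", "C", "BD", "E"),
                 Response = c(1, 2, 3, 4, 5),
                 stringsAsFactors = FALSE)

# str_detect() matches substrings, so "AB" and "BD" are kept as well
partial <- df %>% filter(str_detect(Group, "A|C|D"))
partial$Group
# [1] "A"  "AB" "C"  "BD"

# %in% compares whole values, so only the exact labels are kept
exact <- df %>% filter(Group %in% c("A", "C", "D"))
exact$Group
# [1] "A" "C"
```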

Example

# Region and Y are drawn at random, so the values differ on each run
Region <- sample(c("Asia", "Oceania", "Africa", "America"), 25, replace = TRUE)
Y <- rpois(25, 5)
df2 <- data.frame(Region, Y)
df2

Output

   Region   Y
1  Africa   5
2  Oceania  4
3  Oceania  3
4  Oceania  3
5  Oceania  6
6  Oceania  2
7  Oceania  4
8  Oceania  6
9  Asia     1
10 Africa   4
11 Asia     7
12 Asia     10
13 Oceania  1
14 America  5
15 Oceania  3
16 Africa   8
17 Oceania  9
18 Asia     11
19 Africa   7
20 Africa   3
21 Africa   2
22 Asia     5
23 America  6
24 America  2
25 America  1

Filtering df2 to the rows where Region is Oceania, America, or Africa −

Example

# Keep the rows whose Region matches Oceania, America, or Africa
df2 %>% filter(str_detect(Region, "Oceania|America|Africa"))

Output

    Region   Y
1  Africa    5
2  Oceania   4
3  Oceania   3
4  Oceania   3
5  Oceania   6
6  Oceania   2
7  Oceania   4
8  Oceania   6
9  Africa    4
10 Oceania   1
11 America   5
12 Oceania   3
13 Africa    8
14 Oceania   9
15 Africa    7
16 Africa    3
17 Africa    2
18 America   6
19 America   2
20 America   1
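When the list of values is long, typing the pattern by hand becomes error-prone. One way to handle this, sketched below under the assumption of a hypothetical data frame df and a hypothetical vector wanted (neither appears in the examples above), is to build the pattern with paste and, if whole-value matches are needed, wrap it in the anchors ^ and $.

```r
library(dplyr)
library(stringr)

# Hypothetical data; one region name contains another
df <- data.frame(Region = c("Asia", "America", "South America", "Africa"),
                 Y = c(1, 2, 3, 4),
                 stringsAsFactors = FALSE)

# Hypothetical vector of values to keep
wanted <- c("America", "Africa")

# Collapse the values into one alternation: "America|Africa"
pattern <- paste(wanted, collapse = "|")

# Unanchored: "South America" is kept because it contains "America"
loose <- df %>% filter(str_detect(Region, pattern))
loose$Region
# [1] "America"       "South America" "Africa"

# Anchored: "^(America|Africa)$" matches whole values only
anchored <- paste0("^(", pattern, ")$")
strict <- df %>% filter(str_detect(Region, anchored))
strict$Region
# [1] "America" "Africa"
```

This assumes the values contain no regular expression metacharacters such as . or (, which would need to be escaped before being pasted into a pattern.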

Updated on: 11-Feb-2021
