How to create a subset using character column with multiple matches in R?
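Subsetting is one of the most common tasks in data analysis. One such case is subsetting a data frame on a character column that should match any of several values. For example, if a character column of an R data frame has 5 categories, we might want to keep only the rows that belong to 2, 3, or 4 of them. This can be done with the filter function of the dplyr package combined with the str_detect function of the stringr package, joining the wanted values with | so that the pattern matches any one of them. Note that str_detect performs regular-expression matching, so a row is kept whenever the pattern occurs anywhere inside the string, not only when the whole string equals one of the values.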
Consider the below data frame −
Example
Group<-sample(LETTERS[1:6],25,replace=TRUE)
Response<-rnorm(25,3,0.24)
df1<-data.frame(Group,Response)
df1
Output
   Group Response
1      A 3.040870
2      F 2.921251
3      E 2.911820
4      E 3.188297
5      B 3.054424
6      D 2.691892
7      F 2.714302
8      F 3.154340
9      F 3.058324
10     C 2.814400
11     B 3.040255
12     D 3.270639
13     A 3.197537
14     E 2.646717
15     D 2.671441
16     C 3.233093
17     F 2.555055
18     E 2.670018
19     E 2.607526
20     F 2.952952
21     C 3.257484
22     B 3.009312
23     C 3.142553
24     B 3.355754
25     B 3.262376
Load the dplyr and stringr packages and filter df1 to the rows whose Group is A, C, or D −
Example
library(dplyr)
library(stringr)
df1%>%filter(str_detect(Group,"A|C|D"))
Output
  Group Response
1     A 3.040870
2     D 2.691892
3     C 2.814400
4     D 3.270639
5     A 3.197537
6     D 2.671441
7     C 3.233093
8     C 3.257484
9     C 3.142553
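When the wanted values are already stored in a vector, the pattern does not have to be typed by hand. The following is a minimal sketch of that idea, not part of the original example: the data frame demo, the vector wanted, and their values are made up for illustration, and paste with collapse = "|" is used to build the same kind of pattern as "A|C|D" above.

```r
library(dplyr)
library(stringr)

# Small hand-made data frame (illustrative values, not taken from df1)
demo <- data.frame(
  Group = c("A", "B", "C", "D", "E", "F", "A", "C"),
  Response = c(3.1, 2.9, 3.0, 2.7, 3.2, 2.8, 3.3, 2.6),
  stringsAsFactors = FALSE
)

# Values to keep, held in a vector (hypothetical name)
wanted <- c("A", "C", "D")

# Join the values with "|" so the regular expression matches any of them
pattern <- paste(wanted, collapse = "|")

# Keep only the rows whose Group matches the pattern
subset_demo <- demo %>% filter(str_detect(Group, pattern))
subset_demo
```

This works as long as the values contain no regular-expression metacharacters such as . or (; values that do would need escaping first.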
Example
Region<-sample(c("Asia","Oceania","Africa","America"),25,replace=TRUE)
Y<-rpois(25,5)
df2<-data.frame(Region,Y)
df2
Output
    Region  Y
1   Africa  5
2  Oceania  4
3  Oceania  3
4  Oceania  3
5  Oceania  6
6  Oceania  2
7  Oceania  4
8  Oceania  6
9     Asia  1
10  Africa  4
11    Asia  7
12    Asia 10
13 Oceania  1
14 America  5
15 Oceania  3
16  Africa  8
17 Oceania  9
18    Asia 11
19  Africa  7
20  Africa  3
21  Africa  2
22    Asia  5
23 America  6
24 America  2
25 America  1
Filter df2 to the rows whose Region is Oceania, America, or Africa −
Example
df2%>%filter(str_detect(Region,"Oceania|America|Africa"))
Output
    Region Y
1   Africa 5
2  Oceania 4
3  Oceania 3
4  Oceania 3
5  Oceania 6
6  Oceania 2
7  Oceania 4
8  Oceania 6
9   Africa 4
10 Oceania 1
11 America 5
12 Oceania 3
13  Africa 8
14 Oceania 9
15  Africa 7
16  Africa 3
17  Africa 2
18 America 6
19 America 2
20 America 1
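Because str_detect matches the pattern anywhere inside the string, it can keep more rows than intended when one category name is contained in another. The sketch below uses a made-up data frame named regions to illustrate this (its values are assumptions, not data from df2) and compares the partial match with two ways of asking for exact matches: the base R %in% operator and a pattern anchored with ^ and $.

```r
library(dplyr)
library(stringr)

# Hand-made data frame in which some names contain other names
regions <- data.frame(
  Region = c("Asia", "Africa", "South Africa", "America", "North America", "Oceania"),
  Y = 1:6,
  stringsAsFactors = FALSE
)

# Partial match: also keeps "South Africa" and "North America"
partial <- regions %>% filter(str_detect(Region, "Africa|America"))

# Exact match using %in%: keeps only "Africa" and "America"
exact <- regions %>% filter(Region %in% c("Africa", "America"))

# Exact match using an anchored regular expression
anchored <- regions %>% filter(str_detect(Region, "^(Africa|America)$"))

partial
exact
anchored
```

For the single-letter groups and the four region names used in the examples above, the partial and exact approaches return the same rows, so the choice only matters when category names overlap.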