How to subset an R data frame based on string match in two columns with OR condition?


To subset an R data frame based on a string match in either of two columns, we can apply the grepl function to each column (selected with double square brackets) and combine the two results with the OR operator |. For example, if a data frame called df contains two string columns x and y, the rows in which a particular string appears in either column can be selected with the following syntax.

Syntax

df[grepl("text",df[["x"]])|grepl("text",df[["y"]]),]
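
To see why this works, note that grepl returns one TRUE/FALSE value per row, and | combines the two results element by element. The following is a minimal sketch on a small made-up data frame; the data, the pattern "apple", and the names in_x, in_y, keep and result are illustrative assumptions and are not part of the examples that follow.

```r
# Hypothetical toy data, used only to illustrate the pattern
df <- data.frame(
  x = c("red apple", "banana", "cherry"),
  y = c("kiwi", "green apple", "plum")
)

in_x <- grepl("apple", df[["x"]])  # one logical value per row of x
in_y <- grepl("apple", df[["y"]])  # one logical value per row of y
keep <- in_x | in_y                # TRUE where either column matched

# Keep the rows flagged TRUE, and all columns
result <- df[keep, ]
result
```

Rows 1 and 2 are kept because "apple" occurs in x for the first and in y for the second; row 3 matches in neither column and is dropped.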

Check out the below examples to understand how it works.

Example 1

Consider the below data frame −


f1<-sample(c("India","China","Egypt","UK"),20,replace=TRUE)
f2<-sample(c("India","China","Egypt","UK"),20,replace=TRUE)
v1<-rnorm(20)
df1<-data.frame(f1,f2,v1)
df1

Output

      f1       f2         v1
1    India    India     0.58383357
2    UK       Egypt    -0.71045054
3    India    China    -0.07848666
4    Egypt    India     1.21017481
5    Egypt    UK       -0.81991817
6    Egypt    China     1.98979283
7    India    India     0.36160374
8    Egypt    China    -1.77619986
9    China    UK       -0.05397712
10   India    Egypt    -0.30372078
11   Egypt    India    -1.68623489
12   India    India    -0.41997104
13   India    China    -0.97064798
14   UK       Egypt     2.02704796
15   UK       Egypt    -0.47732133
16   China    China     0.53153059
17   Egypt    UK       -1.71608164
18   Egypt    India    -0.73298689
19   UK       UK        1.83674440
20   China    China    -1.12186527
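
Because sample() and rnorm() draw random values, running the code again will generally produce rows that differ from those printed above. If a repeatable data frame is wanted, one option is to fix the random seed first. In the sketch below, the seed value 123 and the names countries, df1_repro and df1_again are arbitrary choices for illustration, and the rows it generates are not expected to match the output shown in this article.

```r
countries <- c("India", "China", "Egypt", "UK")

set.seed(123)  # arbitrary seed, assumed only for illustration
f1 <- sample(countries, 20, replace = TRUE)
f2 <- sample(countries, 20, replace = TRUE)
v1 <- rnorm(20)
df1_repro <- data.frame(f1, f2, v1)

# Resetting the same seed and repeating the same draws in the same order
# rebuilds the same data frame
set.seed(123)
df1_again <- data.frame(
  f1 = sample(countries, 20, replace = TRUE),
  f2 = sample(countries, 20, replace = TRUE),
  v1 = rnorm(20)
)
same <- identical(df1_repro, df1_again)
same
```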

Subsetting df1 based on matching of India in any of the first two columns −

df1<-df1[grepl("India",df1[["f1"]])|grepl("India",df1[["f2"]]),]
df1
      f1      f2         v1
1   India   India     0.58383357
3   India   China    -0.07848666
4   Egypt   India     1.21017481
7   India   India     0.36160374
10  India   Egypt    -0.30372078
11  Egypt   India    -1.68623489
12  India   India    -0.41997104
13  India   China    -0.97064798
18  Egypt   India    -0.73298689
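
When more than two columns need to be checked, writing one grepl call per column becomes repetitive. One possible generalization, sketched here on a small made-up data frame, applies grepl to each column with lapply and then folds the results together with Reduce. The data, the column vector cols, and the names hits, keep and subset_df are assumptions for illustration.

```r
# Hypothetical data, not the df1 generated above
df <- data.frame(
  f1 = c("India", "UK", "Egypt", "China"),
  f2 = c("China", "Egypt", "India", "UK"),
  f3 = c("UK", "India", "China", "Egypt"),
  v  = c(1.5, -0.2, 0.7, 2.1)
)

cols <- c("f1", "f2", "f3")  # columns to search

# One logical vector per column, then combine them with element-wise OR
hits <- lapply(df[cols], function(col) grepl("India", col))
keep <- Reduce(`|`, hits)

subset_df <- df[keep, ]
subset_df
```

Only the fourth row is dropped here, since it is the only one with no "India" in any of the three searched columns. Assigning the result to a new name also leaves the original data frame available for further subsetting.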

Example 2


g1<-sample(c("Male","Female"),20,replace=TRUE)
g2<-sample(c("Male","Female"),20,replace=TRUE)
df2<-data.frame(g1,g2)
df2

Output

    g1      g2
1  Female  Male
2  Female  Male
3  Female  Female
4  Male    Male
5  Male    Female
6  Female  Female
7  Female  Male
8  Male    Male
9  Male    Female
10 Male    Female
11 Female  Female
12 Male    Male
13 Male    Male
14 Male    Female
15 Female  Male
16 Female  Male
17 Female  Male
18 Male    Female
19 Female  Female
20 Male    Female

Subsetting df2 based on matching of Female in any of the first two columns −

df2<-df2[grepl("Female",df2[["g1"]])|grepl("Female",df2[["g2"]]),]
df2
     g1      g2
1   Female  Male
2   Female  Male
3   Female  Female
5   Male    Female
6   Female  Female
7   Female  Male
9   Male    Female
10  Male    Female
11  Female  Female
14  Male    Female
15  Female  Male
16  Female  Male
17  Female  Male
18  Male    Female
19  Female  Female
20  Male    Female
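
Note that grepl treats its first argument as a regular expression and reports a match whenever the pattern occurs anywhere in the string, with case sensitivity on by default. That matters for values such as Male and Female, which share most of their letters. The sketch below uses a made-up vector g (an assumption for illustration) to contrast substring matching with two ways of requiring the whole value to match.

```r
g <- c("Male", "Female", "female")  # hypothetical values

substring_match  <- grepl("male", g)                      # FALSE TRUE TRUE
ignore_case      <- grepl("male", g, ignore.case = TRUE)  # TRUE TRUE TRUE
anchored_match   <- grepl("^Female$", g)                  # FALSE TRUE FALSE
exact_comparison <- g == "Female"                         # FALSE TRUE FALSE
```

When only whole values should match, a plain comparison with == (or %in% for several values) avoids regular-expression surprises; grepl is the better fit when partial matches are actually wanted.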

Updated on: 06-Mar-2021
