How to extract columns based on particular column values of an R data frame that match
a pattern?


The column values of an R data frame can be easily extracted by subsetting with single square brackets but if we want to extract the column values that match a pattern then we need to use grepl function inside single square brackets, this will help us to match the pattern of the values in the data frame columns.

Example

Consider the below data frame:

Live Demo

> set.seed(271)
> x1<-sample(c("A187","A321","A123","A242"),20,replace=TRUE)
> x2<-sample(c("B87","B74","B81","B71"),20,replace=TRUE)
> df1<-data.frame(x1,x2)
> df1

Output

    x1 x2
1  A242 B71
2  A123 B71
3  A242 B81
4  A242 B87
5  A123 B71
6  A321 B71
7  A187 B74
8  A321 B74
9  A123 B71
10 A242 B74
11 A321 B74
12 A187 B87
13 A321 B74
14 A321 B74
15 A123 B74
16 A187 B87
17 A123 B81
18 A242 B87
19 A242 B71
20 A187 B74

Extracting values that starts with A1:

Example

> df1[grepl("A1",x1),]

Output

x1 x2
2 A123 B71
5 A123 B71
7 A187 B74
9 A123 B71
12 A187 B87
15 A123 B74
16 A187 B87
17 A123 B81
20 A187 B74

Extracting values that starts with A2:

Example

> df1[grepl("A2",x1),]

Output

    x1  x2
1  A242 B71
3  A242 B81
4  A242 B87
10 A242 B74
18 A242 B87
19 A242 B71

Extracting values that starts with A3:

Example

> df1[grepl("A3",x1),]

Output

    x1  x2
6  A321 B71
8  A321 B74
11 A321 B74
13 A321 B74
14 A321 B74

Extracting values that starts with B7:

Example

> df1[grepl("B7",x2),]

Output

    x1   x2
1  A242 B71
2  A123 B71
5  A123 B71
6  A321 B71
7  A187 B74
8  A321 B74
9  A123 B71
10 A242 B74
11 A321 B74
13 A321 B74
14 A321 B74
15 A123 B74
19 A242 B71
20 A187 B74

Extracting values that starts with B8:

Example

> df1[grepl("B8",x2),]

Output

    x1 x2
3  A242 B81
4  A242 B87
12 A187 B87
16 A187 B87
17 A123 B81
18 A242 B87

Let’s have a look at another example:

Example

Live Demo

> y1<-sample(c("ID1","ID2","ID3","ID4"),20,replace=TRUE)
> y2<-sample(c("C1","C2"),20,replace=TRUE)
> df2<-data.frame(y1,y2)
> df2

Output

   y1 y2
1  ID1 C1
2  ID3 C1
3  ID1 C1
4  ID2 C2
5  ID4 C2
6  ID2 C2
7  ID2 C1
8  ID3 C1
9  ID1 C1
10 ID1 C1
11 ID2 C2
12 ID2 C2
13 ID4 C1
14 ID3 C2
15 ID2 C1
16 ID4 C1
17 ID4 C1
18 ID2 C2
19 ID2 C2
20 ID3 C2

Example

> df2[grepl("ID1",y1),]

Output

    y1 y2
1  ID1 C1
3  ID1 C1
9  ID1 C1
10 ID1 C1

Example

> df2[grepl("ID2",y1),]

Output

   y1 y2
4  ID2 C2
6  ID2 C2
7  ID2 C1
11 ID2 C2
12 ID2 C2
15 ID2 C1
18 ID2 C2
19 ID2 C2

Example

> df2[grepl("ID3",y1),]

Output

   y1  y2
2  ID3 C1
8  ID3 C1
14 ID3 C2
20 ID3 C2

Example

> df2[grepl("ID4",y1),]

Output

    y1 y2
5  ID4 C2
13 ID4 C1
16 ID4 C1
17 ID4 C1

Example

> df2[grepl("C1",y2),]

Output

    y1 y2
1  ID1 C1
2  ID3 C1
3  ID1 C1
7  ID2 C1
8  ID3 C1
9  ID1 C1
10 ID1 C1
13 ID4 C1
15 ID2 C1
16 ID4 C1
17 ID4 C1

Example

> df2[grepl("C2",y2),]

Output

    y1 y2
4  ID2 C2
5  ID4 C2
6  ID2 C2
11 ID2 C2
12 ID2 C2
14 ID3 C2
18 ID2 C2
19 ID2 C2
20 ID3 C2

Updated on: 21-Nov-2020

330 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements