How to select columns in R based on a string that matches the column name using dplyr?


Columns in R are generally selected by column number, or by name using the $ (dollar) operator. We can also select columns by a partial or complete name string, without using the $ operator. This can be done with the select() and matches() functions of the dplyr package.
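As a point of comparison, here is a base R sketch of the approaches mentioned in the paragraph above: selection with $, by column number, and by a name pattern. The grepl() line is only a rough analogue of what matches() does, not dplyr's actual implementation; the object names (by_dollar, by_pat and so on) are illustrative.

```r
# Base R equivalents, using the built-in BOD data (datasets package)
by_dollar <- BOD$demand                      # by name with $, returns a numeric vector
by_number <- BOD[, 2]                        # by column number, also a vector
by_name   <- BOD[, "demand", drop = FALSE]   # by full name, kept as a data frame

# Rough base R analogue of select(BOD, matches("and")): keep every column
# whose name matches the regular expression, ignoring case
keep   <- grepl("and", names(BOD), ignore.case = TRUE)
by_pat <- BOD[, keep, drop = FALSE]
print(by_pat)
```

Note that $ and a single column number return a plain vector, while the drop = FALSE forms (like select()) keep a data frame.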

Example

Loading the dplyr package −

> library(dplyr)

Consider the BOD data in base R −

> str(BOD)
'data.frame': 6 obs. of 2 variables:
$ Time : num 1 2 3 4 5 7
$ demand: num 8.3 10.3 19 16 15.6 19.8
- attr(*, "reference")= chr "A1.4, p. 270"

Selecting the column of BOD whose name contains the string “and” −

> select(BOD, matches("and"))
demand
1 8.3
2 10.3
3 19.0
4 16.0
5 15.6
6 19.8
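Because matches() interprets its argument as a regular expression rather than a literal string, anchors such as $ can narrow the match. A small sketch on the same BOD data; the object name bod_d is illustrative.

```r
suppressPackageStartupMessages(library(dplyr))

# "d$" is a regular expression: keep only column names that END in "d"
bod_d <- select(BOD, matches("d$"))
print(names(bod_d))   # "demand"
```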

Consider the trees data in base R −

> str(trees)
'data.frame': 31 obs. of 3 variables:
$ Girth : num 8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
$ Height: num 70 65 63 72 81 83 66 75 80 75 ...
$ Volume: num 10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...

Selecting the column of trees whose name contains the string “Vol” −

> select(trees, matches("Vol"))
Volume
1 10.3
2 10.3
3 10.2
4 16.4
5 18.8
6 19.7
7 15.6
8 18.2
9 22.6
10 19.9
11 24.2
12 21.0
13 21.4
14 21.3
15 19.1
16 22.2
17 33.8
18 27.4
19 25.7
20 24.9
21 34.5
22 31.7
23 36.3
24 38.3
25 42.6
26 55.4
27 55.7
28 58.3
29 51.5
30 51.0
31 77.0
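matches() ignores case by default (its ignore.case argument defaults to TRUE), so a lower-case pattern still finds Volume. The sketch below, with illustrative object names, contrasts this with ignore.case = FALSE, where nothing matches and select() returns a data frame with no columns.

```r
suppressPackageStartupMessages(library(dplyr))

# Default ignore.case = TRUE: lower-case "vol" still matches "Volume"
vol_any_case <- select(trees, matches("vol"))

# Case-sensitive matching: "vol" no longer matches, so zero columns are kept
vol_exact_case <- select(trees, matches("vol", ignore.case = FALSE))

print(names(vol_any_case))   # "Volume"
print(ncol(vol_exact_case))  # 0
```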

Consider the women data in base R −

> str(women)
'data.frame': 15 obs. of 2 variables:
$ height: num 58 59 60 61 62 63 64 65 66 67 ...
$ weight: num 115 117 120 123 126 129 132 135 139 142 ...

Selecting the column of women whose name contains the string “height” −

> select(women, matches("height"))
height
1 58
2 59
3 60
4 61
5 62
6 63
7 64
8 65
9 66
10 67
11 68
12 69
13 70
14 71
15 72
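Since matches() looks for the pattern anywhere in a column name, a partial string can pick up more columns than intended. In the women data, "eight" occurs in both height and weight; anchoring with ^ and $ restricts the match to one exact name. Object names here are illustrative.

```r
suppressPackageStartupMessages(library(dplyr))

# "eight" appears inside both "height" and "weight", so both are selected
both <- select(women, matches("eight"))

# ^ and $ anchor the pattern to the whole name, so only "height" matches
only_height <- select(women, matches("^height$"))

print(names(both))         # "height" "weight"
print(names(only_height))  # "height"
```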

Consider the mtcars data in base R −

> str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...

Selecting the column of mtcars whose name contains the string “mpg” −

> select(mtcars, matches("mpg"))
mpg
Mazda RX4 21.0
Mazda RX4 Wag 21.0
Datsun 710 22.8
Hornet 4 Drive 21.4
Hornet Sportabout 18.7
Valiant 18.1
Duster 360 14.3
Merc 240D 24.4
Merc 230 22.8
Merc 280 19.2
Merc 280C 17.8
Merc 450SE 16.4
Merc 450SL 17.3
Merc 450SLC 15.2
Cadillac Fleetwood 10.4
Lincoln Continental 10.4
Chrysler Imperial 14.7
Fiat 128 32.4
Honda Civic 30.4
Toyota Corolla 33.9
Toyota Corona 21.5
Dodge Challenger 15.5
AMC Javelin 15.2
Camaro Z28 13.3
Pontiac Firebird 19.2
Fiat X1-9 27.3
Porsche 914-2 26.0
Lotus Europa 30.4
Ford Pantera L 15.8
Ferrari Dino 19.7
Maserati Bora 15.0
Volvo 142E 21.4
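A single pattern can also select several columns at once. This sketch uses a regular-expression alternation to keep every mtcars column whose name starts with "d" or "c"; the columns come back in their original order and the row names (car models) are kept, as in the output above. dc_cols is an illustrative name.

```r
suppressPackageStartupMessages(library(dplyr))

# "^d|^c" matches names beginning with "d" OR "c"
dc_cols <- select(mtcars, matches("^d|^c"))
print(names(dc_cols))   # "cyl" "disp" "drat" "carb"
```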

Consider the sleep data in base R −

> str(sleep)
'data.frame': 20 obs. of 3 variables:
$ extra: num 0.7 -1.6 -0.2 -1.2 -0.1 3.4 3.7 0.8 0 2 ...
$ group: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ ID : Factor w/ 10 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...

Selecting the column of sleep whose name contains the string “ext” −

> select(sleep, matches("ext"))
extra
1 0.7
2 -1.6
3 -0.2
4 -1.2
5 -0.1
6 3.4
7 3.7
8 0.8
9 0.0
10 2.0
11 1.9
12 0.8
13 1.1
14 0.1
15 -0.1
16 4.4
17 5.5
18 1.6
19 4.6
20 3.4
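Prefixing matches() with a minus sign inverts the selection, dropping the matching columns and keeping the rest. A minimal sketch on the sleep data; without_extra is an illustrative name.

```r
suppressPackageStartupMessages(library(dplyr))

# -matches("ext") drops every column whose name contains "ext"
without_extra <- select(sleep, -matches("ext"))
print(names(without_extra))   # "group" "ID"
```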

Updated on: 11-Aug-2020
