How to subset a data frame by excluding the column names that are stored in a vector in R?


Subsetting a data frame can be done in many ways, and one such way is to exclude the columns whose names are stored in a vector. Suppose we have a data frame df with columns x, y, and z, and the column names y and z are stored in a vector called V. Then we can subset df by excluding the columns named in V with select(df,-all_of(V)). Both select() and all_of() are available after loading the dplyr package.
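As a minimal sketch of the case just described, the snippet below builds a small df with columns x, y, and z and a vector V holding "y" and "z". The cell values are made up purely for illustration, and the code assumes the dplyr package (version 1.0.0 or later) is installed.

```r
# Load dplyr, which provides select() and all_of()
library(dplyr)

# Hypothetical data frame with the three columns named in the text
df <- data.frame(x = 1:3,
                 y = c(2.5, 3.5, 4.5),
                 z = c("a", "b", "c"))

# Vector holding the names of the columns to exclude
V <- c("y", "z")

# The minus sign turns the selection into an exclusion
res <- select(df, -all_of(V))
names(res)   # "x"
```

Only the column x remains, because every name listed in V is dropped.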

Example

Consider the below data frame:

> x1<-rpois(20,5)
> x2<-rpois(20,2)
> x3<-rpois(20,3)
> x4<-rpois(20,5)
> df1<-data.frame(x1,x2,x3,x4)
> df1

Output

x1 x2 x3 x4
1 3 4 0 5
2 4 1 2 6
3 4 1 2 3
4 8 1 7 6
5 4 2 3 8
6 4 4 1 0
7 4 1 1 2
8 7 2 4 4
9 4 3 6 5
10 4 3 5 7
11 3 2 3 5
12 4 2 3 5
13 3 1 2 5
14 4 2 5 7
15 4 3 7 2
16 2 1 3 6
17 5 1 8 3
18 4 0 4 6
19 5 2 4 9
20 9 0 4 7

Vector containing columns x1 and x4:

Example

> v1<-c("x1","x4")

Subsetting df1 by excluding x1 and x4:

> library(dplyr)
> select(df1,-all_of(v1))

Output

 x2 x3
1 4 0
2 1 2
3 1 2
4 1 7
5 2 3
6 4 1
7 1 1
8 2 4
9 3 6
10 3 5
11 2 3
12 2 3
13 1 2
14 2 5
15 3 7
16 1 3
17 1 8
18 0 4
19 2 4
20 0 4
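If dplyr is not available, the same exclusion can be sketched in base R by comparing the column names against the vector. This is an alternative to the select() call above, not the method used in this article. The set.seed() value is an arbitrary assumption added so the sketch is reproducible.

```r
# Recreate a data frame shaped like df1 (seed chosen arbitrarily)
set.seed(1)
df1 <- data.frame(x1 = rpois(20, 5),
                  x2 = rpois(20, 2),
                  x3 = rpois(20, 3),
                  x4 = rpois(20, 5))

# Names of the columns to exclude
v1 <- c("x1", "x4")

# Keep the columns whose names are NOT in v1;
# drop = FALSE keeps the result a data frame even if one column remains
kept <- df1[, !(names(df1) %in% v1), drop = FALSE]
names(kept)   # "x2" "x3"
```

An equivalent base R spelling is df1[setdiff(names(df1), v1)], which also keeps the remaining columns in their original order.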

Let’s have a look at another example:

Example

> y1<-rnorm(20,1,0.098)
> y2<-rnorm(20,100,10)
> y3<-rnorm(20,5,0.97)
> y4<-rnorm(20,5275,30.5)
> df2<-data.frame(y1,y2,y3,y4)
> df2

Output

     y1        y2       y3       y4
1 1.0004066 95.44217 4.436526 5302.802
2 0.8704272 103.72030 4.459705 5279.560
3 1.0010894 96.78478 4.979246 5250.222
4 1.0856458 100.94359 5.480827 5261.604
5 0.9609981 98.62898 4.427267 5230.762
6 0.9497958 90.31327 4.332123 5204.725
7 0.9598390 95.87049 4.557982 5273.675
8 0.7686893 95.67384 5.747136 5232.587
9 0.8447364 97.65526 5.012912 5282.668
10 1.1740212 105.39359 4.088489 5300.367
11 0.9476001 115.77728 5.490385 5315.523
12 0.9824041 89.73841 4.703173 5256.286
13 0.9139366 112.73522 5.676117 5279.863
14 1.0712399 83.89056 4.510641 5275.326
15 1.1097967 91.60747 4.391030 5269.570
16 1.0449168 90.27042 3.793536 5210.164
17 0.8880382 74.78750 5.876453 5284.542
18 0.9304634 112.05254 5.410632 5330.084
19 1.1660059 108.03871 5.982188 5303.685
20 0.7662319 104.80364 5.518754 5283.069

Vector containing columns y2 and y3:

Example

> v2<-c("y2","y3")

Subsetting df2 by excluding y2 and y3:

Example

> select(df2,-all_of(v2))

Output

      y1       y4
1 1.0004066 5302.802
2 0.8704272 5279.560
3 1.0010894 5250.222
4 1.0856458 5261.604
5 0.9609981 5230.762
6 0.9497958 5204.725
7 0.9598390 5273.675
8 0.7686893 5232.587
9 0.8447364 5282.668
10 1.1740212 5300.367
11 0.9476001 5315.523
12 0.9824041 5256.286
13 0.9139366 5279.863
14 1.0712399 5275.326
15 1.1097967 5269.570
16 1.0449168 5210.164
17 0.8880382 5284.542
18 0.9304634 5330.084
19 1.1660059 5303.685
20 0.7662319 5283.069
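One point worth knowing about all_of() is that it stops with an error when the vector contains a name that is not a column of the data frame. The sketch below shows any_of(), which dplyr also provides and which skips missing names instead. The data frame is a made-up two-row stand-in for df2, and the name "y9" is a hypothetical column that does not exist.

```r
library(dplyr)

# Small illustrative data frame with the same column names as df2
df2 <- data.frame(y1 = c(1.0, 0.9),
                  y2 = c(95.4, 103.7),
                  y3 = c(4.4, 4.5),
                  y4 = c(5302.8, 5279.6))

# "y9" is hypothetical and is not a column of df2
v3 <- c("y2", "y3", "y9")

# select(df2, -all_of(v3)) would fail here because y9 is missing;
# any_of() drops the names it finds and ignores the rest
res2 <- select(df2, -any_of(v3))
names(res2)   # "y1" "y4"
```

Use all_of() when a missing column should be treated as a mistake, and any_of() when the vector may legitimately list columns that are absent.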

Updated on: 23-Nov-2020
