Select columns of an R data frame and skip those that do not exist


Sometimes a data frame has a large number of columns and we know the names of some of them, but a few of the names we expect are not actually present. To select the columns that exist and skip the ones that do not, we can subset the data frame with a logical test on its column names.

For example, suppose a data frame called df contains twenty columns and we believe that x, y, and z exist in df, but z is not actually there. The expression names(df) %in% c("x","y","z") is TRUE only for the columns of df whose names appear in the list, so x and y are selected and z is skipped without an error. This can be done by using the below command −

df[,names(df) %in% c("x","y","z"),drop=FALSE]
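The mechanics of this command can be seen on a small case. The following is a minimal sketch, assuming a hypothetical three-column data frame (the column w and the variable names wanted, keep, and result are illustrative and not part of the examples below):

```r
# Hypothetical data frame; the column "z" is deliberately absent.
df <- data.frame(x = 1:3, y = 4:6, w = 7:9)
wanted <- c("x", "y", "z")

# Logical mask: TRUE for each column of df whose name appears in `wanted`.
keep <- names(df) %in% wanted
keep
# [1]  TRUE  TRUE FALSE

# Subset by the mask; the missing name "z" never reaches the subsetting step.
result <- df[, keep, drop = FALSE]
names(result)
# [1] "x" "y"
```

Because the test runs over the names that df actually has, a name that is absent simply matches nothing, whereas df[, c("x","y","z")] would stop with an "undefined columns selected" error.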

Example 1

Consider the below data frame −

x1<-rnorm(20)
x3<-rnorm(20)
x4<-rnorm(20)
df1<-data.frame(x1,x3,x4)
df1

The following data frame is created (the values differ on each run because no seed is set) −

             x1          x3          x4
1    0.39242697  1.94369518 -0.36692667
2   -2.87236253  0.63008900 -1.06281211
3   -0.65349377 -0.88442286 -0.01778122
4   -1.17954360 -1.12290165  1.22420677
5    0.12765932 -2.47906508 -0.36339964
6    1.00167594  0.98720588  0.26306844
7   -0.45533660 -0.61367430  0.59131906
8    0.10805656  0.70099416 -1.25835396
9    0.41539962 -0.34988934 -1.16621416
10   1.69208586 -0.08883033  0.25785287
11   0.14335867 -1.67958251 -0.45326409
12  -0.69518421 -1.50169655 -0.32216638
13   0.29088005 -1.30874972 -0.28515476
14  -0.01994773  0.19276681 -0.36537207
15  -0.61455895 -0.59203646  0.09349088
16   0.34339425  0.86884825  1.04326014
17   1.71791754  0.88276790  0.66905104
18   2.06755011 -0.64288995 -0.09404691
19  -1.54713973  0.73062146 -2.27962611
20   1.33430182 -1.03840560  0.94347980

To select the columns x1, x2, x3, and x4 from df1, skipping x2 because it does not exist, add the following code to the above snippet −

df1[,names(df1) %in% c("x1","x2","x3","x4")]

Output

If you execute all the above snippets as a single program, it generates the following output, in which the nonexistent column x2 is skipped −

            x1          x3          x4
1   0.39242697  1.94369518 -0.36692667
2  -2.87236253  0.63008900 -1.06281211
3  -0.65349377 -0.88442286 -0.01778122
4  -1.17954360 -1.12290165  1.22420677
5   0.12765932 -2.47906508 -0.36339964
6   1.00167594  0.98720588  0.26306844
7  -0.45533660 -0.61367430  0.59131906
8   0.10805656  0.70099416 -1.25835396
9   0.41539962 -0.34988934 -1.16621416
10  1.69208586 -0.08883033  0.25785287
11  0.14335867 -1.67958251 -0.45326409
12 -0.69518421 -1.50169655 -0.32216638
13  0.29088005 -1.30874972 -0.28515476
14 -0.01994773  0.19276681 -0.36537207
15 -0.61455895 -0.59203646  0.09349088
16  0.34339425  0.86884825  1.04326014
17  1.71791754  0.88276790  0.66905104
18  2.06755011 -0.64288995 -0.09404691
19 -1.54713973  0.73062146 -2.27962611
20  1.33430182 -1.03840560  0.94347980
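The logical test returns columns in the order they appear in the data frame and does not report which names were skipped. A possible alternative, sketched here with base R's intersect() and setdiff(), keeps the requested order and lists the missing names. The seed and the variable names wanted, present, absent, and selected are assumptions added for illustration:

```r
set.seed(1)  # assumption: a seed is added so the sketch is reproducible
df1 <- data.frame(x1 = rnorm(20), x3 = rnorm(20), x4 = rnorm(20))

# Requested columns, in the order we want them; "x2" does not exist.
wanted <- c("x4", "x2", "x1")

# Names that are both requested and present, in the requested order.
present <- intersect(wanted, names(df1))
# Names that were requested but are not in the data frame.
absent <- setdiff(wanted, names(df1))

# Select by name; drop = FALSE keeps the result a data frame.
selected <- df1[, present, drop = FALSE]
names(selected)
# [1] "x4" "x1"
absent
# [1] "x2"
```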

Example 2

Consider the data frame given below −

y1<-rpois(20,5)
y2<-rpois(20,2)
y4<-rpois(20,5)
y6<-rpois(20,2)
df2<-data.frame(y1,y2,y4,y6)
df2

The following data frame is created (the values differ on each run because no seed is set) −

  y1 y2 y4 y6
1  4  3  5  2
2  4  2  4  3
3  4  0  9  3
4  3  5  6  1
5  4  2  3  3
6  7  1  8  0
7  7  3  4  2
8  5  2  8  1
9  3  4  8  1
10 4  2 10  1
11 3  2  4  1
12 5  1  5  2
13 3  2  8  2
14 4  2  9  0
15 5  0  2  3
16 4  0  6  1
17 5  2  7  1
18 6  0  6  2
19 5  2  5  2
20 6  1  4  1

To select the columns y1, y2, y3, y4, y5, and y6 from df2, skipping y3 and y5 because they do not exist, add the following code to the above snippet −

df2[,names(df2) %in% c("y1","y2","y3","y4","y5","y6")]

Output

If you execute all the above snippets as a single program, it generates the following output, in which the nonexistent columns y3 and y5 are skipped −

  y1 y2 y4 y6
1  4  3  5  2
2  4  2  4  3
3  4  0  9  3
4  3  5  6  1
5  4  2  3  3
6  7  1  8  0
7  7  3  4  2
8  5  2  8  1
9  3  4  8  1
10 4  2 10  1
11 3  2  4  1
12 5  1  5  2
13 3  2  8  2
14 4  2  9  0
15 5  0  2  3
16 4  0  6  1
17 5  2  7  1
18 6  0  6  2
19 5  2  5  2
20 6  1  4  1
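One case worth checking is when only a single requested column exists. With the df[, mask] form, R then simplifies the result to a plain vector unless drop = FALSE is given. The sketch below illustrates this under assumed inputs (the seed and the names wanted, as_vector, and as_frame are illustrative):

```r
set.seed(42)  # assumption: a seed is added so the sketch is reproducible
df2 <- data.frame(y1 = rpois(20, 5), y2 = rpois(20, 2),
                  y4 = rpois(20, 5), y6 = rpois(20, 2))

# Only "y1" exists, so exactly one column matches.
wanted <- c("y1", "y3")

# Without drop = FALSE the single column is returned as a vector.
as_vector <- df2[, names(df2) %in% wanted]
is.data.frame(as_vector)
# [1] FALSE

# With drop = FALSE the result stays a one-column data frame.
as_frame <- df2[, names(df2) %in% wanted, drop = FALSE]
is.data.frame(as_frame)
# [1] TRUE
dim(as_frame)
# [1] 20  1
```

Omitting the comma, as in df2[names(df2) %in% wanted], also returns a data frame, because single-index subsetting treats the data frame as a list of columns.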

Updated on: 28-Oct-2021
