Extract columns with a string in the column name of an R data frame


To extract the columns of an R data frame whose names contain a particular string, we can apply the grepl function to the column names and then subset the data frame with single square brackets. grepl returns a logical vector with one value per column name, and the brackets keep only the columns where that value is TRUE.

For example, if we have a data frame called df and we want to extract the columns that have X in their names, we can use the following command:

df[grepl("X",colnames(df))]
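
To see why this works, it can help to look at the logical vector that grepl returns before it is used for subsetting. The sketch below uses a small made-up data frame; the name demo and the columns X1, Y1 and X2 are illustrative assumptions, not part of the examples that follow.

```r
# Small illustrative data frame (hypothetical names and values)
demo <- data.frame(X1 = 1:3, Y1 = 4:6, X2 = 7:9)

# grepl() returns one TRUE/FALSE per column name
mask <- grepl("X", colnames(demo))
mask
# [1]  TRUE FALSE  TRUE

# Single square brackets with a logical vector keep the TRUE columns
x_cols <- demo[mask]
colnames(x_cols)
# [1] "X1" "X2"
```

Storing the mask in a variable is optional; it only makes the intermediate step visible.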

Example 1

The following snippet creates a sample data frame:

Students_Score<-sample(1:50,20)
Teachers_Rank<-sample(1:5,20,replace=TRUE)
Teachers_Score<-sample(1:50,20)
df1<-data.frame(Students_Score,Teachers_Rank,Teachers_Score)
df1

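The following data frame is created (the values come from sample(), so they will differ between runs unless a seed is set with set.seed()):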

   Students_Score Teachers_Rank Teachers_Score
1              37             3             42
2              50             4             15
3               8             5             21
4              29             3             35
5              10             5              3
6               2             2             41
7              12             4             29
8               1             4             44
9              41             2             10
10             39             3             39
11             27             3             43
12             18             1             48
13             44             5             12
14             21             4             16
15             16             3             20
16             45             5             50
17             17             1             31
18             49             1             30
19             47             5             17
20             32             5              8

To extract the columns of df1 that contain Score in their names, add the following code to the above snippet:

df1[grepl("Score",colnames(df1))]

Output

Running the snippets above together as a single program produces the following output:

 Students_Score Teachers_Score
1            37             42
2            50             15
3             8             21
4            29             35
5            10              3
6             2             41
7            12             29
8             1             44
9            41             10
10           39             39
11           27             43
12           18             48
13           44             12
14           21             16
15           16             20
16           45             50
17           17             31
18           49             30
19           47             17
20           32              8
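
One detail worth noting about single square brackets without a comma: the result stays a data frame even when only one column matches, whereas the row-and-column form with a comma simplifies a single column to a plain vector unless drop = FALSE is given. The following is a minimal sketch using a hypothetical two-column data frame called scores (not the df1 above):

```r
# Hypothetical data frame with only one column containing "Score"
scores <- data.frame(Students_Score = c(10, 20), Teachers_Rank = c(1, 2))
mask <- grepl("Score", colnames(scores))

one_bracket <- scores[mask]                  # stays a one-column data frame
with_comma  <- scores[, mask]                # simplified to a numeric vector
kept        <- scores[, mask, drop = FALSE]  # data frame again

is.data.frame(one_bracket)  # TRUE
is.data.frame(with_comma)   # FALSE
is.data.frame(kept)         # TRUE
```

This is one reason to prefer the df[grepl(...)] form when later code expects a data frame regardless of how many columns match.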

Example 2

The following snippet creates a sample data frame:

Hot_Temp<-sample(33:50,20,replace=TRUE)
Cold_Temp<-sample(1:10,20,replace=TRUE)
Group<-sample(c("First","Second","Third"),20,replace=TRUE)
df2<-data.frame(Hot_Temp,Cold_Temp,Group)
df2

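The following data frame is created (the values come from sample(), so they will differ between runs unless a seed is set with set.seed()):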

  Hot_Temp Cold_Temp  Group
1       47         4  Third
2       33         5  First
3       36         2 Second
4       35         8 Second
5       33         8  First
6       44         1  Third
7       33         8  Third
8       46         3  First
9       36         3  Third
10      44         6  First
11      43        10  Third
12      35         9  First
13      36         4  Third
14      44         5 Second
15      48         5 Second
16      37         6 Second
17      35         5 Second
18      42         4  First
19      40         4 Second
20      42         4  Third

To extract the columns of df2 that contain Temp in their names, add the following code to the above snippet:

df2[grepl("Temp",colnames(df2))]

Output

Running the snippets above together as a single program produces the following output:

 Hot_Temp Cold_Temp
1     47         4
2     33         5
3     36         2
4     35         8
5     33         8
6     44         1
7     33         8
8     46         3
9     36         3
10    44         6
11    43        10
12    35         9
13    36         4
14    44         5
15    48         5
16    37         6
17    35         5
18    42         4
19    40         4
20    42         4
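
Because grepl treats its first argument as a regular expression by default, the same approach can be made stricter or looser. The sketch below assumes a hypothetical one-row data frame called temps whose column names were chosen only to show the differences; ignore.case and fixed are standard grepl arguments.

```r
# Hypothetical data frame; column names chosen to illustrate matching rules
temps <- data.frame(Hot_Temp = 40, Cold_Temp = 5,
                    Temperature_Unit = "C", group.temp = 1)
nm <- colnames(temps)

# Anchor with $ to keep only names that END in "Temp"
ends_temp <- temps[grepl("Temp$", nm)]

# Ignore letter case, so "temp" and "Temp" both match
any_case <- temps[grepl("temp", nm, ignore.case = TRUE)]

# fixed = TRUE matches the text literally, so "." is not a wildcard
literal <- temps[grepl(".temp", nm, fixed = TRUE)]

# Alternation: names starting with "Hot" or with "Cold"
hot_or_cold <- temps[grepl("^Hot|^Cold", nm)]

colnames(ends_temp)    # "Hot_Temp" "Cold_Temp"
colnames(any_case)     # all four columns
colnames(literal)      # "group.temp"
colnames(hot_or_cold)  # "Hot_Temp" "Cold_Temp"
```

If no column name matches, the result is a data frame with zero columns, not an error, so it may be worth checking ncol() on the result before using it.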

Updated on: 01-Nov-2021