How to extract values from an R data frame column that do not start and end with certain characters?


Sometimes we want to extract values from a string column based on their initial and ending characters, for example when some values were recorded with extra characters. For this purpose, we can negate grepl and use the result as a row index inside single square brackets. In the pattern, ^ anchors a match at the start of a string, $ anchors it at the end, and | means "or".
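As a minimal sketch of the idea, the snippet below applies the same kind of pattern to a small toy vector before moving on to data frames. The vector `fruits` and its values are hypothetical and chosen only for illustration.

```r
# Hypothetical toy vector, used only to illustrate the technique
fruits <- c("Apple", "Banana", "Avocado", "Cherry")

# TRUE where a value starts with "A" OR ends with "a"
matches <- grepl("^A|a$", fruits)

# Negating the logical vector keeps the values that do neither
kept <- fruits[!matches]
kept
# [1] "Cherry"
```

Because `|` has the lowest precedence in a regular expression, the pattern reads as "starts with A" or "ends with a", so its negation selects values that neither start with A nor end with a.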

Example

Consider the below data frame −

> x2<-c("Alabama", "Alaska", "American Samoa", "Arizona", "Arkansas", "California",
"Colorado", "Connecticut", "Delaware", "District of Columbia", "Florida", "Georgia",
"Guam", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky",
"Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
"Minor Outlying Islands", "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
"New Hampshire", "New Jersey", "New Mexico", "New York", "North Carolina",
"North Dakota", "Northern Mariana Islands", "Ohio", "Oklahoma", "Oregon", "Pennsylvania",
"Puerto Rico", "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas",
"U.S. Virgin Islands", "Utah", "Vermont", "Virginia", "Washington", "West Virginia",
"Wisconsin", "Wyoming")
> df2<-data.frame(x2)
> head(df2,20)

Output

x2
1 Alabama
2 Alaska
3 American Samoa
4 Arizona
5 Arkansas
6 California
7 Colorado
8 Connecticut
9 Delaware
10 District of Columbia
11 Florida
12 Georgia
13 Guam
14 Hawaii
15 Idaho
16 Illinois
17 Indiana
18 Iowa
19 Kansas
20 Kentucky

Finding states that neither start with A nor end with a −

> df2[!grepl("^A|a$",df2$x2),]

Output

[1] Colorado Connecticut Delaware
[4] Guam Hawaii Idaho
[7] Illinois Kansas Kentucky
[10] Maine Maryland Massachusetts
[13] Michigan Minor Outlying Islands Mississippi
[16] Missouri New Hampshire New Jersey
[19] New Mexico New York Northern Mariana Islands
[22] Ohio Oregon Puerto Rico
[25] Rhode Island Tennessee Texas
[28] U.S. Virgin Islands Utah Vermont
[31] Washington Wisconsin Wyoming
57 Levels: Alabama Alaska American Samoa Arizona Arkansas ... Wyoming
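The result above is printed as a vector with levels rather than as a data frame, because subsetting the rows of a one-column data frame drops it to a vector by default. If a data frame is wanted instead, one option is to pass drop = FALSE, as in the sketch below. The data frame `states` is a hypothetical four-row stand-in for df2.

```r
# Hypothetical small stand-in for df2
states <- data.frame(x2 = c("Alabama", "Ohio", "Texas", "Florida"))

# TRUE for rows that neither start with "A" nor end with "a"
keep <- !grepl("^A|a$", states$x2)

# Default behaviour: a one-column result is simplified to a vector
as_vector <- states[keep, ]

# drop = FALSE keeps the result as a one-column data frame
as_frame <- states[keep, , drop = FALSE]
as_frame
```

Here Alabama is removed because it starts with A, and Florida because it ends with a, so only Ohio and Texas remain. Whether the vector form is a factor or a character vector depends on the R version: since R 4.0.0, data.frame no longer converts strings to factors by default.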

Let’s have a look at another example −

> x1<-c("Indiaaa","Chinaaa","Russiaa","Canadaaa","Indonesiaaa","Croatiaaa","Mauritaniaaa",
"Albaniaaa","Angolaaa","Armeniaaa","Malaysiaaa","Maltaaa","Boliviaaa","Burmaaa",
"Panama","Romaniaa","Saudi-Arabia","Serbiaaa","Syriaaa","Tongaaa","Koreaaa","Libya")
> y1<-sample(1:10,22,replace=TRUE)
> df1<-data.frame(x1,y1)
> df1

Output

x1 y1
1 Indiaaa 6
2 Chinaaa 1
3 Russiaa 9
4 Canadaaa 7
5 Indonesiaaa 7
6 Croatiaaa 3
7 Mauritaniaaa 6
8 Albaniaaa 2
9 Angolaaa 10
10 Armeniaaa 10
11 Malaysiaaa 7
12 Maltaaa 3
13 Boliviaaa 2
14 Burmaaa 10
15 Panama 1
16 Romaniaa 10
17 Saudi-Arabia 10
18 Serbiaaa 8
19 Syriaaa 10
20 Tongaaa 5
21 Koreaaa 7
22 Libya 8

Extracting the rows whose names neither start with a given letter nor end with aa −

> df1[!grepl("^A|aa$",df1$x1),]

Output

x1 y1
15 Panama 1
17 Saudi-Arabia 10
22 Libya 8
> df1[!grepl("^S|aa$",df1$x1),]

Output

x1 y1
15 Panama 1
22 Libya 8
> df1[!grepl("^B|aa$",df1$x1),]

Output

x1 y1
15 Panama 1
17 Saudi-Arabia 10
22 Libya 8
> df1[!grepl("^P|aa$",df1$x1),]

Output

x1 y1
17 Saudi-Arabia 10
22 Libya 8
> df1[!grepl("^L|aa$",df1$x1),]

Output

x1 y1
15 Panama 1
17 Saudi-Arabia 10
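When the prefix and suffix are fixed strings rather than patterns, the same filter can also be written without regular expressions. The sketch below uses base R's startsWith and endsWith (available since R 3.3.0), which require character input, so the column is created with stringsAsFactors = FALSE. The data frame `countries` is a hypothetical four-row subset of the names above.

```r
# Hypothetical subset of the country names used above
countries <- data.frame(
  x1 = c("Indiaaa", "Panama", "Saudi-Arabia", "Libya"),
  stringsAsFactors = FALSE  # startsWith/endsWith need character vectors
)

# Equivalent of !grepl("^S|aa$", x1) for fixed strings
keep <- !(startsWith(countries$x1, "S") | endsWith(countries$x1, "aa"))

result <- countries[keep, , drop = FALSE]
result
```

This keeps Panama and Libya, which agrees with the "^S|aa$" result shown earlier. A practical difference is that startsWith and endsWith treat their arguments literally, so characters such as . or $ need no escaping.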

Updated on: 07-Sep-2020
