How to extract strings based on first character from a vector of strings in R?


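Sometimes we need to pick out the strings in a vector that share a pattern, such as starting with the same character. For example, from a vector that contains the names of all the states and territories of the United States of America, we might want to extract only the names that begin with a particular letter. This can be done by subsetting the vector with the grepl function and a regular expression that begins with ^, which anchors the match to the start of each string.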
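As a minimal sketch of the idea, grepl returns a logical vector marking which elements match a regular expression, and that logical vector can be used to index the original vector. The vector `fruits` and the variable `starts_with_A` below are made up for illustration and are not part of the example that follows.

```r
# Hypothetical vector used only for illustration
fruits <- c("Apple", "Banana", "Avocado", "apricot")

# grepl() returns TRUE or FALSE for each element;
# "^A" matches only strings whose first character is a capital A
starts_with_A <- grepl("^A", fruits)
print(starts_with_A)

# Logical indexing keeps only the matching elements
print(fruits[starts_with_A])

# The match is case sensitive by default; ignore.case = TRUE relaxes that
print(fruits[grepl("^a", fruits, ignore.case = TRUE)])
```

Note that the match is case sensitive unless ignore.case = TRUE is passed, so "^A" and "^a" select different elements.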

Example

Consider the below vector containing the names of the US states and territories:

> US_states<-c("Alabama", "Alaska", "American Samoa", "Arizona", "Arkansas",
"California", "Colorado", "Connecticut", "Delaware", "District of Columbia", "Florida",
"Georgia", "Guam", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas",
"Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan",
"Minnesota", "Minor Outlying Islands", "Mississippi", "Missouri", "Montana",
"Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York",
"North Carolina", "North Dakota", "Northern Mariana Islands", "Ohio", "Oklahoma",
"Oregon", "Pennsylvania", "Puerto Rico", "Rhode Island", "South Carolina",
"South Dakota", "Tennessee", "Texas", "U.S. Virgin Islands", "Utah", "Vermont", "Virginia",
"Washington", "West Virginia", "Wisconsin", "Wyoming")
> US_states[grepl("^A",US_states)]
[1] "Alabama" "Alaska" "American Samoa" "Arizona"
[5] "Arkansas"
> US_states[grepl("^B",US_states)]
character(0)
> US_states[grepl("^C",US_states)]
[1] "California" "Colorado" "Connecticut"
> US_states[grepl("^D",US_states)]
[1] "Delaware" "District of Columbia"
> US_states[grepl("^E",US_states)]
character(0)
> US_states[grepl("^F",US_states)]
[1] "Florida"
> US_states[grepl("^G",US_states)]
[1] "Georgia" "Guam"
> US_states[grepl("^H",US_states)]
[1] "Hawaii"
> US_states[grepl("^I",US_states)]
[1] "Idaho" "Illinois" "Indiana" "Iowa"
> US_states[grepl("^J",US_states)]
character(0)
> US_states[grepl("^K",US_states)]
[1] "Kansas" "Kentucky"
> US_states[grepl("^L",US_states)]
[1] "Louisiana"
> US_states[grepl("^M",US_states)]
[1] "Maine" "Maryland" "Massachusetts"
[4] "Michigan" "Minnesota" "Minor Outlying Islands"
[7] "Mississippi" "Missouri" "Montana"
> US_states[grepl("^N",US_states)]
[1] "Nebraska" "Nevada"
[3] "New Hampshire" "New Jersey"
[5] "New Mexico" "New York"
[7] "North Carolina" "North Dakota"
[9] "Northern Mariana Islands"
> US_states[grepl("^O",US_states)]
[1] "Ohio" "Oklahoma" "Oregon"
> US_states[grepl("^P",US_states)]
[1] "Pennsylvania" "Puerto Rico"
> US_states[grepl("^Q",US_states)]
character(0)
> US_states[grepl("^R",US_states)]
[1] "Rhode Island"
> US_states[grepl("^S",US_states)]
[1] "South Carolina" "South Dakota"
> US_states[grepl("^T",US_states)]
[1] "Tennessee" "Texas"
> US_states[grepl("^U",US_states)]
[1] "U.S. Virgin Islands" "Utah"
> US_states[grepl("^V",US_states)]
[1] "Vermont" "Virginia"
> US_states[grepl("^W",US_states)]
[1] "Washington" "West Virginia" "Wisconsin" "Wyoming"
> US_states[grepl("^X",US_states)]
character(0)
> US_states[grepl("^Y",US_states)]
character(0)
> US_states[grepl("^Z",US_states)]
character(0)
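
Repeating the call for every letter works, but the same grouping can probably be obtained in one step. The sketch below is one possible alternative, not the method used above: it uses base R's startsWith for a literal prefix test (available from R 3.3.0) and split with substr to group names by their first character. The vector `states` is a shortened copy of a few names from the example so that the block runs on its own; `a_states` and `by_letter` are illustrative names.

```r
# Shortened vector with a few names from the example above (illustration only)
states <- c("Alabama", "Alaska", "California", "Delaware",
            "Utah", "U.S. Virgin Islands")

# startsWith() tests a literal prefix, so no regular expression is needed
a_states <- states[startsWith(states, "A")]
print(a_states)

# substr() extracts the first character of each name;
# split() then groups the names by that character into a named list
by_letter <- split(states, substr(states, 1, 1))
print(by_letter[["U"]])

# Letters with no matching names are simply absent from the list
print("B" %in% names(by_letter))
```

With the full US_states vector, split(US_states, substr(US_states, 1, 1)) would give one list element per starting letter that actually occurs, instead of character(0) for letters such as B or E.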

Updated on: 04-Sep-2020
