How to extract values from an R data frame column that do not start and end with certain characters?
Sometimes we want to extract the values of a string column based on how they begin and end, for example because some values were recorded with extra characters and need to be separated from the rest. To keep only the values that neither start with one pattern nor end with another, we can negate grepl with ! and use the result as a row index inside single square brackets. In the regular expression, ^ anchors a pattern to the start of the string, $ anchors it to the end, and | means "or".
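As a minimal sketch of the idea, the snippet below applies a negated grepl to a small made-up vector (the `fruits` vector is an assumption used only for illustration, not data from this article). It shows the intermediate logical vector before it is used for subsetting.

```r
# Hypothetical toy vector, used only for illustration
fruits <- c("Apple", "Banana", "Cherry", "Avocado", "Mango", "Guava")

# TRUE where the value starts with "A" OR ends with "a"
matches <- grepl("^A|a$", fruits)
print(matches)

# Negating the match keeps the values that do neither
kept <- fruits[!matches]
print(kept)
```

Because | has the lowest precedence in a regular expression, the pattern reads as "(starts with A) or (ends with a)", so no extra parentheses are needed.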
Example
Consider the below data frame −
> x2<-c("Alabama", "Alaska", "American Samoa", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware", "District of Columbia", "Florida", "Georgia", "Guam", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota", "Minor Outlying Islands", "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", "Northern Mariana Islands", "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Puerto Rico", "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "U.S. Virgin Islands", "Utah", "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming")
> df2<-data.frame(x2)
> head(df2,20)
Output
                     x2
1               Alabama
2                Alaska
3        American Samoa
4               Arizona
5              Arkansas
6            California
7              Colorado
8           Connecticut
9              Delaware
10 District of Columbia
11              Florida
12              Georgia
13                 Guam
14               Hawaii
15                Idaho
16             Illinois
17              Indiana
18                 Iowa
19               Kansas
20             Kentucky
Finding states that neither start with A nor end with a −
> df2[!grepl("^A|a$",df2$x2),]
Output
 [1] Colorado                 Connecticut              Delaware
 [4] Guam                     Hawaii                   Idaho
 [7] Illinois                 Kansas                   Kentucky
[10] Maine                    Maryland                 Massachusetts
[13] Michigan                 Minor Outlying Islands   Mississippi
[16] Missouri                 New Hampshire            New Jersey
[19] New Mexico               New York                 Northern Mariana Islands
[22] Ohio                     Oregon                   Puerto Rico
[25] Rhode Island             Tennessee                Texas
[28] U.S. Virgin Islands      Utah                     Vermont
[31] Washington               Wisconsin                Wyoming
57 Levels: Alabama Alaska American Samoa Arizona Arkansas ... Wyoming
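The result above is printed as a factor vector rather than a data frame because df2 has only one column, so single square brackets drop it to a vector. The "57 Levels" line indicates the strings were stored as a factor, which was the data.frame default before R 4.0.0. The sketch below uses a shortened state list and sets stringsAsFactors = FALSE purely for illustration. It shows one way to keep the filtered result as a one-column data frame by passing drop = FALSE.

```r
# Shortened list of states, for illustration only
states <- c("Alabama", "Colorado", "Florida", "Texas", "Utah")
df <- data.frame(x2 = states, stringsAsFactors = FALSE)

# Rows whose value neither starts with "A" nor ends with "a"
keep <- !grepl("^A|a$", df$x2)

as_vector <- df[keep, ]                # single column collapses to a vector
as_frame  <- df[keep, , drop = FALSE]  # stays a one-column data frame

print(as_vector)
print(as_frame)
```

Keeping the data frame form preserves the original row numbers, which can be handy when the filtered rows need to be traced back to the source data.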
Let’s have a look at another example −
> x1<- c("Indiaaa","Chinaaa","Russiaa","Canadaaa","Indonesiaaa","Croatiaaa","Mauritaniaaa","Albaniaaa","Angolaaa","Armeniaaa","Malaysiaaa","Maltaaa","Boliviaaa","Burmaaa","Panama","Romaniaa","Saudi-Arabia","Serbiaaa","Syriaaa","Tongaaa","Koreaaa","Libya")
> # y1 is drawn at random, so the values will differ between runs
> y1<-sample(1:10,22,replace=TRUE)
> df1<-data.frame(x1,y1)
> df1
Output
             x1 y1
1       Indiaaa  6
2       Chinaaa  1
3       Russiaa  9
4      Canadaaa  7
5   Indonesiaaa  7
6     Croatiaaa  3
7  Mauritaniaaa  6
8     Albaniaaa  2
9      Angolaaa 10
10    Armeniaaa 10
11   Malaysiaaa  7
12      Maltaaa  3
13    Boliviaaa  2
14      Burmaaa 10
15       Panama  1
16     Romaniaa 10
17 Saudi-Arabia 10
18     Serbiaaa  8
19      Syriaaa 10
20      Tongaaa  5
21      Koreaaa  7
22        Libya  8
> df1[!grepl("^A|aa$",df1$x1),]
Output
             x1 y1
15       Panama  1
17 Saudi-Arabia 10
22        Libya  8
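Note that the pattern "^A|aa$" removes a value if it starts with A or ends with aa; either condition alone is enough. If the goal were instead to remove only the values that satisfy both conditions at once, a different pattern would be needed. The sketch below contrasts the two readings on a small made-up vector (the `countries` vector is an assumption for illustration).

```r
# Hypothetical toy vector, used only for illustration
countries <- c("Armeniaaa", "Angola", "Indiaaa", "Libya")

# Drop values that start with "A" OR end with "aa"
neither <- countries[!grepl("^A|aa$", countries)]

# Drop only values that start with "A" AND end with "aa"
not_both <- countries[!grepl("^A.*aa$", countries)]

print(neither)
print(not_both)
```

Here "Angola" and "Indiaaa" survive the second filter because each meets only one of the two conditions.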
> df1[!grepl("^S|aa$",df1$x1),]
Output
       x1 y1
15 Panama  1
22  Libya  8
> df1[!grepl("^B|aa$",df1$x1),]
Output
             x1 y1
15       Panama  1
17 Saudi-Arabia 10
22        Libya  8
> df1[!grepl("^P|aa$",df1$x1),]
Output
             x1 y1
17 Saudi-Arabia 10
22        Libya  8
> df1[!grepl("^L|aa$",df1$x1),]
Output
             x1 y1
15       Panama  1
17 Saudi-Arabia 10
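The calls above differ only in the starting letter, so the pattern can be built with paste0 instead of being typed each time. The helper below is a hypothetical sketch (the name `drop_start_or_end` and its arguments are not part of base R or of this article), and it runs on a small stand-in for df1 with fixed y1 values so the result is reproducible.

```r
# Hypothetical helper: keep rows whose `col` value neither starts with `start`
# nor ends with `end`. Both arguments are pasted into the regular expression
# as-is, so regex metacharacters (".", "(", "+", ...) would need escaping.
drop_start_or_end <- function(df, col, start, end) {
  pattern <- paste0("^", start, "|", end, "$")
  df[!grepl(pattern, df[[col]]), , drop = FALSE]
}

# Small stand-in for df1; y1 values are fixed here rather than sampled
df_demo <- data.frame(
  x1 = c("Indiaaa", "Panama", "Saudi-Arabia", "Libya", "Russiaa"),
  y1 = c(6, 1, 10, 8, 9),
  stringsAsFactors = FALSE
)

# Apply the same filter for several starting letters
results <- lapply(c(A = "A", S = "S", P = "P", L = "L"), function(letter) {
  drop_start_or_end(df_demo, "x1", letter, "aa")$x1
})
print(results)
```

Using drop = FALSE inside the helper means it also returns a data frame when the input has a single column.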