Extract string vector elements up to a fixed number of characters in R
To extract string vector elements up to a fixed number of characters in R, we can use the substring function from base R.
For example, if we have a vector X of 100 strings and we want the first five characters of each value, we can use the following command:
substring(X,1,5)
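The sketch below shows how `substring()` handles edge cases. It uses a small made-up vector `s`, which is not part of the original examples. Strings shorter than the requested length come back whole, which is why the result is "up to" five characters. `NA` stays `NA`. Base R's `substr()` gives the same result for this start/stop pattern.

```r
# Small illustrative vector (made-up data, not from the examples below)
s <- c("California", "Massachusetts", "Iowa", "NY", NA)

# First up to 5 characters of each element; shorter strings are returned whole
first5 <- substring(s, 1, 5)
print(first5)
# -> c("Calif", "Massa", "Iowa", "NY", NA)

# substr() takes the same start/stop pair and gives the same result here
print(identical(first5, substr(s, 1, 5)))
# -> TRUE
```

The difference between the two functions is that `substring()` recycles `text`, `first` and `last` to a common length, while `substr()` keeps the length of `x`. With a single start and stop value, as here, they agree.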
Example 1
The following snippet creates a sample character vector:
x1<-c("Alabama", "Alaska", "American Samoa", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware", "District of Columbia", "Florida", "Georgia", "Guam", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota", "Minor Outlying Islands", "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", "Northern Mariana Islands", "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Puerto Rico", "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "U.S. Virgin Islands", "Utah", "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming")
x1
Printing x1 displays the following vector:
 [1] "Alabama" "Alaska"
 [3] "American Samoa" "Arizona"
 [5] "Arkansas" "California"
 [7] "Colorado" "Connecticut"
 [9] "Delaware" "District of Columbia"
[11] "Florida" "Georgia"
[13] "Guam" "Hawaii"
[15] "Idaho" "Illinois"
[17] "Indiana" "Iowa"
[19] "Kansas" "Kentucky"
[21] "Louisiana" "Maine"
[23] "Maryland" "Massachusetts"
[25] "Michigan" "Minnesota"
[27] "Minor Outlying Islands" "Mississippi"
[29] "Missouri" "Montana"
[31] "Nebraska" "Nevada"
[33] "New Hampshire" "New Jersey"
[35] "New Mexico" "New York"
[37] "North Carolina" "North Dakota"
[39] "Northern Mariana Islands" "Ohio"
[41] "Oklahoma" "Oregon"
[43] "Pennsylvania" "Puerto Rico"
[45] "Rhode Island" "South Carolina"
[47] "South Dakota" "Tennessee"
[49] "Texas" "U.S. Virgin Islands"
[51] "Utah" "Vermont"
[53] "Virginia" "Washington"
[55] "West Virginia" "Wisconsin"
[57] "Wyoming"
To find the first two characters of each value in x1, add the following code to the above snippet:
substring(x1,1,2)
Output
If you run all of the above snippets as a single program, it produces the following output:
 [1] "Al" "Al" "Am" "Ar" "Ar" "Ca" "Co" "Co" "De" "Di" "Fl" "Ge" "Gu" "Ha" "Id"
[16] "Il" "In" "Io" "Ka" "Ke" "Lo" "Ma" "Ma" "Ma" "Mi" "Mi" "Mi" "Mi" "Mi" "Mo"
[31] "Ne" "Ne" "Ne" "Ne" "Ne" "Ne" "No" "No" "No" "Oh" "Ok" "Or" "Pe" "Pu" "Rh"
[46] "So" "So" "Te" "Te" "U." "Ut" "Ve" "Vi" "Wa" "We" "Wi" "Wy"
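The "U." entry for "U.S. Virgin Islands" shows that `substring()` counts every character, including punctuation and spaces. The original example does not cover a case where only letters should count. One option for that case is to strip non-letters with `gsub()` first. The sketch below does this on a few of the x1 values.

```r
# A few of the x1 values, including one with punctuation
states <- c("U.S. Virgin Islands", "New York", "Alabama")

# substring() counts every character, so "U." keeps the period
print(substring(states, 1, 2))
# -> c("U.", "Ne", "Al")

# Remove everything that is not an ASCII letter, then take the first two characters
letters_only <- gsub("[^A-Za-z]", "", states)
print(substring(letters_only, 1, 2))
# -> c("US", "Ne", "Al")
```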
Example 2
The following snippet creates a sample character vector:
x2<-c("Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus", "Czechia", "Denmark", "Estonia", "Finland", "France", "Germany", "Greece", "Hungary", "Ireland", "Italy", "Latvia", "Lithuania", "Luxembourg", "Malta", "Netherlands", "Poland", "Portugal", "Romania", "Slovakia", "Slovenia", "Spain", "Sweden")
x2
Printing x2 displays the following vector:
 [1] "Austria" "Belgium" "Bulgaria" "Croatia" "Cyprus"
 [6] "Czechia" "Denmark" "Estonia" "Finland" "France"
[11] "Germany" "Greece" "Hungary" "Ireland" "Italy"
[16] "Latvia" "Lithuania" "Luxembourg" "Malta" "Netherlands"
[21] "Poland" "Portugal" "Romania" "Slovakia" "Slovenia"
[26] "Spain" "Sweden"
To find the first two characters of each value in x2, add the following code to the above snippet:
substring(x2,1,2)
Output
If you run all of the above snippets as a single program, it produces the following output:
 [1] "Au" "Be" "Bu" "Cr" "Cy" "Cz" "De" "Es" "Fi" "Fr" "Ge" "Gr" "Hu" "Ir" "It"
[16] "La" "Li" "Lu" "Ma" "Ne" "Po" "Po" "Ro" "Sl" "Sl" "Sp" "Sw"
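Base R also provides `strtrim(x, width)`, which cuts each string to a given display width rather than a character count. This is an alternative to the article's approach, not a replacement for it. For plain ASCII names like those in x2, display width equals character count, so the result should match `substring()`. For wide characters, such as CJK text, the two can differ. The sketch below checks this on a few of the x2 values.

```r
# A subset of the x2 values
countries <- c("Austria", "Belgium", "Italy", "Malta", "Sweden")

by_substring <- substring(countries, 1, 2)
by_strtrim <- strtrim(countries, 2)  # width is measured in display columns

print(by_strtrim)
# -> c("Au", "Be", "It", "Ma", "Sw")
print(identical(by_substring, by_strtrim))
# -> TRUE
```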
Example 3
The following snippet creates a sample character vector:
x3<-c("Cuba", "Cyprus", "Czech Republic", "Djibouti", "Dominica", "Dominican Republic", "East Timor", "Ecuador", "Egypt", "El Salvador", "Equatorial Guinea", "Eritrea", "Estonia", "Ethiopia", "Fiji", "Finland", "France", "Metropolitan", "French Guiana", "Gambia", "Georgia", "Germany", "Ghana", "Greenland", "Grenada", "Guatemala", "Honduras", "Hong Kong", "Hungary", "Iceland", "India", "Indonesia", "Iran", "Iraq", "Ireland", "Israel", "Italy", "Jamaica", "Japan", "Jordan", "Kazakhstan", "Kenya", "Mozambique", "Namibia", "Nepal", "Netherlands", "Nigeria", "Norway", "Oman", "Paraguay", "Peru", "Philippines")
x3
Printing x3 displays the following vector:
 [1] "Cuba" "Cyprus" "Czech Republic"
 [4] "Djibouti" "Dominica" "Dominican Republic"
 [7] "East Timor" "Ecuador" "Egypt"
[10] "El Salvador" "Equatorial Guinea" "Eritrea"
[13] "Estonia" "Ethiopia" "Fiji"
[16] "Finland" "France" "Metropolitan"
[19] "French Guiana" "Gambia" "Georgia"
[22] "Germany" "Ghana" "Greenland"
[25] "Grenada" "Guatemala" "Honduras"
[28] "Hong Kong" "Hungary" "Iceland"
[31] "India" "Indonesia" "Iran"
[34] "Iraq" "Ireland" "Israel"
[37] "Italy" "Jamaica" "Japan"
[40] "Jordan" "Kazakhstan" "Kenya"
[43] "Mozambique" "Namibia" "Nepal"
[46] "Netherlands" "Nigeria" "Norway"
[49] "Oman" "Paraguay" "Peru"
[52] "Philippines"
To find the first two characters of each value in x3, add the following code to the above snippet:
substring(x3,1,2)
Output
If you run all of the above snippets as a single program, it produces the following output:
 [1] "Cu" "Cy" "Cz" "Dj" "Do" "Do" "Ea" "Ec" "Eg" "El" "Eq" "Er" "Es" "Et" "Fi"
[16] "Fi" "Fr" "Me" "Fr" "Ga" "Ge" "Ge" "Gh" "Gr" "Gr" "Gu" "Ho" "Ho" "Hu" "Ic"
[31] "In" "In" "Ir" "Ir" "Ir" "Is" "It" "Ja" "Ja" "Jo" "Ka" "Ke" "Mo" "Na" "Ne"
[46] "Ne" "Ni" "No" "Om" "Pa" "Pe" "Ph"
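The same "up to n characters" extraction can also be written as a regular expression with `sub()`. This is a swapped-in alternative technique, not the article's method. The pattern `^(.{2}).*` captures the first two characters and drops the rest. Strings shorter than two characters do not match, so `sub()` returns them unchanged, which mirrors `substring()`. The one-character value "X" below is made up to show that case.

```r
# A subset of the x3 values plus a made-up one-character string
x <- c("Cuba", "Czech Republic", "El Salvador", "Fiji", "X")

# Capture the first two characters and drop the rest
by_regex <- sub("^(.{2}).*", "\\1", x)
print(by_regex)
# -> c("Cu", "Cz", "El", "Fi", "X")

# Same result as substring(x, 1, 2), including the one-character "X"
print(identical(by_regex, substring(x, 1, 2)))
# -> TRUE
```

For plain prefix extraction, `substring()` is simpler and clearer. The regex form is mainly useful when the truncation is already part of a larger pattern.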
