How to separate strings in R that are joined with special characters?


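Text data is often difficult to clean. One of the most basic problems is that several values are stored in a single string, joined by a separator such as a special character (for example -, :, /, ~ or *). The strsplit function makes it easy to separate such values. It takes a character vector and a split pattern, which is treated as a regular expression by default, and returns a list containing one character vector of pieces for each input string. In the examples below, each separator is wrapped in square brackets, such as "[-]", so that it is matched as a literal character. Check out the examples to understand how it can be done.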
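The following minimal sketch shows what strsplit returns, using a short hypothetical string `s`. Because the result is always a list with one element per input string, `[[1]]` (or `unlist()`) is needed to get a plain character vector.

```r
# Hypothetical example string joined with hyphens
s <- "red-green-blue"

# strsplit() always returns a list, one element per input string
parts <- strsplit(s, "[-]")
class(parts)   # "list"
length(parts)  # 1

# Extract the character vector for the first (and only) input string
colours <- parts[[1]]
colours        # "red" "green" "blue"
```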

Example

x1<-"A-B-C-D-E-F-G-H-I-J-K-L-M-N-O-P-Q-R-S-T-U-V-W-X-Y-Z"
x1

Output

[1] "A-B-C-D-E-F-G-H-I-J-K-L-M-N-O-P-Q-R-S-T-U-V-W-X-Y-Z"

Example

strsplit(x1,"[-]")

Output

[[1]]
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
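Since x1 holds the 26 capital letters, the split result can be checked against R's built-in LETTERS constant. This sketch also shows paste() with the collapse argument, which performs the reverse operation and rebuilds the joined string.

```r
x1 <- "A-B-C-D-E-F-G-H-I-J-K-L-M-N-O-P-Q-R-S-T-U-V-W-X-Y-Z"

# Take the vector out of the one-element list returned by strsplit()
letters_split <- strsplit(x1, "[-]")[[1]]
length(letters_split)               # 26
identical(letters_split, LETTERS)   # TRUE

# paste() with collapse joins the pieces back with the same separator
rejoined <- paste(letters_split, collapse = "-")
identical(rejoined, x1)             # TRUE
```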

Example

x2<-"AK:AL:AR:AS:AZ:CA:CO:CT:DC:DE:FL:GA:GU:HI:IA:ID:IL:IN:KS:KY:LA:MA:MD:ME:MI:MN:MO:MP:MS:MT:NC:ND:NE:NH:NJ:NM:NV:NY:OH:OK:OR:PA:PR:RI:SC:SD:TN:TX:UM:UT:VA:VI:VT:WA:WI:WV:WY"
x2

Output

[1] "AK:AL:AR:AS:AZ:CA:CO:CT:DC:DE:FL:GA:GU:HI:IA:ID:IL:IN:KS:KY:LA:MA:MD:ME:MI:MN:MO:MP:MS:MT:NC:ND:NE:NH:NJ:NM:NV:NY:OH:OK:OR:PA:PR:RI:SC:SD:TN:TX:UM:UT:VA:VI:VT:WA:WI:WV:WY"

Example

strsplit(x2,"[:]")

Output

[[1]]
 [1] "AK" "AL" "AR" "AS" "AZ" "CA" "CO" "CT" "DC" "DE" "FL" "GA" "GU" "HI" "IA"
[16] "ID" "IL" "IN" "KS" "KY" "LA" "MA" "MD" "ME" "MI" "MN" "MO" "MP" "MS" "MT"
[31] "NC" "ND" "NE" "NH" "NJ" "NM" "NV" "NY" "OH" "OK" "OR" "PA" "PR" "RI" "SC"
[46] "SD" "TN" "TX" "UM" "UT" "VA" "VI" "VT" "WA" "WI" "WV" "WY"
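A bracket expression can also list several characters at once, which helps when a single string mixes separators. The string `mixed` below is a hypothetical example. Placing `-` first inside the brackets keeps it from being read as a character range.

```r
# Hypothetical string that mixes all the separators used in this article
mixed <- "AK-AL:AR/AS~AZ*CA"

# One bracket expression matches any of - : / ~ *
# ("-" is placed first so it is taken literally, not as a range)
codes <- strsplit(mixed, "[-:/~*]")[[1]]
codes   # "AK" "AL" "AR" "AS" "AZ" "CA"
```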

Example

x3<-"AK/AL/AR/AS/AZ/CA/CO/CT/DC/DE/FL/GA/GU/HI/IA/ID/IL/IN/KS/KY/LA/MA/MD/ME/MI/MN/MO/MP/MS/MT/NC/ND/NE/NH/NJ/NM/NV/NY/OH/OK/OR/PA/PR/RI/SC/SD/TN/TX/UM/UT/VA/VI/VT/WA/WI/WV/WY"
x3

Output

[1] "AK/AL/AR/AS/AZ/CA/CO/CT/DC/DE/FL/GA/GU/HI/IA/ID/IL/IN/KS/KY/LA/MA/MD/ME/MI/MN/MO/MP/MS/MT/NC/ND/NE/NH/NJ/NM/NV/NY/OH/OK/OR/PA/PR/RI/SC/SD/TN/TX/UM/UT/VA/VI/VT/WA/WI/WV/WY"

Example

strsplit(x3,"[/]")

Output

[[1]]
 [1] "AK" "AL" "AR" "AS" "AZ" "CA" "CO" "CT" "DC" "DE" "FL" "GA" "GU" "HI" "IA"
[16] "ID" "IL" "IN" "KS" "KY" "LA" "MA" "MD" "ME" "MI" "MN" "MO" "MP" "MS" "MT"
[31] "NC" "ND" "NE" "NH" "NJ" "NM" "NV" "NY" "OH" "OK" "OR" "PA" "PR" "RI" "SC"
[46] "SD" "TN" "TX" "UM" "UT" "VA" "VI" "VT" "WA" "WI" "WV" "WY"
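With real data, separators at the edges of a string or doubled separators produce empty strings. This sketch uses hypothetical inputs and follows the behaviour described in the strsplit help page: a match at the start gives an empty first element, a match at the end does not add an empty last element, and a doubled separator leaves an empty string between the two pieces.

```r
# Leading and trailing "/" (hypothetical input)
edge <- strsplit("/AK/AL/", "[/]")[[1]]
edge      # ""   "AK" "AL"

# A doubled "/" leaves an empty string between the pieces
doubled <- strsplit("AK//AL", "[/]")[[1]]
doubled   # "AK" ""   "AL"

# Drop the empty strings afterwards
cleaned <- doubled[doubled != ""]
cleaned   # "AK" "AL"
```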

Example

x4<-"AK~AL~AR~AS~AZ~CA~CO~CT~DC~DE~FL~GA~GU~HI~IA~ID~IL~IN~KS~KY~LA~MA~MD~ME~MI~MN~MO~MP~MS~MT~NC~ND~NE~NH~NJ~NM~NV~NY~OH~OK~OR~PA~PR~RI~SC~SD~TN~TX~UM~UT~VA~VI~VT~WA~WI~WV~WY"
x4

Output

[1] "AK~AL~AR~AS~AZ~CA~CO~CT~DC~DE~FL~GA~GU~HI~IA~ID~IL~IN~KS~KY~LA~MA~MD~ME~MI~MN~MO~MP~MS~MT~NC~ND~NE~NH~NJ~NM~NV~NY~OH~OK~OR~PA~PR~RI~SC~SD~TN~TX~UM~UT~VA~VI~VT~WA~WI~WV~WY"

Example

strsplit(x4,"[~]")

Output

[[1]]
 [1] "AK" "AL" "AR" "AS" "AZ" "CA" "CO" "CT" "DC" "DE" "FL" "GA" "GU" "HI" "IA"
[16] "ID" "IL" "IN" "KS" "KY" "LA" "MA" "MD" "ME" "MI" "MN" "MO" "MP" "MS" "MT"
[31] "NC" "ND" "NE" "NH" "NJ" "NM" "NV" "NY" "OH" "OK" "OR" "PA" "PR" "RI" "SC"
[46] "SD" "TN" "TX" "UM" "UT" "VA" "VI" "VT" "WA" "WI" "WV" "WY"
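Joined values sometimes carry spaces around the separator. Because the split argument is a regular expression, the optional spaces can be absorbed into the pattern. This is a sketch with a hypothetical string. `[[:space:]]` is the POSIX character class for whitespace.

```r
# Hypothetical input with inconsistent spacing around "~"
spaced <- "AK ~ AL~ AR ~AS"

# Match "~" together with any whitespace on either side of it
codes <- strsplit(spaced, "[[:space:]]*~[[:space:]]*")[[1]]
codes   # "AK" "AL" "AR" "AS"
```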

Example

x5<-"AK*AL*AR*AS*AZ*CA*CO*CT*DC*DE*FL*GA*GU*HI*IA*ID*IL*IN*KS*KY*LA*MA*MD*ME*MI*MN*MO*MP*MS*MT*NC*ND*NE*NH*NJ*NM*NV*NY*OH*OK*OR*PA*PR*RI*SC*SD*TN*TX*UM*UT*VA*VI*VT*WA*WI*WV*WY"
x5

Output

[1] "AK*AL*AR*AS*AZ*CA*CO*CT*DC*DE*FL*GA*GU*HI*IA*ID*IL*IN*KS*KY*LA*MA*MD*ME*MI*MN*MO*MP*MS*MT*NC*ND*NE*NH*NJ*NM*NV*NY*OH*OK*OR*PA*PR*RI*SC*SD*TN*TX*UM*UT*VA*VI*VT*WA*WI*WV*WY"

Example

strsplit(x5,"[*]")

Output

[[1]]
 [1] "AK" "AL" "AR" "AS" "AZ" "CA" "CO" "CT" "DC" "DE" "FL" "GA" "GU" "HI" "IA"
[16] "ID" "IL" "IN" "KS" "KY" "LA" "MA" "MD" "ME" "MI" "MN" "MO" "MP" "MS" "MT"
[31] "NC" "ND" "NE" "NH" "NJ" "NM" "NV" "NY" "OH" "OK" "OR" "PA" "PR" "RI" "SC"
[46] "SD" "TN" "TX" "UM" "UT" "VA" "VI" "VT" "WA" "WI" "WV" "WY"
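The square brackets matter most for this separator. In a regular expression, `*` is a repetition operator, so a bare "*" is not treated as a literal asterisk. Wrapping it in a bracket expression is one option. Escaping it as "\\*" or passing fixed = TRUE, which turns off regular-expression matching, are two other ways to get the same split. The sketch below uses a shortened version of x5.

```r
s5 <- "AK*AL*AR*AS*AZ*CA"

# Three equivalent ways to split on a literal "*"
by_class  <- strsplit(s5, "[*]")[[1]]              # bracket expression
by_escape <- strsplit(s5, "\\*")[[1]]              # escaped metacharacter
by_fixed  <- strsplit(s5, "*", fixed = TRUE)[[1]]  # plain text, no regex

identical(by_class, by_escape) && identical(by_escape, by_fixed)   # TRUE
by_class   # "AK" "AL" "AR" "AS" "AZ" "CA"
```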

Example

x6<-c("AK*AL*AR*AS*AZ*CA","CO*CT*DC*DE*FL*GA","GU*HI*IA*ID*IL*IN*KS","KY*LA*MA*MD*ME*MI","MN*MO*MP*MS*MT*NC","ND*NE*NH*NJ*NM*NV","NY*OH*OK*OR*PA*PR","RI*SC*SD*TN*TX*UM","UT*VA*VI*VT","WA*WI*WV*WY")
x6

Output

[1] "AK*AL*AR*AS*AZ*CA" "CO*CT*DC*DE*FL*GA" "GU*HI*IA*ID*IL*IN*KS"
[4] "KY*LA*MA*MD*ME*MI" "MN*MO*MP*MS*MT*NC" "ND*NE*NH*NJ*NM*NV"
[7] "NY*OH*OK*OR*PA*PR" "RI*SC*SD*TN*TX*UM" "UT*VA*VI*VT"
[10] "WA*WI*WV*WY"

Example

strsplit(x6,"[*]")

Output

[[1]]
[1] "AK" "AL" "AR" "AS" "AZ" "CA"

[[2]]
[1] "CO" "CT" "DC" "DE" "FL" "GA"

[[3]]
[1] "GU" "HI" "IA" "ID" "IL" "IN" "KS"

[[4]]
[1] "KY" "LA" "MA" "MD" "ME" "MI"

[[5]]
[1] "MN" "MO" "MP" "MS" "MT" "NC"

[[6]]
[1] "ND" "NE" "NH" "NJ" "NM" "NV"

[[7]]
[1] "NY" "OH" "OK" "OR" "PA" "PR"

[[8]]
[1] "RI" "SC" "SD" "TN" "TX" "UM"

[[9]]
[1] "UT" "VA" "VI" "VT"

[[10]]
[1] "WA" "WI" "WV" "WY"
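Because x6 contains ten strings, the result is a list of ten character vectors of different lengths. The following sketch shows common follow-up steps: lengths() counts the pieces in each element, unlist() flattens everything into one vector, and sapply() picks one piece from every element.

```r
x6 <- c("AK*AL*AR*AS*AZ*CA", "CO*CT*DC*DE*FL*GA", "GU*HI*IA*ID*IL*IN*KS",
        "KY*LA*MA*MD*ME*MI", "MN*MO*MP*MS*MT*NC", "ND*NE*NH*NJ*NM*NV",
        "NY*OH*OK*OR*PA*PR", "RI*SC*SD*TN*TX*UM", "UT*VA*VI*VT", "WA*WI*WV*WY")

parts <- strsplit(x6, "[*]")

# Number of pieces in each of the ten strings
lengths(parts)          # 6 6 7 6 6 6 6 6 4 4

# All pieces as one character vector
all_codes <- unlist(parts)
length(all_codes)       # 57

# The first code from each string
first_codes <- sapply(parts, `[`, 1)
first_codes   # "AK" "CO" "GU" "KY" "MN" "ND" "NY" "RI" "UT" "WA"
```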

Updated on: 16-Oct-2020
