How to split a long string into a vector of substrings of equal sizes in R?


If a vector was recorded as a single string by mistake, or the file that contains the data did not separate the values properly, we need to split the string into its correct pieces before we can proceed with further analysis. This can happen when the levels of a factor variable that have equal name lengths are not separated. In this case, we can split the string into a vector of equal-sized substrings by using the substring function.

Examples

The following examples show how the substring function can split a string into a vector of substrings:

Factor<-"aabbccddabacadbabcbdcacbcddadbdc"
substring(Factor,seq(1,nchar(Factor),2),seq(2,nchar(Factor), 2))

Output

[1] "aa" "bb" "cc" "dd" "ab" "ac" "ad" "ba" "bc" "bd" "ca" "cb" "cd" "da" "db"
[16] "dc"
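
As a quick sanity check on the result above, the pieces can be tested for equal width and pasted back together. This is only a sketch; the variable names n, starts, ends, and pieces are introduced here for illustration and are not part of the original example.

```r
Factor <- "aabbccddabacadbabcbdcacbcddadbdc"
n <- 2  # assumed piece width, matching the example above

# Start and end positions of each piece
starts <- seq(1, nchar(Factor), by = n)
ends <- seq(n, nchar(Factor), by = n)
pieces <- substring(Factor, starts, ends)

# Every piece should contain exactly n characters
all(nchar(pieces) == n)

# Pasting the pieces back together should reproduce the original string
identical(paste(pieces, collapse = ""), Factor)
```

Both checks return TRUE here because the string length (32) is an exact multiple of the piece width.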
x1<-"abcdefghijklmopqrstuvwxyz"
substring(x1,seq(1,nchar(x1),2),seq(2,nchar(x1), 2))
[1] "ab" "cd" "ef" "gh" "ij" "kl" "mo" "pq" "rs" "tu" "vw" "xy" ""
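
The empty string at the end of this result, and the missing "z", come from a length mismatch: x1 has 25 characters (the letter n is absent), so the start sequence has one more element than the end sequence, and substring recycles the shorter one. The sketch below makes this visible; the names starts and ends are introduced here only for illustration.

```r
x1 <- "abcdefghijklmopqrstuvwxyz"

starts <- seq(1, nchar(x1), 2)
ends <- seq(2, nchar(x1), 2)

nchar(x1)       # 25, not a multiple of 2
length(starts)  # 13
length(ends)    # 12

# The 13th start (25) is paired with the recycled first end (2).
# An end position before the start position yields an empty string.
substring(x1, starts[13], ends[1])
```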
substring(x1,seq(1,nchar(x1),2),seq(3,nchar(x1), 2))
[1] "abc" "cde" "efg" "ghi" "ijk" "klm" "mop" "pqr" "rst" "tuv" "vwx" "xyz"
[13] ""
substring(x1,seq(1,nchar(x1),3),seq(3,nchar(x1), 3))
[1] "abc" "def" "ghi" "jkl" "mop" "qrs" "tuv" "wxy" ""
substring(x1,seq(1,nchar(x1),4),seq(3,nchar(x1), 4))
[1] "abc" "efg" "ijk" "mop" "rst" "vwx" ""
substring(x1,seq(1,nchar(x1),4),seq(4,nchar(x1), 4))
[1] "abcd" "efgh" "ijkl" "mopq" "rstu" "vwxy" ""
substring(x1,seq(1,nchar(x1),4),seq(5,nchar(x1), 4))
[1] "abcde" "efghi" "ijklm" "mopqr" "rstuv" "vwxyz" ""
substring(x1,seq(1,nchar(x1),5),seq(5,nchar(x1), 5))
[1] "abcde" "fghij" "klmop" "qrstu" "vwxyz"
substring(x1,seq(1,nchar(x1),10),seq(5,nchar(x1), 10))
[1] "abcde" "klmop" "vwxyz"
substring(x1,seq(1,nchar(x1),10),seq(10,nchar(x1), 10))
[1] "abcdefghij" "klmopqrstu" ""
substring(x1,seq(1,nchar(x1),10),seq(2,nchar(x1), 10))
[1] "ab" "kl" "vw"
substring(x1,seq(1,nchar(x1),10),seq(3,nchar(x1), 10))
[1] "abc" "klm" "vwx"
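
To keep the trailing characters instead of getting an empty string, extend the upper limit of the end positions to nchar(x1)+n-1, where n is the piece size, so that both sequences have the same length:

substring(x1,seq(1,nchar(x1),2),seq(2,nchar(x1)+2-1, 2))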
[1] "ab" "cd" "ef" "gh" "ij" "kl" "mo" "pq" "rs" "tu" "vw" "xy" "z"
substring(x1,seq(1,nchar(x1),4),seq(4,nchar(x1)+4-1, 4))
[1] "abcd" "efgh" "ijkl" "mopq" "rstu" "vwxy" "z"
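
The two calls above follow the same pattern, so it may be convenient to wrap it in a small function. The sketch below is one possible way to do that. The name split_fixed is hypothetical (it is not a base R function), and instead of extending the end sequence it derives each end position from its start position and caps it at the string length, which gives the same pieces.

```r
# split_fixed is a hypothetical helper name, not part of base R
split_fixed <- function(x, n) {
  stopifnot(is.character(x), length(x) == 1, n >= 1)
  len <- nchar(x)
  if (len == 0) return(character(0))  # seq(1, 0, by = n) would fail
  starts <- seq(1, len, by = n)
  # Each piece ends n - 1 characters after its start, capped at the end
  ends <- pmin(starts + n - 1, len)
  substring(x, starts, ends)
}

x1 <- "abcdefghijklmopqrstuvwxyz"
split_fixed(x1, 4)
# [1] "abcd" "efgh" "ijkl" "mopq" "rstu" "vwxy" "z"
```

Because starts and ends are always the same length, no recycling takes place and the last piece simply holds whatever characters remain.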
substring(x1,seq(1,nchar(x1),3),seq(4,nchar(x1)+4-1, 3))
[1] "abcd" "defg" "ghij" "jklm" "mopq" "qrst" "tuvw" "wxyz" "z"
substring(x1,seq(1,nchar(x1),5),seq(4,nchar(x1)+4-1, 5))
[1] "abcd" "fghi" "klmo" "qrst" "vwxy"
substring(x1,seq(1,nchar(x1),2),seq(4,nchar(x1)+4-1, 2))
[1] "abcd" "cdef" "efgh" "ghij" "ijkl" "klmo" "mopq" "pqrs" "rstu" "tuvw"
[11] "vwxy" "xyz" "z"
substring(x1,seq(1,nchar(x1),2),seq(5,nchar(x1)+5-1, 2))
[1] "abcde" "cdefg" "efghi" "ghijk" "ijklm" "klmop" "mopqr" "pqrst" "rstuv"
[10] "tuvwx" "vwxyz" "xyz" "z"
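
The last few examples use a step that is smaller than the piece width, so consecutive pieces overlap. A minimal sketch of that idea with the two quantities named explicitly is shown below; width and step are illustrative names chosen here, not arguments of substring.

```r
x1 <- "abcdefghijklmopqrstuvwxyz"
width <- 4  # number of characters per piece
step <- 2   # distance between consecutive start positions

starts <- seq(1, nchar(x1), by = step)
# substring stops at the end of the string when an end position exceeds it
windows <- substring(x1, starts, starts + width - 1)
windows
```

With step equal to width the pieces do not overlap, and with step larger than width some characters are skipped, as in the calls above that use a step of 10.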

Updated on: 21-Aug-2020
