How to split a long string into a vector of substrings of equal sizes in R?


If a vector is recorded as a single string by mistake, or the file that contains the data did not separate the values properly, we need to split the string into its correct parts before we can proceed with further analysis. This can happen when the levels of a factor variable that have names of equal length are not separated. In this case, we can split the string into a vector of equal-sized substrings by using the substring function, passing it a vector of start positions and a vector of end positions.

Examples

The following examples show how the substring function can split a string into a vector of substrings:


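# 32 characters: sixteen two-character level names run together
Factor <- "aabbccddabacadbabcbdcacbcddadbdc"
# Start positions 1, 3, 5, ... and end positions 2, 4, 6, ...
substring(Factor, seq(1, nchar(Factor), 2), seq(2, nchar(Factor), 2))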

Output

[1] "aa" "bb" "cc" "dd" "ab" "ac" "ad" "ba" "bc" "bd" "ca" "cb" "cd" "da" "db"
[16] "dc"
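
To see why the call above yields sixteen two-character pieces, it may help to look at the two position vectors on their own. The sketch below is illustrative only; the names n, starts, ends and chunks are introduced here and do not appear in the original example.

```r
Factor <- "aabbccddabacadbabcbdcacbcddadbdc"
n <- 2                                     # chunk width (illustrative name)

starts <- seq(1, nchar(Factor), by = n)    # 1, 3, 5, ..., 31
ends   <- seq(n, nchar(Factor), by = n)    # 2, 4, 6, ..., 32

# substring() is vectorised: element i runs from starts[i] to ends[i]
chunks <- substring(Factor, starts, ends)

length(starts) == length(ends)             # TRUE, because 32 is a multiple of 2
all(nchar(chunks) == n)                    # TRUE, every piece has width 2
```

Both vectors have the same length here only because the string length is a multiple of the chunk width.
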
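# x1 has 25 characters (the letter "n" is missing), so its length is a multiple of 5 but not of 2, 3, 4 or 10
x1 <- "abcdefghijklmopqrstuvwxyz"

# Step 2, width 2: 13 start positions but only 12 end positions, so the final "z" comes back as ""
substring(x1, seq(1, nchar(x1), 2), seq(2, nchar(x1), 2))
[1] "ab" "cd" "ef" "gh" "ij" "kl" "mo" "pq" "rs" "tu" "vw" "xy" ""

# Step 2, width 3: neighbouring pieces share one character
substring(x1, seq(1, nchar(x1), 2), seq(3, nchar(x1), 2))
[1] "abc" "cde" "efg" "ghi" "ijk" "klm" "mop" "pqr" "rst" "tuv" "vwx" "xyz"
[13] ""

# Step 3, width 3
substring(x1, seq(1, nchar(x1), 3), seq(3, nchar(x1), 3))
[1] "abc" "def" "ghi" "jkl" "mop" "qrs" "tuv" "wxy" ""

# Step 4 with widths 3, 4 and 5: width 3 leaves gaps, width 5 overlaps
substring(x1, seq(1, nchar(x1), 4), seq(3, nchar(x1), 4))
[1] "abc" "efg" "ijk" "mop" "rst" "vwx" ""

substring(x1, seq(1, nchar(x1), 4), seq(4, nchar(x1), 4))
[1] "abcd" "efgh" "ijkl" "mopq" "rstu" "vwxy" ""

substring(x1, seq(1, nchar(x1), 4), seq(5, nchar(x1), 4))
[1] "abcde" "efghi" "ijklm" "mopqr" "rstuv" "vwxyz" ""

# Step 5, width 5: 25 is a multiple of 5, so no element is empty
substring(x1, seq(1, nchar(x1), 5), seq(5, nchar(x1), 5))
[1] "abcde" "fghij" "klmop" "qrstu" "vwxyz"

# Step 10: the first 5, 10, 2 or 3 characters of the blocks starting at positions 1, 11 and 21
substring(x1, seq(1, nchar(x1), 10), seq(5, nchar(x1), 10))
[1] "abcde" "klmop" "vwxyz"

substring(x1, seq(1, nchar(x1), 10), seq(10, nchar(x1), 10))
[1] "abcdefghij" "klmopqrstu" ""

substring(x1, seq(1, nchar(x1), 10), seq(2, nchar(x1), 10))
[1] "ab" "kl" "vw"

substring(x1, seq(1, nchar(x1), 10), seq(3, nchar(x1), 10))
[1] "abc" "klm" "vwx"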
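
The empty string at the end of several results above comes from vector recycling: when the end sequence is shorter than the start sequence, substring() reuses the end positions from the beginning, and a start that lies beyond its end gives "". A minimal sketch of that effect follows, with starts, ends and last_piece as illustrative names that are not part of the original examples.

```r
x1 <- "abcdefghijklmopqrstuvwxyz"          # 25 characters

starts <- seq(1, nchar(x1), by = 2)        # 1, 3, ..., 25  (13 values)
ends   <- seq(2, nchar(x1), by = 2)        # 2, 4, ..., 24  (12 values)
c(length(starts), length(ends))            # 13 12

# The 13th start (25) is paired with the recycled first end (2),
# and a range whose start is past its end is empty
last_piece <- substring(x1, starts[13], ends[1])
last_piece                                 # ""
```
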
# Extending the end sequence to nchar(x1)+width-1 adds one more end position, so the trailing "z" is kept
substring(x1, seq(1, nchar(x1), 2), seq(2, nchar(x1)+2-1, 2))
[1] "ab" "cd" "ef" "gh" "ij" "kl" "mo" "pq" "rs" "tu" "vw" "xy" "z"

substring(x1, seq(1, nchar(x1), 4), seq(4, nchar(x1)+4-1, 4))
[1] "abcd" "efgh" "ijkl" "mopq" "rstu" "vwxy" "z"

# When the step differs from the width, the pieces overlap (step < width) or skip characters (step > width)
substring(x1, seq(1, nchar(x1), 3), seq(4, nchar(x1)+4-1, 3))
[1] "abcd" "defg" "ghij" "jklm" "mopq" "qrst" "tuvw" "wxyz" "z"

substring(x1, seq(1, nchar(x1), 5), seq(4, nchar(x1)+4-1, 5))
[1] "abcd" "fghi" "klmo" "qrst" "vwxy"

substring(x1, seq(1, nchar(x1), 2), seq(4, nchar(x1)+4-1, 2))
[1] "abcd" "cdef" "efgh" "ghij" "ijkl" "klmo" "mopq" "pqrs" "rstu" "tuvw"
[11] "vwxy" "xyz" "z"

substring(x1, seq(1, nchar(x1), 2), seq(5, nchar(x1)+5-1, 2))
[1] "abcde" "cdefg" "efghi" "ghijk" "ijklm" "klmop" "mopqr" "pqrst" "rstuv"
[10] "tuvwx" "vwxyz" "xyz" "z"
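
The equal-width case from the examples above can be wrapped in a small function. The following is a sketch and not part of the original article: split_fixed is a hypothetical name, and it clamps the end positions with pmin() instead of extending the end sequence, which should give the same pieces when step and width are equal. The regmatches() call is one possible cross-check using base R regular expressions.

```r
# Hypothetical helper: split a single string x into pieces of width n,
# keeping a shorter final piece when nchar(x) is not a multiple of n
split_fixed <- function(x, n) {
  stopifnot(is.character(x), length(x) == 1, n >= 1)
  len <- nchar(x)
  if (len == 0) return(character(0))       # seq(1, 0, by = n) would fail
  starts <- seq(1, len, by = n)
  ends <- pmin(starts + n - 1, len)        # the last end cannot pass the string
  substring(x, starts, ends)
}

x1 <- "abcdefghijklmopqrstuvwxyz"
split_fixed(x1, 4)
# [1] "abcd" "efgh" "ijkl" "mopq" "rstu" "vwxy" "z"

# Cross-check: match runs of 1 to 4 characters from left to right
regmatches(x1, gregexpr(".{1,4}", x1))[[1]]
```

Computing the end positions from the start positions keeps both vectors the same length, so no recycling takes place and no empty string appears.
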
raja
Published on 21-Aug-2020 06:29:01