How to separate a string and a numeric value in R?

To separate a string from a numeric value, we can use the strsplit function with a regular expression that matches the boundary between a letter and a digit, together with any whitespace that sits between them. For example, if we have a data frame called df that contains a character column Var holding concatenated string and numeric values, then we can split them using the below command:

strsplit(df$Var,split="(?<=[a-zA-Z])\\s*(?=[0-9])",perl=TRUE)
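
To see what each part of the pattern contributes, here is a minimal sketch on a small vector. The vector v and its values are hypothetical and are not part of the data used in this article. Note that the backslash in \s has to be doubled inside an R string literal.

```r
# Hypothetical input: the same text with zero, one, and several spaces
v <- c("abc12", "abc 12", "abc   12")

# (?<=[a-zA-Z])  lookbehind: the split point must be preceded by a letter
# \\s*           zero or more whitespace characters, removed by the split
# (?=[0-9])      lookahead: the split point must be followed by a digit
parts <- strsplit(v, split = "(?<=[a-zA-Z])\\s*(?=[0-9])", perl = TRUE)
print(parts)
```

Each element should come back as the pair "abc" and "12", whether or not whitespace separates the letters from the digits. The perl=TRUE argument is needed because lookbehind and lookahead are Perl-style regex features.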

Example

Consider the below data frame −

x<-sample(c("india 123","china 232","sudan143","russia326 2"),20,replace=TRUE)
df1<-data.frame(x)
df1

Output

    x
1  sudan143
2  china 232
3  russia326 2
4  sudan143
5  sudan143
6  china 232
7  china 232
8  india 123
9  sudan143
10 china 232
11 india 123
12 russia326 2
13 sudan143
14 russia326 2
15 china 232
16 india 123
17 sudan143
18 india 123
19 china 232
20 china 232

Splitting string and numerical values in column x of df1 −

Example

strsplit(df1$x,split="(?<=[a-zA-Z])\\s*(?=[0-9])",perl=TRUE)

Output

[[1]]
[1] "sudan" "143"
[[2]]
[1] "china" "232"
[[3]]
[1] "russia" "326 2"
[[4]]
[1] "sudan" "143"
[[5]]
[1] "sudan" "143"
[[6]]
[1] "china" "232"
[[7]]
[1] "china" "232"
[[8]]
[1] "india" "123"
[[9]]
[1] "sudan" "143"
[[10]]
[1] "china" "232"
[[11]]
[1] "india" "123"
[[12]]
[1] "russia" "326 2"
[[13]]
[1] "sudan" "143"
[[14]]
[1] "russia" "326 2"
[[15]]
[1] "china" "232"
[[16]]
[1] "india" "123"
[[17]]
[1] "sudan" "143"
[[18]]
[1] "india" "123"
[[19]]
[1] "china" "232"
[[20]]
[1] "china" "232"
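
strsplit returns a list, which is often less convenient than separate columns. The sketch below shows one possible way to turn that list into a two-column data frame, assuming every element splits into exactly two pieces. The column names name, value, and number are hypothetical choices made for illustration.

```r
# The four distinct values sampled in the example above
x <- c("india 123", "china 232", "sudan143", "russia326 2")
parts <- strsplit(x, split = "(?<=[a-zA-Z])\\s*(?=[0-9])", perl = TRUE)

# Stack the two-element character vectors row-wise into a matrix
m <- do.call(rbind, parts)

# Column names here are hypothetical, chosen only for this sketch
result <- data.frame(name = m[, 1], value = m[, 2], stringsAsFactors = FALSE)

# "326 2" still contains a space, so as.numeric() yields NA for that row;
# suppressWarnings() hides the coercion warning
result$number <- suppressWarnings(as.numeric(result$value))
print(result)
```

If some elements split into a different number of pieces, rbind recycles values to fill the rows, so it is worth checking lengths(parts) before relying on this approach.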

Example

y<-sample(c("orange 12","banana247","guava 235","kiwi 138"),20,replace=TRUE)
df2<-data.frame(y)
df2

Output

     y
1  banana247
2  kiwi 138
3  banana247
4  orange 12
5  kiwi 138
6  kiwi 138
7  banana247
8  banana247
9  orange 12
10 guava 235
11 guava 235
12 banana247
13 guava 235
14 orange 12
15 banana247
16 kiwi 138
17 kiwi 138
18 banana247
19 banana247
20 orange 12

Splitting string and numerical values in column y of df2 −

Example

strsplit(df2$y,split="(?<=[a-zA-Z])\\s*(?=[0-9])",perl=TRUE)

Output

[[1]]
[1] "banana" "247"
[[2]]
[1] "kiwi" "138"
[[3]]
[1] "banana" "247"
[[4]]
[1] "orange" "12"
[[5]]
[1] "kiwi" "138"
[[6]]
[1] "kiwi" "138"
[[7]]
[1] "banana" "247"
[[8]]
[1] "banana" "247"
[[9]]
[1] "orange" "12"
[[10]]
[1] "guava" "235"
[[11]]
[1] "guava" "235"
[[12]]
[1] "banana" "247"
[[13]]
[1] "guava" "235"
[[14]]
[1] "orange" "12"
[[15]]
[1] "banana" "247"
[[16]]
[1] "kiwi" "138"
[[17]]
[1] "kiwi" "138"
[[18]]
[1] "banana" "247"
[[19]]
[1] "banana" "247"
[[20]]
[1] "orange" "12"
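
The commands above rely on the column being stored as character, which is the data.frame default from R 4.0.0 onward. In earlier versions the column would be a factor, and strsplit stops with a "non-character argument" error. A minimal sketch of a workaround, which forces factor storage here only to reproduce that situation:

```r
# stringsAsFactors = TRUE mimics the default behaviour before R 4.0.0
df2 <- data.frame(y = c("orange 12", "banana247"), stringsAsFactors = TRUE)

# as.character() converts the factor back to plain strings before splitting
parts <- strsplit(as.character(df2$y),
                  split = "(?<=[a-zA-Z])\\s*(?=[0-9])", perl = TRUE)
print(parts)
```

Wrapping the column in as.character() is harmless when it is already character, so the same call should work across R versions.
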
raja
Updated on 17-Mar-2021 06:43:49
