How to extract the first digit from a character column in an R data frame?


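When a character column in a data frame mixes letters and digits, and the first digit of each value carries meaning for the analysis, we can extract that digit with the stri_extract_first function from the stringi package. The function takes the column and a regular expression. The pattern "\\d" matches a single digit (the backslash is doubled because it must be escaped inside an R string), so the first match in each string is its first digit. The result is a character vector with one element per row.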

Example1

Consider the below data frame −

> x1<-1:20
> y1<-sample(c("HT23L","HT14L","HT32L"),20,replace=TRUE)
> df1<-data.frame(x1,y1)
> df1

Output

   x1 y1
1  1  HT14L
2  2  HT14L
3  3  HT23L
4  4  HT14L
5  5  HT32L
6  6  HT32L
7  7  HT14L
8  8  HT32L
9  9  HT32L
10 10 HT32L
11 11 HT23L
12 12 HT32L
13 13 HT14L
14 14 HT23L
15 15 HT14L
16 16 HT23L
17 17 HT23L
18 18 HT23L
19 19 HT23L
20 20 HT23L

Load the stringi package and extract the first digit of each value in column y1 −

> library(stringi)
> stri_extract_first(df1$y1,regex="\\d")

Output

[1] "1" "1" "2" "1" "3" "3" "1" "3" "3" "3" "2" "3" "1" "2" "1" "2" "2" "2" "2"
[20] "2"
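
The call above returns a character vector. In practice the digit is often stored as a new column so it can be used for grouping or filtering. The following is a minimal sketch of that step, not part of the original example: the data frame df, its columns id and code, and the value "HTL" (which contains no digit) are made up for illustration.

```r
library(stringi)

# Small illustrative data frame (hypothetical values, not from the example above)
df <- data.frame(id = 1:4,
                 code = c("HT23L", "HT14L", "HT32L", "HTL"),
                 stringsAsFactors = FALSE)

# "\\d" is the regex \d with the backslash escaped for an R string literal.
# stri_extract_first returns NA for strings that contain no digit,
# and as.integer keeps that NA while converting "2", "1", "3" to numbers.
df$first_digit <- as.integer(stri_extract_first(df$code, regex = "\\d"))

df
```

Here first_digit holds 2, 1, 3 and NA. Converting to integer is a design choice for this sketch; if the digit is only a label, leaving it as character works just as well.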

Example2

> x2<-sample(c("India1RT1","UK5RT1","Egypt2PT4"),20,replace=TRUE)
> y2<-rpois(20,5)
> df2<-data.frame(x2,y2)
> df2

Output

   x2        y2
1  India1RT1 2
2  India1RT1 8
3  India1RT1 7
4  India1RT1 6
5  UK5RT1    6
6  India1RT1 5
7  UK5RT1    6
8  India1RT1 6
9  India1RT1 7
10 UK5RT1    10
11 Egypt2PT4 8
12 Egypt2PT4 5
13 Egypt2PT4 7
14 India1RT1 2
15 UK5RT1    3
16 Egypt2PT4 5
17 UK5RT1    3
18 Egypt2PT4 6
19 Egypt2PT4 3
20 UK5RT1    5

Extracting first digit in column x2 −

> stri_extract_first(df2$x2,regex="\\d")

Output

[1] "1" "1" "1" "1" "5" "1" "5" "1" "1" "5" "2" "2" "2" "1" "5" "2" "5" "2" "2"
[20] "5"

Example3

> x3<-sample(c("abc123","dfe456"),20,replace=TRUE)
> y3<-rnorm(20)
> df3<-data.frame(x3,y3)
> df3

Output

   x3      y3
1  abc123  0.1027005
2  dfe456  0.2297002
3  dfe456 -0.1441151
4  dfe456  1.0510760
5  abc123  0.8182656
6  dfe456 -0.5018968
7  dfe456  0.2957634
8  abc123 -0.4240910
9  dfe456 -1.0700713
10 dfe456 -0.3374661
11 dfe456 -0.4654241
12 dfe456 -0.4542710
13 abc123  0.6969808
14 dfe456 -0.6514574
15 abc123  0.2258769
16 dfe456 -0.5348958
17 abc123  0.6629195
18 dfe456  1.0998636
19 dfe456 -1.3147809
20 dfe456 -2.3015384

Extracting first digit in column x3 −

> stri_extract_first(df3$x3,regex="\\d")

Output

[1] "1" "4" "4" "4" "1" "4" "4" "1" "4" "4" "4" "4" "1" "4" "1" "4" "1" "4" "4"
[20] "4"
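
If installing stringi is not an option, a similar result can be obtained with base R. The sketch below is an alternative technique, not the method used in the examples above: it relies on regexpr and regmatches from base R, and the vector codes (including "xyz", which has no digit) is made up for illustration.

```r
# Hypothetical input; "xyz" contains no digit
codes <- c("abc123", "dfe456", "xyz")

# regexpr gives the position of the first match in each string, or -1 if none
m <- regexpr("[0-9]", codes)

# Start with NA everywhere, then fill in only the strings that had a match.
# regmatches returns the matched text for matching elements only.
has_digit <- !is.na(m) & m != -1
first_digit <- rep(NA_character_, length(codes))
first_digit[has_digit] <- regmatches(codes, m)

first_digit
```

This gives "1", "4" and NA. Unlike stri_extract_first, regmatches drops non-matching elements, which is why the result is pre-filled with NA to keep it aligned with the rows of the data frame.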

Updated on: 05-Mar-2021
