How to find the number of characters in each row of a string column in R?


If a string column in an R data frame contains letters mixed with numbers, and we want to count the letters (the non-numeric characters) in each row, we can first delete everything that is not a letter with the gsub function and then count what remains with the nchar function, as shown in the examples below.
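This is a minimal sketch of why the gsub step is needed. nchar on its own counts every character, digits included, whereas nchar applied after gsub counts only the letters that survive. The two strings come from Example 1; the names total and letters_only are only for illustration.

```r
# Two values taken from Example 1
x <- c("A01K", "140AL")

total <- nchar(x)                             # counts digits and letters alike
letters_only <- nchar(gsub("[^A-Z]", "", x))  # strips everything except A-Z first

total         # [1] 4 5
letters_only  # [1] 2 2
```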

Pattern matching in R is case sensitive by default, so the character class must match the case of the letters in the data: [^A-Z] keeps only uppercase letters, while [^a-z] keeps only lowercase letters.
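The sketch below illustrates the case issue. "A01K" and "ala5412bama" come from the examples, while "Gu25Am" is a hypothetical mixed-case value added for illustration. It compares the uppercase-only class with two ways of counting letters of either case: listing both ranges in one class, or converting to uppercase with toupper first.

```r
x <- c("A01K", "ala5412bama", "Gu25Am")

# [^A-Z] deletes lowercase letters too, so lowercase data counts as 0
upper_only <- nchar(gsub("[^A-Z]", "", x))
# Listing both ranges keeps letters of either case
any_case <- nchar(gsub("[^A-Za-z]", "", x))
# Alternatively, normalise the case first and reuse [^A-Z]
normalised <- nchar(gsub("[^A-Z]", "", toupper(x)))

upper_only  # [1] 2 0 2
any_case    # [1] 2 7 4
normalised  # [1] 2 7 4
```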

Example 1

The following snippet creates a sample data frame −

x <- c("A01K", "140AL", "A142R", "A255SW", "A2474EZ", "CA214N", "C14O",
       "CGSLT", "DC23QW", "D2411RWEDE", "FL233EGV", "G36521VCLPBA", "G54TRU",
       "H214FI", "245IA", "ID3699", "IL01", "IFDFDN", "K2254FDES", "KY244RLPKJ")
df1 <- data.frame(x)
df1

The following dataframe is created −

              x
1          A01K
2         140AL
3         A142R
4        A255SW
5       A2474EZ
6        CA214N
7          C14O
8         CGSLT
9        DC23QW
10   D2411RWEDE
11     FL233EGV
12 G36521VCLPBA
13       G54TRU
14       H214FI
15        245IA
16       ID3699
17         IL01
18       IFDFDN
19    K2254FDES
20   KY244RLPKJ

To count the letters in each row of column x, delete every character that is not an uppercase letter and count the characters that remain. Add the following code to the above snippet −

# Delete every character that is not an uppercase letter (A-Z), then count what remains
df1$No_of_Chars <- nchar(gsub("[^A-Z]", "", df1$x))
df1

Output

If you execute all of the above snippets as a single program, it generates the following output −

              x No_of_Chars
1          A01K           2
2         140AL           2
3         A142R           2
4        A255SW           3
5       A2474EZ           3
6        CA214N           3
7          C14O           2
8         CGSLT           5
9        DC23QW           4
10   D2411RWEDE           6
11     FL233EGV           5
12 G36521VCLPBA           7
13       G54TRU           4
14       H214FI           3
15        245IA           2
16       ID3699           2
17         IL01           2
18       IFDFDN           6
19    K2254FDES           5
20   KY244RLPKJ           7
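As an alternative sketch, which swaps in a different technique from the gsub approach above, the letters can be counted as regular-expression matches instead of deleting everything else. gregexpr finds each uppercase letter, regmatches extracts the matches as one character vector per string, and lengths counts them. The input is four values from Example 1.

```r
x <- c("A01K", "140AL", "CGSLT", "245IA")

# One character vector of matched letters per string, then its length
counts <- lengths(regmatches(x, gregexpr("[A-Z]", x)))

counts  # [1] 2 2 5 2

# Same result as the gsub/nchar approach
identical(counts, nchar(gsub("[^A-Z]", "", x)))  # [1] TRUE
```

The gsub version is shorter. Dropping the lengths() call from the second version shows exactly which characters were counted, and a string with no uppercase letters gives a count of 0.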

Example 2

The following snippet creates a sample data frame −

y <- c("ala5412bama", "ala1475ska", "american11022samoa", "arizona3652", "arkan1475sas",
       "califor2365nia", "co1475lorado", "0014connecticut", "dela25366ware",
       "district257of22columbia", "florid02535a", "57412georgia", "gu25987am", "hawaii36250",
       "20057idaho", "i369852llinois", "indiana0146563", "3255iowa", "kansas3682701", "kentucky2574")
df2 <- data.frame(y)
df2

The following dataframe is created −

                         y
1              ala5412bama
2               ala1475ska
3       american11022samoa
4              arizona3652
5             arkan1475sas
6           califor2365nia
7             co1475lorado
8          0014connecticut
9            dela25366ware
10 district257of22columbia
11            florid02535a
12            57412georgia
13               gu25987am
14             hawaii36250
15              20057idaho
16          i369852llinois
17          indiana0146563
18                3255iowa
19           kansas3682701
20            kentucky2574

To count the letters in each row of column y, delete every character that is not a lowercase letter and count the characters that remain. Add the following code to the above snippet −

# Delete every character that is not a lowercase letter (a-z), then count what remains
df2$No_of_Chars <- nchar(gsub("[^a-z]", "", df2$y))
df2

Output

If you execute all of the above snippets as a single program, it generates the following output −

                         y No_of_Chars
1              ala5412bama           7
2               ala1475ska           6
3       american11022samoa          13
4              arizona3652           7
5             arkan1475sas           8
6           califor2365nia          10
7             co1475lorado           8
8          0014connecticut          11
9            dela25366ware           8
10 district257of22columbia          18
11            florid02535a           7
12            57412georgia           7
13               gu25987am           4
14             hawaii36250           6
15              20057idaho           5
16          i369852llinois           8
17          indiana0146563           7
18                3255iowa           4
19           kansas3682701           6
20            kentucky2574           8
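The strings in Example 2 contain only lowercase letters and digits. A quick sanity check is therefore to count the digits as well and confirm that the two counts add up to nchar() of the whole string. The sketch below uses three rows from the example. The No_of_Digits column is a hypothetical addition, not part of the original output. stringsAsFactors = FALSE is set explicitly because nchar() does not accept factors, and older R versions (before 4.0.0) converted strings to factors by default.

```r
y <- c("ala5412bama", "0014connecticut", "district257of22columbia")
df2 <- data.frame(y, stringsAsFactors = FALSE)

# Letters (as in Example 2) and digits, counted separately
df2$No_of_Chars  <- nchar(gsub("[^a-z]", "", df2$y))
df2$No_of_Digits <- nchar(gsub("[^0-9]", "", df2$y))  # hypothetical extra column

# Only lowercase letters and digits occur here, so the counts must sum to the length
stopifnot(all(df2$No_of_Chars + df2$No_of_Digits == nchar(df2$y)))
df2
```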

Updated on: 11-Nov-2021
