How to remove rows from an R data frame where a character column has elements with fewer than 3 characters?


To find the number of characters in the elements of a character vector, or of a character column of an R data frame, we can use the nchar function. Therefore, to remove rows whose elements have fewer than 3 characters, we apply nchar to the column and pass the resulting condition to the subset function, which keeps only the rows that satisfy it, as shown in the examples below.
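As a minimal sketch of the idea, before moving to data frames, the snippet below applies nchar to a small made-up character vector and uses the result to keep only the elements with at least 3 characters. The names `countries`, `n_chars` and `keep` are illustrative assumptions, not part of the examples that follow.

```r
# Illustrative vector; the name and values are assumptions for this sketch
countries <- c("India", "UK", "China")

# nchar() returns the number of characters in each element
n_chars <- nchar(countries)
print(n_chars)

# Logical vector: TRUE where the element has 3 or more characters
keep <- n_chars >= 3
print(countries[keep])
```

The same logical condition is what subset receives in the data frame examples.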

Example 1

Consider the below data frame −

> x1<-sample(c("India","UK","China"),20,replace=TRUE)
> x2<-rpois(20,2)
> df1<-data.frame(x1,x2)
> df1

Output

   x1    x2
1  India 1
2  India 2
3  UK    1
4  UK    2
5  China 1
6  India 2
7  UK    1
8  India 0
9  China 2
10 China 2
11 China 0
12 India 4
13 India 3
14 China 2
15 China 1
16 China 2
17 China 1
18 China 1
19 China 4
20 China 2

Removing rows in df1 whose x1 value has fewer than 3 characters −

> subset(df1,nchar(as.character(df1$x1))>=3)

Output

   x1    x2
1  India 1
2  India 2
5  China 1
6  India 2
8  India 0
9  China 2
10 China 2
11 China 0
12 India 4
13 India 3
14 China 2
15 China 1
16 China 2
17 China 1
18 China 1
19 China 4
20 China 2
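The same filtering can also be written with logical row indexing instead of subset. The sketch below uses a small fixed data frame (the name `df_demo` and its values are assumptions, chosen so the result is reproducible) and compares the two approaches.

```r
# Small fixed data frame so the result is reproducible (values are illustrative)
df_demo <- data.frame(x1 = c("India", "UK", "China", "UK"),
                      x2 = c(1, 2, 1, 0))

# Approach used above: subset() with an nchar() condition
by_subset <- subset(df_demo, nchar(as.character(x1)) >= 3)

# Alternative in base R: logical row indexing with [ , ]
by_index <- df_demo[nchar(as.character(df_demo$x1)) >= 3, ]

print(by_index)

# Check that both approaches select the same rows
print(all.equal(by_subset, by_index))
```

Inside subset the column can be referred to by name alone (x1), whereas the bracket form needs df_demo$x1. The two forms can differ when the column contains missing values, because subset drops rows whose condition evaluates to NA.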

Example 2

> y1<-sample(c("Yes","No"),20,replace=TRUE)
> y2<-rnorm(20)
> df2<-data.frame(y1,y2)
> df2

Output

   y1     y2
1  No  -1.7364659
2  No   1.0939593
3  No  -0.3927835
4  No  -0.2663386
5  Yes  0.2212613
6  No  -0.4846801
7  No   2.5305836
8  No  -1.1580186
9  Yes  1.2991126
10 No  -0.2289025
11 No  -0.7304356
12 Yes  0.3648929
13 Yes -0.5454145
14 No   0.7025904
15 No  -0.1482001
16 No   0.3592025
17 Yes  1.7478691
18 No  -0.2124407
19 No   0.4227296
20 Yes -1.7340860

Removing rows in df2 whose y1 value has fewer than 3 characters −

> subset(df2,nchar(as.character(df2$y1))>=3)

Output

   y1     y2
5  Yes  0.2212613
9  Yes  1.2991126
12 Yes  0.3648929
13 Yes -0.5454145
17 Yes  1.7478691
20 Yes -1.7340860
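If the same filter is needed for several data frames or thresholds, it can be wrapped in a small function. The helper below, `drop_short_rows`, is hypothetical (it is not a base R function), and `df_demo` is a made-up data frame; this is only a sketch of one way to generalise the pattern.

```r
# Hypothetical helper (not a base R function): keep rows whose value in
# column `col` has at least `min_chars` characters
drop_short_rows <- function(df, col, min_chars = 3) {
  n_chars <- nchar(as.character(df[[col]]))
  # which() ignores NA, so rows with missing values in `col` are dropped too
  df[which(n_chars >= min_chars), , drop = FALSE]
}

# Illustrative data frame with fixed values
df_demo <- data.frame(y1 = c("Yes", "No", "No", "Yes"),
                      y2 = c(0.5, -1.2, 0.3, 2.1))

result <- drop_short_rows(df_demo, "y1")
print(result)
```

Passing the column name as a string and using `df[[col]]` avoids the non-standard evaluation that subset relies on, which tends to be easier to reason about inside functions.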

Example 3

> z1<-sample(c("Male","Female","NA"),20,replace=TRUE)
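> z2<-runif(20,1,10)
> z3<-rnorm(20,2,0.3) # assumed: z3 was not defined in the original; any numeric vector of length 20 works
> df3<-data.frame(z1,z2,z3)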
> df3

Output

       z1       z2       z3
1    Male 4.194956 2.037433
2    Male 5.223558 2.252606
3    Male 2.134314 2.126866
4  Female 8.728642 1.966581
5    Male 5.105030 1.871318
6  Female 8.249922 2.250764
7    NA   5.662960 1.882002
8    Male 6.712668 1.796225
9    NA   5.421763 2.404416
10 Female 5.588083 1.571489
11   NA   7.013066 2.419949
12 Female 2.863304 1.974340
13   Male 6.677141 1.877119
14 Female 6.357583 2.390536
15   Male 5.130621 1.688357
16   Male 2.709292 1.664844
17   Male 2.421428 2.060667
18 Female 2.900265 1.391200
19   NA   5.583389 1.757949
20 Female 3.149392 1.727739

Removing rows in df3 whose z1 value has fewer than 3 characters −

> subset(df3,nchar(as.character(df3$z1))>=3)

Output

       z1       z2       z3
1    Male 4.194956 2.037433
2    Male 5.223558 2.252606
3    Male 2.134314 2.126866
4  Female 8.728642 1.966581
5    Male 5.105030 1.871318
6  Female 8.249922 2.250764
8    Male 6.712668 1.796225
10 Female 5.588083 1.571489
12 Female 2.863304 1.974340
13   Male 6.677141 1.877119
14 Female 6.357583 2.390536
15   Male 5.130621 1.688357
16   Male 2.709292 1.664844
17   Male 2.421428 2.060667
18 Female 2.900265 1.391200
20 Female 3.149392 1.727739
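In this example "NA" is an ordinary two-character string, which is why nchar counts it as 2 and those rows are removed. A real missing value behaves differently. The sketch below uses a made-up data frame `df_na` (its name and values are assumptions) to show the difference and one way to keep rows with real missing values if that is what you want.

```r
# Illustrative data: "NA" is a string, NA (unquoted) is a real missing value
df_na <- data.frame(z1 = c("Male", "NA", NA, "Female"),
                    z2 = c(4.2, 5.7, 5.4, 8.7),
                    stringsAsFactors = FALSE)

# nchar() gives 2 for the string "NA" but NA for the missing value
n_chars <- nchar(df_na$z1)
print(n_chars)

# subset() treats an NA condition as FALSE, so rows 2 and 3 are both dropped
dropped <- subset(df_na, nchar(z1) >= 3)
print(dropped)

# To keep rows with a real missing value, test for it explicitly
kept <- df_na[is.na(df_na$z1) | nchar(df_na$z1) >= 3, ]
print(kept)
```

Whether rows with missing values should be kept or dropped depends on the analysis; the point is only that the string "NA" and a missing value are not handled the same way.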

Updated on: 05-Mar-2021
