How to remove rows of an R data frame whose value in a column is duplicated a certain number of times or more?


To remove the rows of a data frame whose value in a column occurs a certain number of times or more, we can subset the data frame to the rows whose value occurs fewer than that number of times. For this purpose, we first count how often each value occurs with table(), select the values whose count is below the threshold with which() and names(), and then subset the data frame with %in% on that column, as shown in the examples below.
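The steps can be traced one at a time on a small, fixed data frame. This is only a sketch: the data frame df and its columns g and v are hypothetical and are not part of the examples that follow.

```r
# Hypothetical data with fixed values so the result is reproducible
df <- data.frame(g = c(1, 1, 1, 2, 2, 3), v = 1:6)

# Step 1: count how often each distinct value of g occurs
counts <- table(df$g)

# Step 2: names of the values that occur fewer than 3 times (a character vector)
keep <- names(which(counts < 3))

# Step 3: keep only the rows whose g value is among those names;
# %in% compares the numeric column with the character names after coercion
result <- df[df$g %in% keep, ]
result
```

Here the value 1 occurs three times, so rows 1 to 3 are dropped and rows 4 to 6 remain.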

Example 1

Consider the data frame below:

> x1<-rpois(20,1)
> x2<-rpois(20,1)
> df1<-data.frame(x1,x2)
> df1

Output

   x1 x2
1  0  0
2  0  0
3  1  0
4  0  1
5  0  0
6  1  1
7  0  1
8  1  1
9  1  2
10 0  0
11 1  1
12 0  0
13 1  1
14 2  2
15 1  1
16 1  0
17 1  1
18 0  3
19 2  0
20 0  0
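Because rpois() draws random values, rerunning the code above will generally give a different data frame from the one printed here. A minimal sketch, assuming an arbitrary seed of 123 that is not part of the original example, shows how set.seed() makes the draws repeatable (the values it produces will not match the table above):

```r
# The seed value 123 is an arbitrary choice made only for illustration
set.seed(123)
first <- rpois(20, 1)

# Resetting the same seed reproduces exactly the same draws
set.seed(123)
second <- rpois(20, 1)

identical(first, second)
```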

Removing the rows whose x1 value occurs 3 or more times, that is, keeping only the rows whose x1 value occurs fewer than 3 times:

Example

> df1[df1$x1 %in% names(which(table(df1$x1)<3)),]

Output

   x1 x2
14 2  2
19 2  0
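The same filter can also be written without going through the names of the table. The sketch below is an alternative, not the method used above: it uses ave() to attach to every row the number of times its value occurs, and the small data frame df is hypothetical.

```r
# Hypothetical data: 0 and 1 each occur three times, 2 occurs twice
df <- data.frame(x1 = c(0, 0, 1, 0, 1, 1, 2, 2),
                 x2 = c(0, 1, 0, 1, 1, 2, 2, 0))

# For each row, the size of the group of rows sharing its x1 value
n <- ave(seq_along(df$x1), df$x1, FUN = length)

# Keep the rows whose x1 value occurs fewer than 3 times
kept <- df[n < 3, ]
kept
```

Since the comparison is made on per-row counts, no conversion between numbers and character names is involved.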

Example 2

> y1<-rpois(20,2)
> y2<-rpois(20,2)
> y3<-rpois(20,2)
> df2<-data.frame(y1,y2,y3)
> df2

Output

   y1 y2 y3
1  2  2  1
2  1  2  0
3  1  2  3
4  3  1  4
5  2  1  1
6  2  1  2
7  1  0  1
8  0  3  5
9  6  1  3
10 2  2  2
11 0  3  0
12 2  2  3
13 3  2  0
14 2  2  4
15 1  0  1
16 1  1  2
17 3  1  3
18 2  4  1
19 0  1  2
20 0  0  0

Removing the rows whose y2 value occurs 2 or more times, that is, keeping only the rows whose y2 value is unique:

Example

> df2[df2$y2 %in% names(which(table(df2$y2)<2)),]

Output

   y1 y2 y3
18 2  4  1

Example 3

> z1<-rpois(20,2)
> z2<-rpois(20,2)
> z3<-rpois(20,2)
> z4<-rpois(20,2)
> df3<-data.frame(z1,z2,z3,z4)
> df3

Output

   z1 z2 z3 z4
1  5  1  3  3
2  1  1  3  3
3  1  1  2  5
4  1  1  2  6
5  3  5  0  1
6  1  3  1  1
7  0  2  0  0
8  2  0  1  2
9  4  1  3  1
10 3  2  1  1
11 1  0  1  1
12 2  3  0  4
13 0  1  2  1
14 2  3  3  2
15 4  2  0  4
16 1  4  2  2
17 0  2  2  3
18 2  1  2  1
19 4  3  4  1
20 3  3  5  2

Removing the rows whose z1 value occurs 2 or more times, that is, keeping only the rows whose z1 value is unique:

Example

> df3[df3$z1 %in% names(which(table(df3$z1)<2)),]

Output

  z1 z2 z3 z4
1 5  1  3  3
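Since all three examples repeat the same expression with a different column and threshold, it can be wrapped in a small function. This is a sketch only: the helper name keep_rare and the data frame df are hypothetical and not part of the examples above.

```r
# Hypothetical helper: keep the rows whose value in column `col`
# occurs fewer than `n` times
keep_rare <- function(df, col, n) {
  counts <- table(df[[col]])              # frequency of each distinct value
  rare <- names(counts)[counts < n]       # values below the threshold
  df[df[[col]] %in% rare, , drop = FALSE] # drop = FALSE keeps a data frame
}

# Hypothetical data: 5 and 0 occur once, 1 and 3 occur twice
df <- data.frame(z1 = c(5, 1, 1, 3, 3, 0),
                 z2 = c(1, 1, 2, 5, 3, 2))

unique_rows <- keep_rare(df, "z1", 2)
unique_rows
```

With a threshold of 2, only the rows whose z1 value is unique (rows 1 and 6) are returned.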

Updated on: 05-Mar-2021
