To remove rows whose value in a given column occurs at least a certain number of times, we can subset the data frame to the rows whose value occurs fewer than that number of times. For this purpose, we count how often each value appears with table(), extract the values whose count is below the threshold with which() and names(), and then keep only the rows whose column value is among them with %in%, as shown in the examples below.
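The pieces of that expression can be run one at a time to see what each contributes. The following is a minimal sketch on a small fixed vector v (illustrative data, not taken from the examples), with a threshold of 3:

```r
# Small fixed vector so the result is reproducible (illustrative data only)
v <- c(0, 0, 0, 1, 1, 2)

counts <- table(v)           # frequency of each distinct value: 0 -> 3, 1 -> 2, 2 -> 1
rare   <- which(counts < 3)  # positions in the table whose count is below the threshold
keep   <- names(rare)        # the values themselves, as character strings: "1" "2"

mask <- v %in% keep          # TRUE for elements whose value occurs fewer than 3 times
mask
# [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE
```

Note that names() returns character strings; %in% compares them with the numeric column by converting both sides to a common type, which is why the match still works.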
Consider the data frame below (the values are drawn at random with rpois(), so your output will differ):
> x1<-rpois(20,1)
> x2<-rpois(20,1)
> df1<-data.frame(x1,x2)
> df1
   x1 x2
1   0  0
2   0  0
3   1  0
4   0  1
5   0  0
6   1  1
7   0  1
8   1  1
9   1  2
10  0  0
11  1  1
12  0  0
13  1  1
14  2  2
15  1  1
16  1  0
17  1  1
18  0  3
19  2  0
20  0  0
Removing the rows whose x1 value occurs 3 or more times, so that only values occurring fewer than 3 times remain:
> df1[df1$x1 %in% names(which(table(df1$x1)<3)),]
   x1 x2
14  2  2
19  2  0
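The same filter can also be written without the detour through names(). The sketch below is an alternative, not the method used in this article: it uses base R's ave() to attach to every row the number of times its x1 value occurs, and then compares that count with the threshold directly. The data frame is rebuilt from the values printed above so that the result is reproducible.

```r
# Rebuild df1 from the values printed above
x1 <- c(0,0,1,0,0,1,0,1,1,0,1,0,1,2,1,1,1,0,2,0)
x2 <- c(0,0,0,1,0,1,1,1,2,0,1,0,1,2,1,0,1,3,0,0)
df1 <- data.frame(x1, x2)

# ave() applies length() within each group of equal x1 values and returns
# a vector aligned with the rows, so each row carries its value's count
n <- ave(df1$x1, df1$x1, FUN = length)

# Keep the rows whose x1 value occurs fewer than 3 times
res <- df1[n < 3, ]
res
```

This should return the same two rows (14 and 19) as the table()-based expression.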
> y1<-rpois(20,2)
> y2<-rpois(20,2)
> y3<-rpois(20,2)
> df2<-data.frame(y1,y2,y3)
> df2
   y1 y2 y3
1   2  2  1
2   1  2  0
3   1  2  3
4   3  1  4
5   2  1  1
6   2  1  2
7   1  0  1
8   0  3  5
9   6  1  3
10  2  2  2
11  0  3  0
12  2  2  3
13  3  2  0
14  2  2  4
15  1  0  1
16  1  1  2
17  3  1  3
18  2  4  1
19  0  1  2
20  0  0  0
Removing the rows whose y2 value occurs 2 or more times, so that only values occurring exactly once remain:
> df2[df2$y2 %in% names(which(table(df2$y2)<2)),]
   y1 y2 y3
18  2  4  1
> z1<-rpois(20,2)
> z2<-rpois(20,2)
> z3<-rpois(20,2)
> z4<-rpois(20,2)
> df3<-data.frame(z1,z2,z3,z4)
> df3
   z1 z2 z3 z4
1   5  1  3  3
2   1  1  3  3
3   1  1  2  5
4   1  1  2  6
5   3  5  0  1
6   1  3  1  1
7   0  2  0  0
8   2  0  1  2
9   4  1  3  1
10  3  2  1  1
11  1  0  1  1
12  2  3  0  4
13  0  1  2  1
14  2  3  3  2
15  4  2  0  4
16  1  4  2  2
17  0  2  2  3
18  2  1  2  1
19  4  3  4  1
20  3  3  5  2
Removing the rows whose z1 value occurs 2 or more times, so that only values occurring exactly once remain:
> df3[df3$z1 %in% names(which(table(df3$z1)<2)),]
  z1 z2 z3 z4
1  5  1  3  3
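Since the three examples differ only in the data frame, the column, and the threshold, the expression can be wrapped in a small function. The sketch below is one possible way to do that; the name keep_rare and its arguments are hypothetical, introduced here for illustration. The z1 column is rebuilt from the df3 values printed above.

```r
# Hypothetical helper: keep the rows of `df` whose value in column `col`
# occurs fewer than `n` times
keep_rare <- function(df, col, n) {
  counts <- table(df[[col]])
  # drop = FALSE keeps the result a data frame even if df has a single column
  df[df[[col]] %in% names(which(counts < n)), , drop = FALSE]
}

# z1 column rebuilt from df3 as printed above
df3 <- data.frame(z1 = c(5,1,1,1,3,1,0,2,4,3,1,2,0,2,4,1,0,2,4,3))

res <- keep_rare(df3, "z1", 2)
res
```

Here only row 1 is returned, because 5 is the only z1 value that occurs once.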