Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Selected Reading
How to remove rows in an R data frame column that has duplicate values greater than or equal to a certain number of times?
To remove rows from the data frame that duplicate values greater than a certain number of times, we can create a subset for rows having duplicate values less than the certain number of times. For this purpose, we first need to extract the rows and then subset the data frame with the particular column as shown in the below examples.
Example1
Consider the below data frame −
> x1<-rpois(20,1) > x2<-rpois(20,1) > df1<-data.frame(x1,x2) > df1
Output
x1 x2 1 0 0 2 0 0 3 1 0 4 0 1 5 0 0 6 1 1 7 0 1 8 1 1 9 1 2 10 0 0 11 1 1 12 0 0 13 1 1 14 2 2 15 1 1 16 1 0 17 1 1 18 0 3 19 2 0 20 0 0
Removing rows based on x1 that has number of duplicate values greater than or equal to 3 −
Example
df1[df1$x1 %in% names(which(table(df1$x1)<3)),]
Output
x1 x2 14 2 2 19 2 0
Example2
> y1<-rpois(20,2) > y2<-rpois(20,2) > y3<-rpois(20,2) > df2<-data.frame(y1,y2,y3) > df2
Output
y1 y2 y3 1 2 2 1 2 1 2 0 3 1 2 3 4 3 1 4 5 2 1 1 6 2 1 2 7 1 0 1 8 0 3 5 9 6 1 3 10 2 2 2 11 0 3 0 12 2 2 3 13 3 2 0 14 2 2 4 15 1 0 1 16 1 1 2 17 3 1 3 18 2 4 1 19 0 1 2 20 0 0 0
Removing rows based on y2 that has number of duplicate values greater than or equal to 2 −
Example
> df2[df2$y2 %in% names(which(table(df2$y2)<2)),]
Output
y1 y2 y3 18 2 4 1
Example3
> z1<-rpois(20,2) > z2<-rpois(20,2) > z3<-rpois(20,2) > z4<-rpois(20,2) > df3<-data.frame(z1,z2,z3,z4) > df3
Output
z1 z2 z3 z4 1 5 1 3 3 2 1 1 3 3 3 1 1 2 5 4 1 1 2 6 5 3 5 0 1 6 1 3 1 1 7 0 2 0 0 8 2 0 1 2 9 4 1 3 1 10 3 2 1 1 11 1 0 1 1 12 2 3 0 4 13 0 1 2 1 14 2 3 3 2 15 4 2 0 4 16 1 4 2 2 17 0 2 2 3 18 2 1 2 1 19 4 3 4 1 20 3 3 5 2
Removing rows based on z1 that has number of duplicate values greater than or equal to 2 −
Example
> df3[df3$z1 %in% names(which(table(df3$z1)<2)),]
Output
z1 z2 z3 z4 1 5 1 3 3
Advertisements
