How to remove rows from an R data frame when a column value is repeated a certain number of times or more?
To remove the rows whose value in a particular column occurs a certain number of times or more, we can subset the data frame to keep only the rows whose value occurs fewer than that number of times. For this purpose, we first count how often each value occurs in the column with table(), pick the values whose count is below the threshold with which() and names(), and then subset the data frame with %in% on that column, as shown in the below examples.
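As a minimal sketch of the steps described above, the one-line subsetting expression can be broken into its parts. The data frame `df`, its columns `g` and `v`, and the threshold of 3 are made up for illustration and do not come from the examples in this article.

```r
# Hypothetical toy data: the value 1 occurs 3 times, 2 occurs twice, 3 occurs once
df <- data.frame(g = c(1, 1, 1, 2, 2, 3), v = 11:16)

# Step 1: count how often each distinct value of g occurs
counts <- table(df$g)

# Step 2: keep the names of the values that occur fewer than 3 times
# (names() returns them as character strings)
rare <- names(which(counts < 3))

# Step 3: keep only the rows whose g value is one of those rare values;
# %in% compares the numeric column with the character names after coercion
result <- df[df$g %in% rare, ]
result
```

Here the three rows with g equal to 1 are dropped, because that value occurs 3 times, and the rows with g equal to 2 or 3 are kept.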
Example1
Consider the below data frame −
> x1<-rpois(20,1)
> x2<-rpois(20,1)
> df1<-data.frame(x1,x2)
> df1
Output
   x1 x2
1   0  0
2   0  0
3   1  0
4   0  1
5   0  0
6   1  1
7   0  1
8   1  1
9   1  2
10  0  0
11  1  1
12  0  0
13  1  1
14  2  2
15  1  1
16  1  0
17  1  1
18  0  3
19  2  0
20  0  0
Removing the rows whose x1 value occurs 3 or more times −
Example
> df1[df1$x1 %in% names(which(table(df1$x1)<3)),]
Output
   x1 x2
14  2  2
19  2  0
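The same filter can also be sketched with ave(), which attaches the group size to every row so that the comparison is made on the counts directly. This is an alternative to the table() approach used in this article, not a replacement for it. Because rpois() draws random numbers, the values below are copied from the output of Example1 rather than regenerated.

```r
# Values of df1 copied from the output shown in Example1
x1 <- c(0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 2, 1, 1, 1, 0, 2, 0)
x2 <- c(0, 0, 0, 1, 0, 1, 1, 1, 2, 0, 1, 0, 1, 2, 1, 0, 1, 3, 0, 0)
df1 <- data.frame(x1, x2)

# For every row, the number of rows that share its x1 value
n_same <- ave(df1$x1, df1$x1, FUN = length)

# Keep the rows whose x1 value occurs fewer than 3 times
kept <- df1[n_same < 3, ]
kept
```

Both approaches keep rows 14 and 19 here, since 2 is the only x1 value that occurs fewer than 3 times.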
Example2
> y1<-rpois(20,2)
> y2<-rpois(20,2)
> y3<-rpois(20,2)
> df2<-data.frame(y1,y2,y3)
> df2
Output
   y1 y2 y3
1   2  2  1
2   1  2  0
3   1  2  3
4   3  1  4
5   2  1  1
6   2  1  2
7   1  0  1
8   0  3  5
9   6  1  3
10  2  2  2
11  0  3  0
12  2  2  3
13  3  2  0
14  2  2  4
15  1  0  1
16  1  1  2
17  3  1  3
18  2  4  1
19  0  1  2
20  0  0  0
Removing the rows whose y2 value occurs 2 or more times −
Example
> df2[df2$y2 %in% names(which(table(df2$y2)<2)),]
Output
   y1 y2 y3
18  2  4  1
Example3
> z1<-rpois(20,2)
> z2<-rpois(20,2)
> z3<-rpois(20,2)
> z4<-rpois(20,2)
> df3<-data.frame(z1,z2,z3,z4)
> df3
Output
   z1 z2 z3 z4
1   5  1  3  3
2   1  1  3  3
3   1  1  2  5
4   1  1  2  6
5   3  5  0  1
6   1  3  1  1
7   0  2  0  0
8   2  0  1  2
9   4  1  3  1
10  3  2  1  1
11  1  0  1  1
12  2  3  0  4
13  0  1  2  1
14  2  3  3  2
15  4  2  0  4
16  1  4  2  2
17  0  2  2  3
18  2  1  2  1
19  4  3  4  1
20  3  3  5  2
Removing the rows whose z1 value occurs 2 or more times −
Example
> df3[df3$z1 %in% names(which(table(df3$z1)<2)),]
Output
  z1 z2 z3 z4
1  5  1  3  3
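Since all three examples repeat the same expression with a different data frame, column and threshold, the pattern could be wrapped in a small helper. The function name `keep_rare` and its arguments are hypothetical and only meant as a sketch. The z1 and z2 values are copied from the output of Example3.

```r
# Hypothetical helper: keep the rows of `df` whose value in column `col`
# occurs fewer than `n` times
keep_rare <- function(df, col, n) {
  counts <- table(df[[col]])
  rare <- names(counts)[counts < n]
  # drop = FALSE keeps the result a data frame even if df has one column
  df[df[[col]] %in% rare, , drop = FALSE]
}

# z1 and z2 copied from the output shown in Example3
z1 <- c(5, 1, 1, 1, 3, 1, 0, 2, 4, 3, 1, 2, 0, 2, 4, 1, 0, 2, 4, 3)
z2 <- c(1, 1, 1, 1, 5, 3, 2, 0, 1, 2, 0, 3, 1, 3, 2, 4, 2, 1, 3, 3)
df3 <- data.frame(z1, z2)

# Same filter as in Example3: drop rows whose z1 value occurs 2 or more times
res <- keep_rare(df3, "z1", 2)
res
```

Passing the column name as a string lets the same call be reused for any column, for example keep_rare(df3, "z2", 2).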