How to remove continuously repeated duplicates in an R data frame column?
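Data often contains runs of the same value in adjacent rows. When those repeats carry no extra information and dropping them is unlikely to bias the analysis, we may want to keep only one row per run. For example, if a column records the output of a process and five successive measurements all return the same value, a single row is enough to represent them. In R this can be done by combining rle(), which finds runs of identical adjacent values, with cumsum(), which turns the run lengths into the row index where each run ends. The result keeps the last row of every run.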
Example 1
Consider the following data frame:
# sample() draws at random, so the values will differ from run to run
ID <- 1:20
x <- sample(0:2, 20, replace = TRUE)
df1 <- data.frame(ID, x)
df1
Output
   ID x
1   1 1
2   2 1
3   3 0
4   4 1
5   5 0
6   6 2
7   7 1
8   8 1
9   9 1
10 10 2
11 11 2
12 12 1
13 13 2
14 14 2
15 15 0
16 16 1
17 17 2
18 18 1
19 19 1
20 20 0
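Removing consecutively repeated values in column x of df1 (the last row of each run is kept):
# rle() gives the length of each run; cumsum() converts those lengths to the index of each run's last row
Repeated1 <- cumsum(rle(as.character(df1$x))$lengths)
df1[Repeated1, ]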
Output
   ID x
2   2 1
3   3 0
4   4 1
5   5 0
6   6 2
9   9 1
11 11 2
12 12 1
14 14 2
15 15 0
16 16 1
17 17 2
19 19 1
20 20 0
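To see why this indexing works, the short sketch below runs the same steps on a small hard-coded vector. The vector v and the variable names are illustrative assumptions, not part of the example above. rle() reports the length of each run of identical adjacent values, and the cumulative sum of those lengths gives the position where each run ends.

```r
# Illustrative vector with three runs: (1, 1), (0), (2, 2, 2)
v <- c(1, 1, 0, 2, 2, 2)

runs <- rle(as.character(v))
runs$lengths          # run lengths: 2 1 3
runs$values           # run values: "1" "0" "2"

# Cumulative sum of the run lengths = index of the last element of each run
last_of_run <- cumsum(runs$lengths)
last_of_run           # 2 3 6

v[last_of_run]        # 1 0 2
```

Because the indices point at the end of each run, subsetting a data frame with them keeps the last row of every run rather than the first.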
Example 2
# sample() draws at random, so the values will differ from run to run
ID <- 1:20
y <- sample(1:5, 20, replace = TRUE)
df2 <- data.frame(ID, y)
df2
Output
   ID y
1   1 1
2   2 5
3   3 1
4   4 2
5   5 5
6   6 1
7   7 2
8   8 1
9   9 1
10 10 4
11 11 4
12 12 2
13 13 3
14 14 4
15 15 5
16 16 4
17 17 1
18 18 1
19 19 5
20 20 4
Removing consecutively repeated values in column y of df2 (the last row of each run is kept):
Repeated2 <- cumsum(rle(as.character(df2$y))$lengths)
df2[Repeated2, ]
Output
   ID y
1   1 1
2   2 5
3   3 1
4   4 2
5   5 5
6   6 1
7   7 2
9   9 1
11 11 4
12 12 2
13 13 3
14 14 4
15 15 5
16 16 4
18 18 1
19 19 5
20 20 4
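The rle() and cumsum() approach keeps the last row of each run. If the first row of each run is wanted instead, one possible variation is to compare each value with the one before it. This is a sketch on a small made-up data frame d, not part of the original example, and it assumes the column has no missing values.

```r
# Hypothetical data frame for illustration
d <- data.frame(ID = 1:6, y = c(4, 4, 2, 2, 2, 5))

# TRUE where a value differs from the one before it;
# the first row always starts a run, so it is always kept
starts_run <- c(TRUE, d$y[-1] != d$y[-nrow(d)])

d[starts_run, ]   # keeps rows 1, 3 and 6
```

Which variant is appropriate depends on whether the other columns (here ID) should reflect the start or the end of each run.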
Example 3
# sample() draws at random, so the values will differ from run to run
ID <- 1:20
z <- sample(11:13, 20, replace = TRUE)
df3 <- data.frame(ID, z)
df3
Output
   ID  z
1   1 12
2   2 13
3   3 13
4   4 13
5   5 11
6   6 12
7   7 12
8   8 13
9   9 12
10 10 13
11 11 13
12 12 12
13 13 12
14 14 13
15 15 13
16 16 13
17 17 12
18 18 12
19 19 12
20 20 13
Removing consecutively repeated values in column z of df3 (the last row of each run is kept):
Repeated3 <- cumsum(rle(as.character(df3$z))$lengths)
df3[Repeated3, ]
Output
   ID  z
1   1 12
4   4 13
5   5 11
7   7 12
8   8 13
9   9 12
11 11 13
13 13 12
16 16 13
19 19 12
20 20 13
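Since all three examples repeat the same two steps, they could be wrapped in a small helper. The function name drop_consecutive_dups and its arguments are hypothetical, introduced here only for illustration; the body is the same rle() and cumsum() pattern used in the examples.

```r
# Hypothetical helper: keep the last row of every run of identical
# adjacent values in the column named by `col`
drop_consecutive_dups <- function(df, col) {
  keep <- cumsum(rle(as.character(df[[col]]))$lengths)
  df[keep, , drop = FALSE]
}

# Small made-up data frame with runs (12), (13, 13), (11, 11)
d <- data.frame(ID = 1:5, z = c(12, 13, 13, 11, 11))
res <- drop_consecutive_dups(d, "z")
res   # keeps rows 1, 3 and 5
```

One caveat worth keeping in mind: rle() regards a missing value as different from the value before it, so consecutive NA entries would not be collapsed by this sketch.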