How to remove continuously repeated duplicates in an R data frame column?
Values are often repeated in consecutive rows of a data set, and we may want to drop those repeats when removing them is unlikely to bias the analysis. For example, if a column records the output of a process and five successive readings all return the same value, we might keep only one of them. In R, rle() finds the runs of identical consecutive values, and the cumulative sum of the run lengths gives the row index at which each run ends.
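As a rough illustration of the mechanism, the sketch below applies rle() and cumsum() to a small made-up vector. The names v, runs and last_idx are illustrative only and are not part of the examples that follow.

```r
# Small made-up vector, used only to show how runs are detected
v <- c(1, 1, 0, 1, 1, 1, 2)

runs <- rle(v)
runs$lengths   # length of each run: 2 1 3 1
runs$values    # value of each run:  1 0 1 2

# The cumulative sum of the run lengths is the position
# of the last element of each run
last_idx <- cumsum(runs$lengths)
last_idx       # 2 3 6 7

# Indexing with those positions keeps one element per run
v[last_idx]    # 1 0 1 2
```

Indexing a data frame's rows with last_idx in the same way keeps one row per run.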
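Example 1
Consider the data frame below (sample() draws random values, so your data will differ unless you set a seed with set.seed()):
ID <- 1:20
x <- sample(0:2, 20, replace = TRUE)
df1 <- data.frame(ID, x)
df1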
Output
   ID x
1   1 1
2   2 1
3   3 0
4   4 1
5   5 0
6   6 2
7   7 1
8   8 1
9   9 1
10 10 2
11 11 2
12 12 1
13 13 2
14 14 2
15 15 0
16 16 1
17 17 2
18 18 1
19 19 1
20 20 0
Removing consecutive repeated values in column x of df1 (the last row of each run is kept):
Repeated1 <- cumsum(rle(as.character(df1$x))$lengths)
df1[Repeated1, ]
Output
   ID x
2   2 1
3   3 0
4   4 1
5   5 0
6   6 2
9   9 1
11 11 2
12 12 1
14 14 2
15 15 0
16 16 1
17 17 2
19 19 1
20 20 0
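The output above retains the last row of each run (rows 2, 9, 11 and so on). If the first row of each run is preferred, one possible variant compares every value with its predecessor. This is a minimal sketch that assumes the column has no missing values and the data frame has at least one row. The x values are typed in from the output shown above so the result is reproducible, and keep_first and first_rows are illustrative names.

```r
# Rebuild df1 with the values printed above instead of calling sample()
df1 <- data.frame(
  ID = 1:20,
  x  = c(1, 1, 0, 1, 0, 2, 1, 1, 1, 2, 2, 1, 2, 2, 0, 1, 2, 1, 1, 0)
)

# TRUE for the first row, and for every row whose x differs
# from the value in the row directly before it
keep_first <- c(TRUE, df1$x[-1] != df1$x[-nrow(df1)])

first_rows <- df1[keep_first, ]
first_rows$ID   # 1 3 4 5 6 7 10 12 13 15 16 17 18 20
```

Both variants return the same sequence of x values; they differ only in which ID represents each run.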
Example 2
ID <- 1:20
y <- sample(1:5, 20, replace = TRUE)
df2 <- data.frame(ID, y)
df2
Output
   ID y
1   1 1
2   2 5
3   3 1
4   4 2
5   5 5
6   6 1
7   7 2
8   8 1
9   9 1
10 10 4
11 11 4
12 12 2
13 13 3
14 14 4
15 15 5
16 16 4
17 17 1
18 18 1
19 19 5
20 20 4
Removing consecutive repeated values in column y of df2 (the last row of each run is kept):
Repeated2 <- cumsum(rle(as.character(df2$y))$lengths)
df2[Repeated2, ]
Output
   ID y
1   1 1
2   2 5
3   3 1
4   4 2
5   5 5
6   6 1
7   7 2
9   9 1
11 11 4
12 12 2
13 13 3
14 14 4
15 15 5
16 16 4
18 18 1
19 19 5
20 20 4
Example 3
ID <- 1:20
z <- sample(11:13, 20, replace = TRUE)
df3 <- data.frame(ID, z)
df3
Output
   ID  z
1   1 12
2   2 13
3   3 13
4   4 13
5   5 11
6   6 12
7   7 12
8   8 13
9   9 12
10 10 13
11 11 13
12 12 12
13 13 12
14 14 13
15 15 13
16 16 13
17 17 12
18 18 12
19 19 12
20 20 13
Removing consecutive repeated values in column z of df3 (the last row of each run is kept):
Repeated3 <- cumsum(rle(as.character(df3$z))$lengths)
df3[Repeated3, ]
Output
   ID  z
1   1 12
4   4 13
5   5 11
7   7 12
8   8 13
9   9 12
11 11 13
13 13 12
16 16 13
19 19 12
20 20 13
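Since the three examples repeat the same two lines, the pattern could be wrapped in a small helper. The sketch below is one possible way to do that. The function name drop_consecutive_repeats and its keep argument are hypothetical and do not come from any package, and the z values are typed in from the output above so the result is reproducible.

```r
# Hypothetical helper: keeps one row per run of identical consecutive
# values in column `col`, either the last row (default) or the first
drop_consecutive_repeats <- function(df, col, keep = c("last", "first")) {
  keep <- match.arg(keep)
  if (nrow(df) == 0) return(df)          # nothing to do for an empty data frame
  # rle() treats each missing value as its own run
  run_lengths <- rle(as.character(df[[col]]))$lengths
  ends <- cumsum(run_lengths)            # last row index of each run
  idx <- if (keep == "last") ends else ends - run_lengths + 1
  df[idx, , drop = FALSE]
}

# df3 rebuilt from the values printed above instead of calling sample()
df3 <- data.frame(
  ID = 1:20,
  z  = c(12, 13, 13, 13, 11, 12, 12, 13, 12, 13,
         13, 12, 12, 13, 13, 13, 12, 12, 12, 13)
)

res_last  <- drop_consecutive_repeats(df3, "z")
res_first <- drop_consecutive_repeats(df3, "z", keep = "first")

res_last$ID    # 1 4 5 7 8 9 11 13 16 19 20
res_first$ID   # 1 2 5 6 8 9 10 12 14 17 20
```

With the default keep = "last", the helper reproduces the output of Example 3.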