How to remove continuously repeated duplicates in an R data frame column?

Values in a column are often repeated in consecutive rows, and we might want to drop the repeats when doing so is unlikely to bias the analysis. For example, if a column records the output of a process and five consecutive measurements return the same value, we might want to keep only one of them. In base R, rle() finds the runs of identical consecutive values, and the cumulative sum of the run lengths gives the row index of the last element of each run, which can then be used to subset the data frame.
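To make the idea concrete before moving to random data, here is a minimal sketch on a fixed vector. The values in x are chosen purely for illustration and are not taken from the examples in this article; the name last_of_run is likewise just a label for this sketch.

```r
# Fixed input so the result is reproducible (illustrative values only).
x <- c(1, 1, 2, 2, 2, 0, 1, 1)

# rle() splits x into runs of identical consecutive values.
runs <- rle(x)
runs$values   # the value of each run
runs$lengths  # how many times each value repeats in a row

# The cumulative sum of the run lengths is the index of the
# last element of each run.
last_of_run <- cumsum(runs$lengths)

# Keep one element per run.
x[last_of_run]
```

Note that the value 1 appears twice in the result, because only consecutive repeats are collapsed; a later run of the same value is kept.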

Example 1

Consider the data frame below:

# x is drawn at random, so the printed values differ from run to run
ID <- 1:20
x <- sample(0:2, 20, replace = TRUE)
df1 <- data.frame(ID, x)
df1

Output

Removing consecutive repeated values in column x of df1, keeping the last row of each run:

# Row index of the last element of each run of identical consecutive values
Repeated1 <- cumsum(rle(as.character(df1$x))$lengths)
df1[Repeated1, ]

Output

Example 2

ID <- 1:20
y <- sample(1:5, 20, replace = TRUE)
df2 <- data.frame(ID, y)
df2

Output

Removing consecutive repeated values in column y of df2, keeping the last row of each run:

# Row index of the last element of each run of identical consecutive values
Repeated2 <- cumsum(rle(as.character(df2$y))$lengths)
df2[Repeated2, ]

Output

Example 3

ID <- 1:20
z <- sample(11:13, 20, replace = TRUE)
df3 <- data.frame(ID, z)
df3

Output

Removing consecutive repeated values in column z of df3, keeping the last row of each run:

# Row index of the last element of each run of identical consecutive values
Repeated3 <- cumsum(rle(as.character(df3$z))$lengths)
df3[Repeated3, ]

Output
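Because cumsum() of the run lengths points at the end of each run, the examples above keep the last row of every run. If the first row is wanted instead, one possible sketch is to compare each value with the one before it. The helper name keep_first_of_run and the sample data frame are hypothetical, introduced only for illustration, and the sketch assumes the column contains no NA values.

```r
# Hypothetical helper: keep the first row of each run of equal values in `col`.
# Assumes df[[col]] has no NA values (comparisons with NA would yield NA).
keep_first_of_run <- function(df, col) {
  x <- df[[col]]
  n <- length(x)
  if (n == 0) return(df)
  # TRUE for the first row and wherever the value differs from the previous row.
  is_start <- c(TRUE, x[-1] != x[-n])
  df[is_start, , drop = FALSE]
}

# Fixed data so the result is reproducible (illustrative values only).
df <- data.frame(ID = 1:8, x = c(1, 1, 2, 2, 2, 0, 1, 1))
first_rows <- keep_first_of_run(df, "x")
first_rows
```

With this input the runs start at rows 1, 3, 6 and 7, so those are the rows that remain, whereas the cumsum() approach would keep rows 2, 5, 6 and 8.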

Nizamuddin Siddiqui

Updated on: 2020-11-07T11:35:37+05:30
