How to remove continuously repeated duplicates in an R data frame column?


Values are often repeated in consecutive rows, which creates duplication in the data. We may want to remove those repeats when doing so is unlikely to bias the results of the analysis. For example, suppose a column records the output of a process that we run five times in a row. If the process returns the same output every time, we might want to keep only one of those outputs.
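Removing consecutive duplicates is not the same as removing all duplicates. The short sketch below uses a made-up vector, not one of the data frames in this article, to contrast base R's unique(), which drops every repeated value, with rle(), which only collapses runs of identical adjacent values.

```r
# Made-up vector for illustration (not one of the article's data frames)
v <- c(1, 1, 2, 2, 2, 1, 1, 3)

# unique() drops every repeated value, wherever it occurs
all_dups_removed <- unique(v)
print(all_dups_removed)   # 1 2 3

# rle() groups adjacent equal values into runs; $values has one entry per run
runs_collapsed <- rle(v)$values
print(runs_collapsed)     # 1 2 1 3
```

Only the run-based version preserves the fact that the value 1 appears again after the run of 2s.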

Example 1

Consider the data frame below:

ID <- 1:20
# sample() is random, so your values will differ unless you call set.seed() first
x <- sample(0:2, 20, replace = TRUE)
df1 <- data.frame(ID, x)
df1

Output

   ID x
1  1  1
2  2  1
3  3  0
4  4  1
5  5  0
6  6  2
7  7  1
8  8  1
9  9  1
10 10 2
11 11 2
12 12 1
13 13 2
14 14 2
15 15 0
16 16 1
17 17 2
18 18 1
19 19 1
20 20 0

Removing consecutive duplicates from column x of df1 (this keeps the last row of each run):

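# rle() finds runs of equal values; cumsum() of the run lengths gives the row where each run ends
Repeated1 <- cumsum(rle(as.character(df1$x))$lengths)
df1[Repeated1, ]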

Output

  ID  x
2  2  1
3  3  0
4  4  1
5  5  0
6  6  2
9  9  1
11 11 2
12 12 1
14 14 2
15 15 0
16 16 1
17 17 2
19 19 1
20 20 0
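To see why cumsum(rle(...)$lengths) picks the right rows, here is the same calculation on the first nine values of x from the df1 output above, written as a fixed vector so the result does not depend on sample(). The names r and last_idx are only illustrative.

```r
# First nine values of x from the df1 output above, fixed so the result is reproducible
x <- c(1, 1, 0, 1, 0, 2, 1, 1, 1)

r <- rle(x)
print(r$lengths)   # length of each run: 2 1 1 1 1 3
print(r$values)    # value of each run:  1 0 1 0 2 1

# A running total of the run lengths is the position where each run ends
last_idx <- cumsum(r$lengths)
print(last_idx)    # 2 3 4 5 6 9

# Indexing with those positions keeps the last element of every run
print(x[last_idx]) # 1 0 1 0 2 1
```

Because each index marks the end of a run, this approach keeps the last row of every run, which is why the df1 result retains ID 2 and ID 9 rather than ID 1 and ID 7.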

Example 2

ID <- 1:20
y <- sample(1:5, 20, replace = TRUE)
df2 <- data.frame(ID, y)
df2

Output

   ID y
1   1 1
2   2 5
3   3 1
4   4 2
5   5 5
6   6 1
7   7 2
8   8 1
9   9 1
10 10 4
11 11 4
12 12 2
13 13 3
14 14 4
15 15 5
16 16 4
17 17 1
18 18 1
19 19 5
20 20 4

Removing consecutive duplicates from column y of df2:

Repeated2 <- cumsum(rle(as.character(df2$y))$lengths)
df2[Repeated2, ]

Output

   ID y
1   1 1
2   2 5
3   3 1
4   4 2
5   5 5
6   6 1
7   7 2
9   9 1
11 11 4
12 12 2
13 13 3
14 14 4
15 15 5
16 16 4
18 18 1
19 19 5
20 20 4
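If you would rather keep the first row of each run instead of the last, the start of each run can be derived from the same run lengths: a run that ends at position e and has length n starts at e - n + 1. The sketch below assumes a small fixed data frame built from the first ten values of y in the df2 output above; df, last_idx and first_idx are illustrative names.

```r
# Assumed data: first ten rows of df2 as printed above, fixed for reproducibility
df <- data.frame(ID = 1:10, y = c(1, 5, 1, 2, 5, 1, 2, 1, 1, 4))

r <- rle(as.character(df$y))

# End of each run (the rows the article's code keeps) ...
last_idx <- cumsum(r$lengths)
# ... and start of each run: end position minus run length plus one
first_idx <- last_idx - r$lengths + 1L

df[first_idx, ]   # keeps IDs 1 to 8 and 10
```

Here the only run longer than one is the pair of 1s at ID 8 and ID 9, so this version keeps ID 8 where the cumsum() approach keeps ID 9.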

Example 3

ID <- 1:20
z <- sample(11:13, 20, replace = TRUE)
df3 <- data.frame(ID, z)
df3

Output

   ID  z
1   1 12
2   2 13
3   3 13
4   4 13
5   5 11
6   6 12
7   7 12
8   8 13
9   9 12
10 10 13
11 11 13
12 12 12
13 13 12
14 14 13
15 15 13
16 16 13
17 17 12
18 18 12
19 19 12
20 20 13

Removing consecutive duplicates from column z of df3:

Repeated3 <- cumsum(rle(as.character(df3$z))$lengths)
df3[Repeated3, ]

Output

   ID  z
1   1 12
4   4 13
5   5 11
7   7 12
8   8 13
9   9 12
11 11 13
13 13 12
16 16 13
19 19 12
20 20 13
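Since the same two lines are repeated for every column, they can be wrapped in a small helper. The function name remove_consecutive_dups and its keep argument are hypothetical, introduced here for illustration; the body is the rle()/cumsum() approach used in the examples. The as.character() conversion matters for factor columns, because rle() only accepts atomic vectors and stops with an error on a factor.

```r
# Hypothetical helper, not part of base R: drop consecutive duplicates in
# column `col`, keeping either the last (default) or first row of each run
remove_consecutive_dups <- function(df, col, keep = c("last", "first")) {
  keep <- match.arg(keep)
  # as.character() lets rle() accept factor columns too
  r <- rle(as.character(df[[col]]))
  last_idx <- cumsum(r$lengths)
  idx <- if (keep == "last") last_idx else last_idx - r$lengths + 1L
  df[idx, , drop = FALSE]
}

# Small fixed example with a factor column (assumed data)
df3_fixed <- data.frame(ID = 1:6, z = factor(c(12, 13, 13, 13, 11, 12)))
remove_consecutive_dups(df3_fixed, "z")                  # IDs 1, 4, 5, 6
remove_consecutive_dups(df3_fixed, "z", keep = "first")  # IDs 1, 2, 5, 6
```

One caveat: rle() treats each NA as a run of its own, so consecutive NA values are not collapsed by this approach.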

Updated on: 07-Nov-2020
