How to check for duplicates in a data.table object in R?


Duplicates in a data.table object can be checked with the duplicated function, using the dollar ($) operator to access the column of interest. For example, if a data.table object DT contains a column x, the command duplicated(DT$x) returns a logical vector in which TRUE marks every value that has already appeared earlier in x. The first occurrence of each value is marked FALSE.

Example1

Loading the data.table package and creating a data.table object −

library(data.table)
set.seed(141)
x <- rpois(20, 5)
DT1 <- data.table(x)
DT1

Output

x
1: 6
2: 3
3: 5
4: 5
5: 5
6: 5
7: 3
8: 4
9: 6
10: 7
11: 3
12: 4
13: 3
14: 5
15: 4
16: 6
17: 6
18: 4
19: 4
20: 10

Checking for duplicates in x −

Example

duplicated(DT1$x)

Output

[1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE
[13] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
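
The logical vector above is often easier to read once it is summarised. The following is a minimal sketch, assuming the x values are exactly those printed in the output above (they are typed in by hand so the result does not depend on the random seed). The names dup, n_dup, first_dup and dup_values are illustrative only.

```r
library(data.table)

# Values copied from the DT1 output shown above
DT1 <- data.table(x = c(6, 3, 5, 5, 5, 5, 3, 4, 6, 7, 3, 4, 3, 5, 4, 6, 6, 4, 4, 10))

dup <- duplicated(DT1$x)

n_dup <- sum(dup)                  # number of entries that repeat an earlier value
first_dup <- anyDuplicated(DT1$x)  # index of the first repeat, 0 if there is none
dup_values <- unique(DT1$x[dup])   # distinct values that occur more than once

n_dup
first_dup
dup_values
```

For this data, n_dup is 14, first_dup is 4 and dup_values is 5 3 6 4.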

Example2

y <- round(rnorm(20, 5, 2), 0)
DT2 <- data.table(y)
DT2

Output

y
1: 4
2: 3
3: 8
4: 9
5: 4
6: 4
7: 6
8: 5
9: 3
10: 5
11: 3
12: 5
13: 9
14: 8
15: 6
16: 4
17: 2
18: 6
19: 4
20: 5

Checking for duplicates in y −

Example

duplicated(DT2$y)

Output

[1] FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE
[13] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
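
The same check can also be used inside the data.table itself, for instance to pull out the repeated rows or to count how often each value occurs. This is a hedged sketch, assuming the y values are exactly those printed above. The names repeats, counts and all_dup are illustrative only.

```r
library(data.table)

# Values copied from the DT2 output shown above
DT2 <- data.table(y = c(4, 3, 8, 9, 4, 4, 6, 5, 3, 5, 3, 5, 9, 8, 6, 4, 2, 6, 4, 5))

# Rows whose y repeats an earlier value
repeats <- DT2[duplicated(y)]

# Occurrences per value, restricted to values seen more than once
counts <- DT2[, .N, by = y][N > 1]

# Flag every occurrence of a repeated value, including the first one
all_dup <- duplicated(DT2$y) | duplicated(DT2$y, fromLast = TRUE)

nrow(repeats)
counts
which(!all_dup)
```

Here repeats has 13 rows, and only position 17 (the value 2) is not flagged in all_dup, because 2 is the only value that occurs exactly once.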

Example3

z <- round(runif(20, 2, 5), 0)
DT3 <- data.table(z)
DT3

Output

    z
 1: 4
 2: 2
 3: 2
 4: 4
 5: 4
 6: 4
 7: 4
 8: 4
 9: 3
10: 3
11: 5
12: 5
13: 2
14: 2
15: 4
16: 5
17: 3
18: 3
19: 2
20: 5

Checking for duplicates in z −

Example

duplicated(DT3$z)

Output

[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE
[13] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
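
When a table has more than one column, duplicated can also be called on the data.table object directly. The sketch below assumes a reasonably recent version of data.table, in which the by argument of duplicated and unique defaults to all columns. The table DT and its column g are hypothetical and only serve as an illustration; z reuses the first four values of DT3.

```r
library(data.table)

# Hypothetical two-column table: z holds the first four values of DT3, g is made up
DT <- data.table(z = c(4, 2, 2, 4), g = c("a", "a", "a", "b"))

row_dup <- duplicated(DT)            # compares whole rows (all columns)
z_dup <- duplicated(DT, by = "z")    # compares the z column only
first_per_z <- unique(DT, by = "z")  # keeps the first row for each value of z

row_dup
z_dup
first_per_z
```

Row 4 is a duplicate only when z alone is compared, since its combination of z and g does not appear in an earlier row.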

Updated on: 08-Feb-2021
