How to find the number of unique values in each column of a data.table object in R?



To find the number of unique values in each column of a data.table object, we can use the uniqueN function together with lapply. For example, if a data.table object called DT has five columns, each containing some duplicate values, then DT[, lapply(.SD, uniqueN)] returns a one-row data.table holding the number of unique values in each column. Here .SD stands for the columns of DT, so lapply applies uniqueN to each column in turn.
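As a quick illustration of the idea, here is a minimal sketch on a small hand-written table, so the counts can be checked by eye. The table DT and its columns a, b and c are made up for this illustration and are not part of the examples that follow.

```r
library(data.table)

# Hypothetical table with known values (illustration only)
DT <- data.table(
  a = c(1, 1, 2, 3, 3),             # distinct values: 1, 2, 3
  b = c("u", "v", "u", "v", "u"),   # distinct values: "u", "v"
  c = c(10, 10, 10, 10, 10)         # a single distinct value
)

# Apply uniqueN to every column of DT via .SD
counts <- DT[, lapply(.SD, uniqueN)]
print(counts)   # one row with a = 3, b = 2, c = 1
```

The result is itself a data.table with one row and the same column names as DT, so individual counts can be read off with counts$a and so on.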

Example

Consider the following data.table object −

library(data.table)
# Values are random (no seed is set), so your output will differ
x1 <- rpois(20, 2)
x2 <- rpois(20, 5)
DT1 <- data.table(x1, x2)
DT1

Output

    x1 x2
 1:  3 11
 2:  1 10
 3:  3  5
 4:  0  1
 5:  0  7
 6:  2  5
 7:  2  4
 8:  3  6
 9:  2  4
10:  4  7
11:  1  6
12:  0  7
13:  2  5
14:  3  2
15:  2  2
16:  1  9
17:  1  2
18:  1  7
19:  2  7
20:  4  5

Finding the number of unique values in each column of DT1 −

Example

DT1[, lapply(.SD, uniqueN)]

Output

   x1 x2
1:  5  9
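One way to sanity-check a result like this is to compare it with base R's length(unique(x)) applied to each column. The sketch below assumes a small hypothetical table called chk with fixed values (it is not DT1, whose values are random).

```r
library(data.table)

# Hypothetical table with fixed values (illustration only, not DT1)
chk <- data.table(
  x1 = c(3, 1, 3, 0, 0, 2),     # distinct values: 3, 1, 0, 2
  x2 = c(11, 10, 5, 1, 7, 5)    # distinct values: 11, 10, 5, 1, 7
)

# data.table approach used in this article, flattened to a named vector
dt_counts <- unlist(chk[, lapply(.SD, uniqueN)])

# Base R equivalent: count the distinct values column by column
base_counts <- sapply(chk, function(col) length(unique(col)))

print(dt_counts)     # x1 = 4, x2 = 5
print(base_counts)   # same counts
```

Both approaches should agree on ordinary vectors; uniqueN is simply the more direct way to write it inside a data.table query.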

Example

# Standard normal draws rounded to one decimal place
y1 <- round(rnorm(20), 1)
y2 <- round(rnorm(20), 1)
DT2 <- data.table(y1, y2)
DT2

Output

      y1   y2
 1:  1.0 -0.5
 2: -1.1  0.5
 3:  0.0  0.4
 4: -1.0  0.1
 5: -1.0 -1.4
 6:  0.4 -0.7
 7:  0.6 -0.2
 8:  0.0 -0.3
 9:  0.0  0.6
10: -0.2 -0.2
11: -0.2  1.8
12:  0.8  0.7
13:  0.5  0.6
14: -1.6 -0.4
15:  0.1 -0.2
16:  0.6 -1.3
17:  0.0  0.8
18:  1.4 -0.6
19:  0.5 -0.2
20:  0.9 -0.7

Finding the number of unique values in each column of DT2 −

Example

DT2[, lapply(.SD, uniqueN)]

Output

   y1 y2
1: 13 15

Example

# Uniform draws between 2 and 5, rounded to one decimal place
z1 <- round(runif(20, 2, 5), 1)
z2 <- round(runif(20, 2, 5), 1)
z3 <- round(runif(20, 2, 5), 1)
z4 <- round(runif(20, 2, 5), 1)
DT3 <- data.table(z1, z2, z3, z4)
DT3

Output

     z1  z2  z3  z4
 1: 3.3 3.2 4.6 3.4
 2: 4.1 4.4 2.9 2.7
 3: 2.3 4.4 4.6 3.6
 4: 5.0 3.6 2.6 2.6
 5: 4.2 4.1 2.8 4.2
 6: 3.7 4.4 2.9 3.1
 7: 3.1 3.1 2.0 4.6
 8: 4.7 2.7 3.5 5.0
 9: 2.1 3.0 4.0 3.7
10: 2.3 2.5 3.2 2.7
11: 4.1 2.1 2.7 2.3
12: 2.4 2.7 4.2 3.2
13: 4.4 3.7 3.5 4.3
14: 3.7 3.1 3.3 3.3
15: 4.3 4.1 4.4 3.4
16: 3.9 2.7 2.9 3.6
17: 2.1 3.6 2.2 4.1
18: 3.0 3.6 2.3 3.4
19: 4.1 3.3 4.3 4.5
20: 2.4 3.4 3.7 3.6

Finding the number of unique values in each column of DT3 −

Example

DT3[, lapply(.SD, uniqueN)]

Output

   z1 z2 z3 z4
1: 14 12 16 15
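When only some columns are of interest, the same call can be limited with data.table's .SDcols argument, and uniqueN accepts na.rm = TRUE so that missing values are not counted as a value. The following is a minimal sketch under those assumptions; the table ex and its columns are hypothetical and are not the DT3 shown above.

```r
library(data.table)

# Hypothetical table with an id column and one missing value (illustration only)
ex <- data.table(
  id = letters[1:6],
  z1 = c(3.3, 4.1, 2.3, 4.1, NA, 3.3),   # distinct: 3.3, 4.1, 2.3 plus NA
  z2 = c(3.2, 4.4, 4.4, 3.6, 4.1, 4.4)   # distinct: 3.2, 4.4, 3.6, 4.1
)

# Count distinct values in z1 and z2 only; NA counts as a value by default
sub_counts <- ex[, lapply(.SD, uniqueN), .SDcols = c("z1", "z2")]

# Same columns, but NA is dropped before counting
sub_counts_no_na <- ex[, lapply(.SD, uniqueN, na.rm = TRUE),
                       .SDcols = c("z1", "z2")]

print(sub_counts)         # z1 = 4, z2 = 4
print(sub_counts_no_na)   # z1 = 3, z2 = 4
```

Extra arguments placed after the function name in lapply, such as na.rm = TRUE here, are passed on to uniqueN for every selected column.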
