# How to create a correlation matrix by a categorical column in a data.table object in R?

To create a correlation matrix for each level of a categorical column in a data.table object in R, follow these steps:

• First of all, create a data.table object.
• Then, compute a correlation matrix for each group by splitting the object on the categorical column.

## Create the data.table object

library(data.table)
x<-sample(1:50,25)
y<-sample(1:50,25)
z<-sample(1:50,25)
Group<-sample(LETTERS[1:4],25,replace=TRUE)
DT<-data.table(x,y,z,Group)
DT

Executing the above script generates output like the following (it will vary on your system because the values are sampled randomly):

     x  y  z Group
 1: 15  7  3     C
 2: 11  1 31     B
 3:  2 16 15     A
 4: 33 42 49     B
 5: 27 39 19     A
 6: 16 21  1     D
 7: 18 22 37     A
 8: 42  6 35     D
 9: 30 40 38     D
10:  8 17 26     C
11: 34 10 41     B
12: 47 33 13     C
13:  7  5  8     A
14: 26 26 43     D
15:  3 41 24     D
16: 31 23  9     B
17: 40 27 32     B
18: 25 30 21     A
19:  5  8 47     D
20:  6 49 17     A
21: 46  3 34     C
22: 21 38 48     A
23: 48 50  4     A
24: 19 36 36     B
25: 39  4 50     C
     x  y  z Group
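Because the values come from `sample()`, every run produces a different table. A minimal sketch of one way to make the example reproducible is to fix the random seed first. The seed value 123 is an arbitrary assumption chosen for illustration; it is not part of the original script and will not reproduce the table shown above.

```r
library(data.table)

# Fix the random seed so sample() returns the same draws on every run.
# The value 123 is arbitrary and only used for illustration.
set.seed(123)
x <- sample(1:50, 25)
y <- sample(1:50, 25)
z <- sample(1:50, 25)
Group <- sample(LETTERS[1:4], 25, replace = TRUE)
DT <- data.table(x, y, z, Group)

# Count the rows in each group before computing correlations;
# a group with a single row yields NA correlations.
DT[, .N, by = Group]
```

Checking the group sizes first is a cheap way to spot groups that are too small for a meaningful correlation.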

## Create the correlation matrix by a categorical column

Use the split function with lapply to create a correlation matrix for each level of the Group column in DT:

lapply(split(DT[,1:3],DT$Group),cor)

## Output

$A
           x          y           z
x  1.0000000 0.58114264 -0.10587701
y  0.5811426 1.00000000  0.03787179
z -0.1058770 0.03787179  1.00000000

$B
           x         y          z
x 1.00000000 0.3848310 0.06995891
y 0.38483099 1.0000000 0.26711858
z 0.06995891 0.2671186 1.00000000

$C
          x          y          z
x 1.0000000  0.1111837  0.3542485
y 0.1111837  1.0000000 -0.4556856
z 0.3542485 -0.4556856  1.0000000

$D
           x          y          z
x  1.0000000 -0.2276999  0.1791530
y -0.2276999  1.0000000 -0.1737857
z  0.1791530 -0.1737857  1.0000000
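The same per-group result can also be obtained with data.table's own grouping tools. The sketch below is an alternative to the split and lapply call above, not a replacement for it: it uses the `by` argument of data.table's `split` method, and a grouped query with `.SD` that stores each matrix in a list column. The seed and the names `cor_list`, `cor_dt` and `cor_matrix` are assumptions introduced here for illustration.

```r
library(data.table)

set.seed(123)  # arbitrary seed, only to make this sketch reproducible
DT <- data.table(
  x = sample(1:50, 25),
  y = sample(1:50, 25),
  z = sample(1:50, 25),
  Group = sample(LETTERS[1:4], 25, replace = TRUE)
)

# Option 1: split by column name. keep.by = FALSE drops Group from each
# piece so that cor() only receives the numeric columns.
cor_list <- lapply(split(DT, by = "Group", keep.by = FALSE), cor)

# Option 2: one row per group, with the 3 x 3 correlation matrix
# stored in a list column.
cor_dt <- DT[, .(cor_matrix = list(cor(.SD))), by = Group,
             .SDcols = c("x", "y", "z")]

# Look at the matrix for the first group in each result.
cor_list[[1]]
cor_dt$cor_matrix[[1]]
```

Selecting the columns by name through `.SDcols` avoids relying on column positions such as `1:3`, which can silently change if columns are added or reordered.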

Updated on: 14-Aug-2021
