How to detect multicollinearity in categorical variables using R?


Multicollinearity is a term usually applied to numerical variables. It means that the independent variables in a model are linearly correlated with each other. Categorical variables are either ordinal or nominal, so their values are labels rather than quantities, and we cannot say that they are linearly correlated in the same sense.
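To make the distinction concrete, here is a minimal sketch. The vectors `a`, `b`, `g1` and `g2` and the seed are hypothetical and are not part of the example data. It suggests that `cor()` measures linear association between numeric vectors but stops with an error when given character labels.

```r
# Hypothetical numeric predictors: b is almost a linear function of a
set.seed(1)  # assumed seed, only for reproducibility
a <- rnorm(30)
b <- 2 * a + rnorm(30, sd = 0.1)
r_numeric <- cor(a, b)  # close to 1, which signals collinearity

# Hypothetical nominal variables: cor() is not defined for labels
g1 <- sample(LETTERS[1:4], 30, replace = TRUE)
g2 <- sample(letters[1:4], 30, replace = TRUE)
# cor() raises an error on character input; the handler returns NA instead
r_labels <- tryCatch(cor(g1, g2), error = function(e) NA)

print(r_numeric)
print(r_labels)
```

The `tryCatch()` wrapper is only there so the script keeps running; in an interactive session the error message itself shows that the inputs must be numeric.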

Example

Consider the data frame below:


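# Two nominal variables with four levels each, plus a numeric response
# No seed is set, so the values will differ between runs
x <- sample(LETTERS[1:4], 30, replace = TRUE)
y <- sample(letters[1:4], 30, replace = TRUE)
response <- rnorm(30)
df <- data.frame(x, y, response)
df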

Output

   x  y   response
1  C  c   0.742577646
2  C  b   0.151037885
3  A  d   0.872867986
4  D  c   1.668988206
5  C  a  -0.310929854
6  B  b  -0.582732624
7  A  a  -1.189979792
8  A  d   0.869424789
9  B  c   1.321981265
10 A  c  -0.378250113
11 B  b   1.077948111
12 D  b  -1.166599657
13 A  b   1.218434700
14 B  b  -0.938781129
15 B  a   0.393036330
16 D  a   0.031261588
17 B  c  -0.926288814
18 D  b   0.807480575
19 A  d   2.056935369
20 B  c   0.464491514
21 B  d   0.466033703
22 D  b   0.236794674
23 D  b   0.761648127
24 C  b  -0.438568617
25 D  c  -1.806599022
26 B  c   0.885648179
27 A  b  -0.830359221
28 A  b   0.545703187
29 D  d   0.007146744
30 C  a  -0.243890913

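Have a look at the categorical columns x and y. Their values are unordered labels, so a linear correlation between them cannot be computed directly, and we have to think about how else an association between those columns could be measured.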
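One common way to examine association between two categorical columns is a chi-squared test of independence, often summarised with Cramér's V. The sketch below is an illustration under stated assumptions, not the method this article prescribes: it rebuilds a data frame with the same structure as the example, and the seed, the simulated p-value and the names `tab`, `test` and `cramers_v` are choices made here.

```r
# Rebuild a data frame with the same structure as the example
set.seed(123)  # assumed seed, only so the sketch is reproducible
x <- sample(LETTERS[1:4], 30, replace = TRUE)
y <- sample(letters[1:4], 30, replace = TRUE)
df <- data.frame(x, y, response = rnorm(30))

# Contingency table of the two categorical columns
tab <- table(df$x, df$y)

# Chi-squared test of independence; cell counts are small here, so a
# simulated (Monte Carlo) p-value is requested instead of the asymptotic one
test <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)

# Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1))), between 0 and 1
n <- sum(tab)
k <- min(dim(tab)) - 1
cramers_v <- sqrt(unname(test$statistic) / (n * k))

print(test$p.value)
print(cramers_v)
```

A large p-value together with a small Cramér's V would suggest that the two columns are only weakly associated. Values of V close to 1 would indicate that one column largely determines the other, which is the categorical analogue of the problem that multicollinearity describes for numeric predictors.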

Updated on: 06-Mar-2021
