How to create a regression model in R with interaction between all combinations of two variables?


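The easiest way to fit a regression model with interactions in R is to join the predictors with the * operator in the model formula. With three or more predictors, however, * also generates every higher-order interaction: x1*x2*x3 expands to the three main effects, the three two-way interactions, and the three-way interaction x1:x2:x3. To keep only the main effects and all pairwise (two-way) interactions, wrap the sum of the predictors in parentheses and raise it to the power 2, for example y ~ (x1 + x2 + x3)^2, as shown in the examples below.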
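To see the difference between the two operators without fitting anything, one can inspect the terms each formula expands to. The sketch below applies base R's terms() to the formulas alone; the variable names are simply those used in Example1, and no data are needed.

```r
# Expand each formula and list the model terms it generates
full_terms     <- attr(terms(y ~ x1 * x2 * x3), "term.labels")
pairwise_terms <- attr(terms(y ~ (x1 + x2 + x3)^2), "term.labels")

print(full_terms)      # main effects, two-way terms, and the three-way term
print(pairwise_terms)  # main effects and two-way terms only

# Terms produced by * but not by ^2
print(setdiff(full_terms, pairwise_terms))
```

The only term that differs here is the three-way interaction; with more predictors the gap between the two expansions grows quickly.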

Example1

# Simulate three predictors and a response (random draws, so results vary between runs)
x1 <- rnorm(10)
x2 <- rnorm(10, 1, 0.2)
x3 <- rnorm(10, 1, 0.04)
y <- rnorm(10, 5, 1)
# Main effects plus all two-way interactions
M1 <- lm(y ~ (x1 + x2 + x3)^2)
summary(M1)

Output

Call:
lm(formula = y ~ (x1 + x2 + x3)^2)

Residuals:
       1        2        3        4        5        6        7        8
 0.47052 -0.39362  0.37762 -0.80668  0.41637 -0.04845  0.00832  0.27097
       9       10
 0.14218 -0.43722

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.2893   172.6567   0.002    0.999
x1           28.5300    25.1856   1.133    0.340
x2            7.9753   191.0616   0.042    0.969
x3            3.3123   168.1906   0.020    0.986
x1:x2         1.2607    16.6937   0.076    0.945
x1:x3       -28.3810    19.4585  -1.459    0.241
x2:x3        -6.2240   186.3458  -0.033    0.975

Residual standard error: 0.7372 on 3 degrees of freedom
Multiple R-squared: 0.7996, Adjusted R-squared: 0.3989
F-statistic: 1.995 on 6 and 3 DF, p-value: 0.3048
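The power form is shorthand for writing out every main effect and every pairwise product term by hand. The following sketch checks this on freshly simulated data; the seed and the sample size of 50 are arbitrary assumptions, so the numbers will not match the output above.

```r
set.seed(1)  # arbitrary seed so the sketch is repeatable
n  <- 50     # assumed sample size, larger than in the example above
x1 <- rnorm(n)
x2 <- rnorm(n, 1, 0.2)
x3 <- rnorm(n, 1, 0.04)
y  <- rnorm(n, 5, 1)

# Power shorthand versus the fully spelled-out formula
m_power    <- lm(y ~ (x1 + x2 + x3)^2)
m_explicit <- lm(y ~ x1 + x2 + x3 + x1:x2 + x1:x3 + x2:x3)

# Both fits should have the same coefficient names and estimates
print(names(coef(m_power)))
print(all.equal(coef(m_power), coef(m_explicit)))
```

With three predictors the explicit form is still manageable, but the shorthand becomes much more convenient as the number of predictors grows.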

Example2

# Simulate seven Poisson predictors and a Poisson response (random draws, so results vary between runs)
a1 <- rpois(500, 5)
a2 <- rpois(500, 8)
a3 <- rpois(500, 10)
a4 <- rpois(500, 2)
a5 <- rpois(500, 12)
a6 <- rpois(500, 15)
a7 <- rpois(500, 9)
y <- rpois(500, 1)
# Main effects plus all two-way interactions
M2 <- lm(y ~ (a1 + a2 + a3 + a4 + a5 + a6 + a7)^2)
summary(M2)

Output

Call:
lm(formula = y ~ (a1 + a2 + a3 + a4 + a5 + a6 + a7)^2)

Residuals:
    Min      1Q  Median      3Q     Max
-1.4849 -0.8804 -0.0342  0.6623  4.2336

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.1225469  1.8336636  -0.067  0.94674
a1           0.4629300  0.1548978   2.989  0.00295 **
a2          -0.0330453  0.1246535  -0.265  0.79105
a3           0.0442927  0.1191984   0.372  0.71037
a4          -0.0661164  0.2644226  -0.250  0.80266
a5           0.0657267  0.1035211   0.635  0.52579
a6          -0.0434769  0.0832513  -0.522  0.60175
a7          -0.0132370  0.1187218  -0.111  0.91127
a1:a2       -0.0055441  0.0072067  -0.769  0.44210
a1:a3       -0.0095850  0.0062517  -1.533  0.12590
a1:a4       -0.0197856  0.0156935  -1.261  0.20802
a1:a5       -0.0063698  0.0055879  -1.140  0.25489
a1:a6       -0.0119008  0.0057317  -2.076  0.03841 *
a1:a7       -0.0009957  0.0069639  -0.143  0.88637
a2:a3       -0.0005469  0.0048617  -0.112  0.91049
a2:a4       -0.0096056  0.0119358  -0.805  0.42136
a2:a5       -0.0040884  0.0048707  -0.839  0.40167
a2:a6        0.0059163  0.0045048   1.313  0.18971
a2:a7        0.0023896  0.0052308   0.457  0.64800
a3:a4       -0.0003036  0.0096746  -0.031  0.97498
a3:a5       -0.0070901  0.0045312  -1.565  0.11832
a3:a6        0.0049534  0.0039970   1.239  0.21586
a3:a7        0.0013881  0.0050959   0.272  0.78543
a4:a5        0.0138932  0.0095724   1.451  0.14734
a4:a6        0.0053824  0.0088454   0.608  0.54315
a4:a7        0.0020738  0.0107736   0.192  0.84745
a5:a6        0.0019474  0.0036433   0.535  0.59324
a5:a7        0.0019719  0.0048370   0.408  0.68370
a6:a7       -0.0031881  0.0041510  -0.768  0.44285
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.017 on 471 degrees of freedom
Multiple R-squared: 0.04549, Adjusted R-squared: -0.01126
F-statistic: 0.8016 on 28 and 471 DF, p-value: 0.7563
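With seven predictors the model has 7 main effects and choose(7, 2) = 21 pairwise interactions, which is where the 28 numerator degrees of freedom in the F-statistic line come from. When the variables live in a data frame, the same model can be written more compactly as y ~ .^2. The sketch below assumes a hypothetical data frame d built from fresh random draws (the seed is arbitrary), so its estimates will differ from the output above.

```r
set.seed(2)  # arbitrary seed so the sketch is repeatable
# Hypothetical data frame holding the response and seven predictors
d <- data.frame(
  y  = rpois(500, 1),
  a1 = rpois(500, 5),  a2 = rpois(500, 8),  a3 = rpois(500, 10),
  a4 = rpois(500, 2),  a5 = rpois(500, 12), a6 = rpois(500, 15),
  a7 = rpois(500, 9)
)

# "." stands for every column of d other than the response
M2_dot <- lm(y ~ .^2, data = d)

n_pred     <- ncol(d) - 1                     # 7 predictors
n_expected <- 1 + n_pred + choose(n_pred, 2)  # intercept + main effects + pairs
print(c(fitted = length(coef(M2_dot)), expected = n_expected))
```

Because the number of pairwise terms grows roughly with the square of the number of predictors, it is worth checking this count against the sample size before fitting.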

Example3

# Simulate five uniform predictors and a uniform response (random draws, so results vary between runs)
z1 <- runif(100, 1, 2)
z2 <- runif(100, 1, 4)
z3 <- runif(100, 1, 5)
z4 <- runif(100, 2, 5)
z5 <- runif(100, 2, 10)
y <- runif(100, 1, 10)
# Main effects plus all two-way interactions
M3 <- lm(y ~ (z1 + z2 + z3 + z4 + z5)^2)
summary(M3)

Output

Call:
lm(formula = y ~ (z1 + z2 + z3 + z4 + z5)^2)

Residuals:
    Min      1Q  Median      3Q     Max
-5.4732 -2.0570  0.0582  2.1667  5.3376

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.03476   14.52311  -0.140   0.8889
z1           3.14344    6.80702   0.462   0.6454
z2           3.85518    3.05398   1.262   0.2103
z3          -1.88782    2.16124  -0.873   0.3849
z4           2.75794    3.11048   0.887   0.3778
z5          -0.70359    1.05400  -0.668   0.5063
z1:z2       -2.09623    1.24757  -1.680   0.0966 .
z1:z3        0.17328    0.97128   0.178   0.8588
z1:z4        0.53514    1.26533   0.423   0.6734
z1:z5        0.02687    0.43087   0.062   0.9504
z2:z3        0.15894    0.34335   0.463   0.6446
z2:z4       -0.72427    0.43987  -1.647   0.1034
z2:z5        0.22560    0.16570   1.362   0.1770
z3:z4       -0.16602    0.33847  -0.491   0.6251
z3:z5        0.30484    0.12536   2.432   0.0171 *
z4:z5       -0.19887    0.17768  -1.119   0.2662
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.792 on 84 degrees of freedom
Multiple R-squared: 0.1587, Adjusted R-squared: 0.008411
F-statistic: 1.056 on 15 and 84 DF, p-value: 0.4091
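When the summary table gets long, it can help to look at the interaction rows on their own. One possible approach, sketched below, pulls the coefficient matrix out of the summary and keeps the rows whose names contain a colon. The data are regenerated with an arbitrary seed, so the values will differ from the output above; coef_table and interaction_rows are illustrative names.

```r
set.seed(3)  # arbitrary seed so the sketch is repeatable
z1 <- runif(100, 1, 2)
z2 <- runif(100, 1, 4)
z3 <- runif(100, 1, 5)
z4 <- runif(100, 2, 5)
z5 <- runif(100, 2, 10)
y  <- runif(100, 1, 10)
M3 <- lm(y ~ (z1 + z2 + z3 + z4 + z5)^2)

# coef(summary(...)) returns the coefficient table as a numeric matrix
coef_table <- coef(summary(M3))

# Interaction terms are the rows whose names contain ":"
interaction_rows <- coef_table[grepl(":", rownames(coef_table), fixed = TRUE), ]
print(round(interaction_rows, 4))
```

For five predictors this leaves choose(5, 2) = 10 rows, one per pair of variables.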

Updated on: 17-Oct-2020
