How to find the two-factor interaction variables in an R data frame?


If we have a data frame called df that contains four columns, say x, y, z, and a, then the two-factor interaction columns will be xy, xz, xa, yz, ya, and za. To list every two-factor interaction variable that can be formed from the columns of a data frame, we can apply the combn function to the column names, as shown in the examples below. The number of such variables for a data frame with n columns is n(n-1)/2.
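As a minimal sketch of the four-column case described above, the following builds a small data frame and lists its two-factor interactions. The column values are arbitrary placeholders chosen for illustration; only the column names matter here.

```r
# Hypothetical four-column data frame; the values are placeholders.
df <- data.frame(x = 1:3, y = 4:6, z = 7:9, a = 10:12)

# Number of two-factor interactions: choose 2 columns out of 4.
n_pairs <- choose(ncol(df), 2)

# Name of each interaction: combn() passes every pair of column
# names to paste(), which joins the two names into one string.
pair_names <- combn(colnames(df), 2, FUN = paste, collapse = "")

n_pairs      # 6
pair_names   # "xy" "xz" "xa" "yz" "ya" "za"
```

The pairs come out in the order in which combn enumerates them, which follows the column order of the data frame.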

Consider the following data frame:

Example

# Six columns of random Poisson counts (values change on every run)
x1 <- rpois(20, 2)
x2 <- rpois(20, 2)
x3 <- rpois(20, 1)
x4 <- rpois(20, 2)
x5 <- rpois(20, 5)
x6 <- rpois(20, 2)
df1 <- data.frame(x1, x2, x3, x4, x5, x6)
df1

Output

   x1 x2 x3 x4 x5 x6
1   3  1  1  2  5  0
2   1  2  3  4  6  0
3   3  2  1  4  5  1
4   1  2  0  2  3  3
5   0  0  2  1  4  3
6   4  1  0  8  3  0
7   3  2  1  0  8  3
8   3  2  1  2  6  3
9   4  4  0  1  5  0
10  1  1  1  3  3  2
11  3  2  0  4  3  1
12  0  0  2  1  4  2
13  4  4  0  2  3  3
14  2  3  0  3  3  1
15  1  4  3  1  8  2
16  2  3  1  1  4  2
17  2  3  0  2  4  3
18  2  5  1  1 10  3
19  0  2  0  1  9  3
20  0  3  0  1  4  2

Finding the two-factor interaction variables in df1:

combn(colnames(df1),2,FUN=paste,collapse='_')

 [1] "x1_x2" "x1_x3" "x1_x4" "x1_x5" "x1_x6" "x2_x3" "x2_x4" "x2_x5" "x2_x6"
[10] "x3_x4" "x3_x5" "x3_x6" "x4_x5" "x4_x6" "x5_x6"
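The call above returns only the names of the interaction variables. If the interaction columns themselves are needed, one possible approach is to multiply each pair of columns element-wise, on the assumption that an interaction of two numeric columns means their product. The sketch below uses a small stand-in data frame with fixed values (not the random df1 above) so the result is reproducible; the names pairs, interactions, and df1_int are illustrative.

```r
# Small stand-in data frame with fixed values for reproducibility.
df1 <- data.frame(x1 = c(3, 1, 3), x2 = c(1, 2, 2), x3 = c(1, 3, 1))

# Without FUN, combn() returns a 2-row matrix: one column per pair.
pairs <- combn(colnames(df1), 2)

# For each pair, multiply the two columns element-wise
# (assumption: interaction = product of the two numeric columns).
interactions <- apply(pairs, 2, function(p) df1[[p[1]]] * df1[[p[2]]])
colnames(interactions) <- apply(pairs, 2, paste, collapse = "_")

# Append the interaction columns to the original data.
df1_int <- cbind(df1, interactions)
df1_int
```

The same pattern applies to any data frame with numeric columns and at least two rows; for a single-row data frame, apply would return a plain vector rather than a matrix, so that case would need separate handling.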

Example

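# Six columns of random standard normal values rounded to two decimals
# (values change on every run)
y1 <- round(rnorm(20), 2)
y2 <- round(rnorm(20), 2)
y3 <- round(rnorm(20), 2)
y4 <- round(rnorm(20), 2)
y5 <- round(rnorm(20), 2)
y6 <- round(rnorm(20), 2)
df2 <- data.frame(y1, y2, y3, y4, y5, y6)
df2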

Output

      y1    y2    y3    y4    y5    y6
1   0.37 -0.25 -2.60  1.56 -0.64 -0.80
2   0.68  0.65  2.06 -0.54  0.16 -0.22
3   0.51 -0.37  0.16 -2.23 -0.42  0.52
4  -0.01 -0.32  1.65 -2.59  1.01 -1.86
5  -0.65 -0.56 -0.41 -0.88  0.50 -0.66
6  -0.42  0.55  0.26  0.02 -1.52 -0.34
7  -0.89 -0.91 -1.28  0.26 -1.27 -1.04
8   0.12  0.59 -0.80 -1.24  1.57 -0.53
9  -0.26 -1.09  0.65 -0.40  0.18  0.16
10 -1.10 -0.70  2.30  0.31 -0.46 -0.16
11 -0.42 -0.06 -0.76  0.45  0.28 -0.10
12 -0.07  2.08 -0.17 -0.16 -0.54  2.06
13 -0.91  0.37 -1.19 -2.44 -0.45  0.46
14  0.74  1.06  0.42  0.85 -0.12 -0.21
15  1.51  0.29 -0.14  0.28  0.76 -0.45
16  0.11 -0.66 -1.70  1.88 -1.16  1.05
17  0.49  0.44 -1.38 -0.39 -1.47 -1.12
18  0.67 -0.29  1.40  0.80 -0.25  1.23
19  0.45  1.57  1.34  1.75  0.25 -0.89
20  1.05  0.23 -0.06 -0.29  1.50  1.20

Finding the two-factor interaction variables in df2:

combn(colnames(df2),2,FUN=paste,collapse='_')

 [1] "y1_y2" "y1_y3" "y1_y4" "y1_y5" "y1_y6" "y2_y3" "y2_y4" "y2_y5" "y2_y6"
[10] "y3_y4" "y3_y5" "y3_y6" "y4_y5" "y4_y6" "y5_y6"
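As an alternative to combn, base R's formula interface can also enumerate two-factor interactions. This is a swapped-in technique, not the method used above: in a model formula, .^2 expands to all main effects plus all pairwise interactions, and model.matrix names each interaction column with a colon. The sketch below uses a small fixed data frame (the first four rows of three of the columns printed above) rather than the random df2; mm and int_cols are illustrative names.

```r
# Small fixed data frame for reproducibility.
df2 <- data.frame(y1 = c(0.37, 0.68, 0.51, -0.01),
                  y2 = c(-0.25, 0.65, -0.37, -0.32),
                  y3 = c(-2.60, 2.06, 0.16, 1.65))

# Design matrix with intercept, main effects, and all pairwise interactions.
mm <- model.matrix(~ .^2, data = df2)

# Interaction columns are the ones whose names contain a colon.
int_cols <- grep(":", colnames(mm), value = TRUE)
int_cols   # "y1:y2" "y1:y3" "y2:y3"
```

For numeric columns, each interaction column in mm is the element-wise product of the two original columns, so mm[, int_cols] could be used wherever the interaction values are needed.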

Updated on: 06-Feb-2021
