Find the pairs of columns with a correlation coefficient greater than a certain value in R


To find the pairs of columns whose correlation coefficient is greater than a certain value, first create the correlation matrix with cor and then melt it into long format with the melt function of the reshape2 package. The melted output has one row per pair of columns, so it can then be subset on the value of the correlation coefficient.

Check out the below examples to understand how it works.

Example 1

The following snippet creates a sample data frame −

x1<-rpois(20,1)
x2<-rpois(20,10)
x3<-rpois(20,5)
x4<-rpois(20,2)
df1<-data.frame(x1,x2,x3,x4)
df1

The following dataframe is created −

   x1 x2 x3 x4
1   1  8  3  2
2   2  5  6  1
3   0  3  3  2
4   1  8  4  3
5   2  9  7  0
6   0  9  5  1
7   0 13  6  5
8   3  9  2  2
9   0 11  5  3
10  1 11  6  2
11  0 15  0  2
12  1  6  7  3
13  0  9  4  1
14  1 11  6  1
15  1  5  6  2
16  0  8  5  1
17  2  9  5  1
18  0 17  5  4
19  1  7  6  0
20  1 12  6  4
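
The values above come from random draws, so running the snippet again produces different data and different correlations. A minimal sketch of making the draws repeatable, assuming that is wanted, is to call set.seed before generating the columns. The seed value 123 and the names df1_a and df1_b are arbitrary choices for illustration and are not part of the original example:

```r
# Fix the random number generator state so the draws are repeatable.
# The seed value (123) is an arbitrary illustrative choice.
set.seed(123)
df1_a <- data.frame(
  x1 = rpois(20, 1),
  x2 = rpois(20, 10),
  x3 = rpois(20, 5),
  x4 = rpois(20, 2)
)

# Resetting the same seed and repeating the draws gives the same data frame.
set.seed(123)
df1_b <- data.frame(
  x1 = rpois(20, 1),
  x2 = rpois(20, 10),
  x3 = rpois(20, 5),
  x4 = rpois(20, 2)
)

identical(df1_a, df1_b)
```

With a seed in place, the correlation matrix and the filtered pairs stay the same from run to run, although they will not match the numbers printed in this article, which were generated without a seed.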

To create the correlation matrix of df1, add the following code to the above snippet. The columns are not generated again, because a fresh random draw would no longer match the data frame shown above −

# Pairwise Pearson correlations between the columns of df1
cor1_matrix<-cor(df1)
cor1_matrix

Output

If you execute all the above given snippets as a single program, it generates the following output −

           x1         x2          x3          x4
x1  1.0000000 -0.2873806  0.12162796 -0.31472199
x2 -0.2873806  1.0000000 -0.16970821  0.45119129
x3  0.1216280 -0.1697082  1.00000000 -0.02241285
x4 -0.3147220  0.4511913 -0.02241285  1.00000000

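To load the reshape2 package and find the pairs of columns whose correlation coefficient is greater than 0.30, add the following code to the above snippet. Because the correlation matrix is symmetric and has ones on its diagonal, the result lists every qualifying pair twice and also includes the correlation of each column with itself −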

library(reshape2)
subset(melt(cor1_matrix),value>.30)

Output

If you execute all the above given snippets as a single program, it generates the following output −

   Var1 Var2     value
1    x1   x1 1.0000000
6    x2   x2 1.0000000
8    x4   x2 0.4511913
11   x3   x3 1.0000000
14   x2   x4 0.4511913
16   x4   x4 1.0000000
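
Only one distinct pair in the output above (x2 and x4) is an actual result. The other rows are self-correlations or mirrored duplicates. One possible way to list each pair once is to look only above the diagonal of the matrix. The sketch below uses base R instead of reshape2, rebuilds the matrix from the values printed earlier, and introduces a hypothetical helper named high_cor_pairs that is not part of any package:

```r
# Correlation matrix rebuilt from the values printed in Example 1
vars <- c("x1", "x2", "x3", "x4")
cor1_matrix <- matrix(
  c( 1.00000000, -0.28738060,  0.12162796, -0.31472199,
    -0.28738060,  1.00000000, -0.16970821,  0.45119129,
     0.12162796, -0.16970821,  1.00000000, -0.02241285,
    -0.31472199,  0.45119129, -0.02241285,  1.00000000),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars)
)

# Hypothetical helper: return each column pair once, skipping the diagonal
high_cor_pairs <- function(cor_matrix, threshold) {
  # TRUE only for cells strictly above the diagonal that exceed the threshold
  keep <- upper.tri(cor_matrix) & cor_matrix > threshold
  # Row and column positions of the cells that were kept
  idx <- which(keep, arr.ind = TRUE)
  data.frame(
    Var1  = rownames(cor_matrix)[idx[, "row"]],
    Var2  = colnames(cor_matrix)[idx[, "col"]],
    value = cor_matrix[idx],
    stringsAsFactors = FALSE
  )
}

pairs1 <- high_cor_pairs(cor1_matrix, 0.30)
pairs1
```

For this matrix the helper returns a single row, the pair x2 and x4 with a coefficient of about 0.4511913.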

Example 2

The following snippet creates a sample data frame −

y1<-rnorm(20)
y2<-rnorm(20,5)
y3<-rnorm(20,1.005)
df2<-data.frame(y1,y2,y3)
df2

The following dataframe is created −

             y1       y2          y3
1   0.987216392 5.729841   1.6302391
2   0.784426157 4.229493   1.3783138
3  -0.444098876 3.623398   1.7947024
4   0.093496185 5.388854   0.7357072
5  -0.606812484 4.608422   1.5531116
6   0.681756392 4.502711   1.7351390
7   0.646009220 5.414941   1.4273596
8   0.418220626 6.227583  -0.4851824
9  -0.096372689 5.749269  -0.3193480
10  0.263341182 4.861265   1.8186878
11 -0.669565407 5.292873   1.4790937
12 -0.409141117 6.087335   1.8738509
13 -0.008184681 4.887777   1.8336940
14  1.147759554 5.431373  -0.5929404
15 -0.826403622 5.043522   0.3473174
16 -1.749526916 4.274688   0.4565382
17 -0.981464558 5.652843   2.0842843
18  1.414818984 5.136481   1.3521429
19  1.010931968 5.266047   1.7779003
20  0.674112034 5.497107   0.8404535

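To create the correlation matrix of df2, add the following code to the above snippet. The columns are not generated again, because a fresh random draw would no longer match the data frame shown above −

# Pairwise Pearson correlations between the columns of df2
cor2_matrix<-cor(df2)
cor2_matrix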

Output

If you execute all the above given snippets as a single program, it generates the following output −

            y1         y2          y3
y1  1.00000000  0.2162542 -0.03940615
y2  0.21625418  1.0000000 -0.30541902
y3 -0.03940615 -0.3054190  1.00000000

To find the pairs of columns whose correlation coefficient is less than 0.20, add the following code to the above snippet −

# melt comes from reshape2; load it if it is not already attached
library(reshape2)
subset(melt(cor2_matrix),value<0.20)

Output

If you execute all the above given snippets as a single program, it generates the following output −

  Var1 Var2       value
3   y3   y1 -0.03940615
6   y3   y2 -0.30541902
7   y1   y3 -0.03940615
8   y2   y3 -0.30541902
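
A plain comparison such as value>.30 or value<0.20 treats negative coefficients as small, even though a coefficient of -0.305 is as strong in magnitude as one of 0.305. If the goal is to find strongly related columns in either direction, one option is to filter on the absolute value. The sketch below is an illustration under that assumption. It rebuilds the matrix from the values printed above and uses base R's as.data.frame(as.table(...)) as a stand-in for melt, so it does not need reshape2. The names cor2_long and strong2 are illustrative:

```r
# Correlation matrix rebuilt from the values printed in Example 2
vars <- c("y1", "y2", "y3")
cor2_matrix <- matrix(
  c( 1.00000000,  0.21625418, -0.03940615,
     0.21625418,  1.00000000, -0.30541902,
    -0.03940615, -0.30541902,  1.00000000),
  nrow = 3, byrow = TRUE, dimnames = list(vars, vars)
)

# Long format with columns Var1, Var2 and value (base R stand-in for melt)
cor2_long <- as.data.frame(as.table(cor2_matrix), responseName = "value")

# Keep pairs whose coefficient exceeds 0.30 in magnitude,
# dropping the correlation of each column with itself
strong2 <- subset(
  cor2_long,
  abs(value) > 0.30 & as.character(Var1) != as.character(Var2)
)
strong2
```

Here the only pair that passes is y2 and y3, which appears twice (once in each order) with a coefficient of -0.30541902.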

Updated on: 12-Nov-2021
