How to find the sequence of correlation between variables in an R data frame or matrix?


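To find the sequence of correlations between variables in an R data frame or matrix, that is, the pairwise correlations sorted from lowest to highest, we can use the correlate and stretch functions from the corrr package together with the arrange function from the dplyr package.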

For example, if we have a data frame called df, then we can list the pairwise correlations between its variables in ascending order by using the command given below −

df%>%
correlate() %>%
stretch() %>%
arrange(r)
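
The pipeline above relies on two packages: correlate() and stretch() come from corrr, while arrange() comes from dplyr. A minimal self-contained sketch follows, assuming both packages are installed. The built-in mtcars data set and the name sorted_cor are stand-ins chosen only for illustration and are not part of the original example.

```r
# Assumption: corrr and dplyr are installed.
library(corrr)   # correlate(), stretch()
library(dplyr)   # arrange()

# Illustrative input: four numeric columns of the built-in mtcars data.
sorted_cor <- mtcars[, c("mpg", "hp", "wt", "qsec")] %>%
  correlate() %>%   # correlation data frame (Pearson by default)
  stretch() %>%     # long format with columns x, y and r
  arrange(r)        # ascending order; NA rows are placed last

print(sorted_cor)
```

With four variables the result has 16 rows, one for every ordered pair of columns, including each variable paired with itself.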

Example 1

Following snippet creates a sample data frame (rnorm() draws random values and no seed is set, so your numbers will differ from those shown) −

x1<-rnorm(20)
x2<-rnorm(20,5,0.2)
x3<-rnorm(20,5,0.005)
x4<-rnorm(20,2,0.14)
df1<-data.frame(x1,x2,x3,x4)
df1

The following dataframe is created −

           x1        x2      x3      x4
1   0.6808273  5.100547 5.008409 1.903267
2  -0.8982181  5.236966 5.003451 1.885443
3   1.7004141  4.938643 4.998515 2.078969
4  -0.5556164  5.086777 5.000830 1.836611
5  -0.3700080  5.180180 4.999621 2.066981
6  -0.6171065  5.218265 5.006482 2.012696
7   1.3167304  4.886068 5.005309 1.697461
8  -1.0291898  5.082370 4.996527 1.876241
9   0.3552824  5.354599 5.004550 2.010306
10 -0.8318924  4.943031 4.999007 2.097666
11 -0.3643198  5.287603 4.996394 2.119303
12 -0.6422962  4.864442 4.998594 2.105324
13  3.3631619  4.675183 4.999328 1.878981
14  1.7794186  4.769273 4.997484 2.027005
15 -0.5102582  5.140516 5.001077 1.830695
16  1.1652416  4.586822 4.996408 2.101790
17 -0.4535449  5.046426 5.004014 1.906526
18 -2.0166857  4.666686 4.996425 2.011478
19 -0.9543124  4.956333 5.002519 1.984997
20  1.5101443  5.273918 4.988374 1.876615

To load the corrr and dplyr packages and find the sequence of correlations between variables in df1, add the following code to the above snippet −

library(corrr)
library(dplyr)
df1%>%
correlate() %>%
stretch() %>%
arrange(r)

Output

If you execute all the above given snippets as a single program, it generates the following output −

Correlation method: 'pearson'
Missing treated using: 'pairwise.complete.obs'

# A tibble: 16 x 3
    x     y    r
 <chr> <chr> <dbl>
1  x1   x2  -0.284
2  x2   x1  -0.284
3  x3   x4  -0.263
4  x4   x3  -0.263
5  x1   x4  -0.155
6  x4   x1  -0.155
7  x1   x3  -0.137
8  x3   x1  -0.137
9  x2   x4  -0.127
10 x4   x2  -0.127
11 x2   x3   0.185
12 x3   x2   0.185
13 x1   x1   NA
14 x2   x2   NA
15 x3   x3   NA
16 x4   x4   NA
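
In the output above every pair appears twice (x1 with x2 and x2 with x1), and the four self-pairs have r equal to NA and are sorted last. If only the unique pairs are wanted, one possible approach is to blank out the upper triangle with shave() before stretching and to drop the NA rows. The sketch below assumes corrr and dplyr are installed. The fixed seed, the name df_demo and the choice to sort by absolute value are illustrative assumptions and not part of the original example.

```r
# Assumption: corrr and dplyr are installed.
library(corrr)
library(dplyr)

set.seed(1)  # assumption: fixed seed so the run is repeatable
df_demo <- data.frame(
  x1 = rnorm(20),
  x2 = rnorm(20, 5, 0.2),
  x3 = rnorm(20, 5, 0.005),
  x4 = rnorm(20, 2, 0.14)
)

unique_pairs <- df_demo %>%
  correlate() %>%
  shave() %>%                 # set the upper triangle to NA
  stretch(na.rm = TRUE) %>%   # long format, rows with NA dropped
  arrange(desc(abs(r)))       # strongest relationships first

print(unique_pairs)
```

For four variables this leaves six rows, one per unordered pair. Replace arrange(desc(abs(r))) with arrange(r) to keep the ascending order used elsewhere in this article.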

Example 2

Following snippet creates a matrix −

M1<-matrix(rpois(100,10),ncol=5)
M1

The following matrix is created −

      [,1] [,2] [,3] [,4] [,5]
 [1,]   10    4   13    4    9
 [2,]   14   12   13   15   11
 [3,]   11    6   15   10   16
 [4,]    6   10    5    9    9
 [5,]    9    6   20   10   10
 [6,]    7   10    6   10   12
 [7,]   13   15    6   13    8
 [8,]    3   14   11    9    8
 [9,]   13   10    7    9   11
[10,]   10    7   16   13   12
[11,]    8    7    9    7   14
[12,]    7    6    9   16    6
[13,]    7   14    7    7   12
[14,]    8    7    7    5   10
[15,]    4    9   13   11    7
[16,]   15    9   14   11    8
[17,]    9   12    7    9   15
[18,]   11   10    3   14   10
[19,]    8    5   13   17   11
[20,]    7   13   10   12    7

To find the sequence of correlations between the columns of M1, add the following code to the above snippet (corrr and dplyr must already be loaded; the unnamed columns are labelled V1 to V5 in the output) −

M1%>%
correlate()%>%
stretch() %>%
arrange(r)

Output

If you execute all the above given snippets as a single program, it generates the following output (the tibble has 25 rows, of which only the first 10 are printed) −

Correlation method: 'pearson'
Missing treated using: 'pairwise.complete.obs'

# A tibble: 25 x 3
    x    y     r
 <chr> <chr> <dbl>
1  V2  V3  -0.473
2  V3  V2  -0.473
3  V4  V5  -0.233
4  V5  V4  -0.233
5  V2  V5  -0.136
6  V5  V2  -0.136
7  V1  V2  -0.0355
8  V2  V1  -0.0355
9  V3  V5   0.0261
10 V5  V3   0.0261
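
For comparison, a similar sorted list can be produced with base R alone, which may be convenient when corrr is not available. This is an alternative sketch rather than the method used in this article. The fixed seed, the column names V1 to V5 assigned by hand and the object names M, cm and long are illustrative assumptions.

```r
set.seed(42)  # assumption: fixed seed so the run is repeatable
M <- matrix(rpois(100, 10), ncol = 5)
colnames(M) <- paste0("V", 1:5)   # name the columns explicitly

cm <- cor(M)                             # 5 x 5 Pearson correlation matrix
cm[upper.tri(cm, diag = TRUE)] <- NA     # keep each pair once, drop the diagonal

# Reshape the matrix to long format: one row per (x, y) cell.
long <- as.data.frame(as.table(cm), stringsAsFactors = FALSE)
names(long) <- c("x", "y", "r")

long <- long[!is.na(long$r), ]    # remove the blanked-out cells
long <- long[order(long$r), ]     # ascending order of correlation
rownames(long) <- NULL

print(long)
```

Five columns give ten unique pairs, so the result has ten rows. Remove the upper.tri() line to keep both orderings of each pair together with the diagonal, whose values are then 1 rather than NA.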

Updated on: 23-Nov-2021
