How to find the significant correlation in an R data frame?

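To find the significant correlations in an R data frame, we need the matrix of p-values from the pairwise correlation tests. The rcorr function of the Hmisc package computes this: it takes a numeric matrix and returns the correlation coefficients, the number of observations, and the p-values. For example, if we have a data frame called df, then the correlation matrix and its p-values can be found by using rcorr(as.matrix(df)). A correlation is usually called significant when its p-value is below a chosen level such as 0.05.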
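The result of rcorr is a list, so the p-values can be pulled out and compared with a significance level directly. The following is a minimal sketch, assuming the Hmisc package is installed; the data frame, the seed, and the 0.05 threshold are illustrative choices, not part of the examples in this article.

```r
# Sketch only: assumes Hmisc is installed (install.packages("Hmisc")).
library(Hmisc)

set.seed(123)                       # illustrative seed for reproducibility
df <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20))

res <- rcorr(as.matrix(df))         # Pearson correlations by default

res$r                               # matrix of correlation coefficients
res$n                               # number of observations used per pair
res$P                               # matrix of p-values (NA on the diagonal)

# Logical matrix: TRUE where a correlation is significant at level alpha
alpha <- 0.05
sig <- !is.na(res$P) & res$P < alpha
sig
```

Checking is.na first keeps the diagonal, which has no p-value, out of the comparison.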

Example 1

Consider the data frame below:

> x1<-rnorm(20)
> x2<-rnorm(20)
> x3<-rnorm(20)
> df1<-data.frame(x1,x2,x3)
> df1

Output

            x1          x2          x3
1  -0.96730523 -1.73067540 -0.01974065
2   0.08564529 -0.05200856  0.76356487
3  -0.33694783 -0.30326744 -0.04760562
4   0.54367676  2.35227967  1.43707451
5   1.12280219 -0.18757952 -1.32278427
6  -0.33947234  0.23128580 -0.05856621
7   0.44756887 -1.38533649 -1.00647630
8  -2.51192456  1.05865975  0.28503664
9  -0.29031722  1.02173256  0.15224756
10  0.36920006  0.17323515 -0.35192833
11 -0.17268384 -1.14498165  0.03180043
12 -0.20811125 -0.49241097 -0.60731423
13 -0.03852074  0.41839372  0.93668284
14  1.98958724  0.85683240 -1.80125628
15 -1.46587108 -0.72375704  0.69243074
16  1.36737574  0.09767378  0.31809893
17 -1.23625739 -1.63587272  0.67043038
18  0.12273089 -0.77565928 -1.48336472
19  0.82783551  0.82508774  0.20627496
20 -0.08917803  0.60930926 -1.92432261

Loading the Hmisc package and finding the correlation matrix and the matrix of p-values for the columns of df1:

> library(Hmisc)
> rcorr(as.matrix(df1))

Output

      x1   x2    x3
x1  1.00 0.25 -0.38
x2  0.25 1.00  0.16
x3 -0.38 0.16  1.00

n= 20

P
      x1     x2     x3  
x1        0.2899 0.1030
x2 0.2899        0.4919
x3 0.1030 0.4919      
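
In the P matrix above, all three p-values (0.2899, 0.1030 and 0.4919) are larger than 0.05, so none of the correlations in df1 is significant at the 5% level. As a rough cross-check, the p-value of a Pearson correlation can be recovered from the coefficient r and the sample size n through a t statistic with n - 2 degrees of freedom. The sketch below uses only base R; the helper name p_from_r and the simulated vectors are hypothetical and serve only as an illustration.

```r
# Two-sided p-value for a Pearson correlation r computed from n pairs.
# p_from_r is a hypothetical helper name, not part of Hmisc or base R.
p_from_r <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(-abs(t_stat), df = n - 2)
}

# Compare with base R's cor.test on simulated data (illustrative seed)
set.seed(1)
x <- rnorm(20)
y <- rnorm(20)
ct <- cor.test(x, y)
all.equal(p_from_r(unname(ct$estimate), 20), ct$p.value)

# Using the rounded coefficient printed above for x1 and x2:
# the result is close to the reported 0.2899; the small gap
# comes from r being rounded to 0.25 in the printed matrix.
p_from_r(0.25, 20)
```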

Example 2

> y1<-rpois(20,2)
> y2<-rpois(20,5)
> y3<-rpois(20,1)
> y4<-rpois(20,1)
> y5<-rpois(20,5)
> df2<-data.frame(y1,y2,y3,y4,y5)
> df2

Output

   y1 y2 y3 y4 y5
1   2  5  1  1  2
2   2  1  1  0  7
3   1  2  1  0  4
4   1  5  1  0  5
5   4  6  0  2  6
6   2  4  2  0  2
7   2  0  1  0  3
8   4  8  1  1  5
9   0  3  1  1  5
10  0  2  0  3  5
11  1  5  2  1  3
12  0  2  1  0  6
13  3  5  3  0  7
14  3  6  0  0  3
15  0  6  0  1  9
16  3  4  2  1  0
17  1  5  0  2  6
18  0  7  2  2  6
19  2  5  0  1  4
20  1  3  3  0  8

Finding the correlation matrix and the matrix of p-values for the columns of df2:

> rcorr(as.matrix(df2))

Output

      y1    y2    y3    y4    y5
y1  1.00  0.32  0.03 -0.16 -0.32
y2  0.32  1.00 -0.06  0.31  0.07
y3  0.03 -0.06  1.00 -0.40 -0.04
y4 -0.16  0.31 -0.40  1.00  0.06
y5 -0.32  0.07 -0.04  0.06  1.00

n= 20

P
      y1     y2     y3     y4     y5  
y1        0.1667 0.8898 0.4971 0.1714
y2 0.1667        0.7915 0.1873 0.7800
y3 0.8898 0.7915        0.0795 0.8694
y4 0.4971 0.1873 0.0795        0.8066
y5 0.1714 0.7800 0.8694 0.8066      
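
The smallest p-value above is 0.0795 (y3 and y4), which is still larger than 0.05, so df2 also has no significant correlation at the 5% level. With more columns, reading the matrix by eye becomes tedious, and it can help to flatten it into one row per pair. The following is a sketch under the assumption that Hmisc is installed; it re-enters the df2 values printed above, and the names idx and pairs are illustrative.

```r
# Sketch only: assumes Hmisc is installed.
library(Hmisc)

# The df2 values printed above, typed in by hand
df2 <- data.frame(
  y1 = c(2,2,1,1,4,2,2,4,0,0,1,0,3,3,0,3,1,0,2,1),
  y2 = c(5,1,2,5,6,4,0,8,3,2,5,2,5,6,6,4,5,7,5,3),
  y3 = c(1,1,1,1,0,2,1,1,1,0,2,1,3,0,0,2,0,2,0,3),
  y4 = c(1,0,0,0,2,0,0,1,1,3,1,0,0,0,1,1,2,2,1,0),
  y5 = c(2,7,4,5,6,2,3,5,5,5,3,6,7,3,9,0,6,6,4,8)
)

res <- rcorr(as.matrix(df2))

# Positions above the diagonal, so each pair appears once
idx <- which(upper.tri(res$P), arr.ind = TRUE)

pairs <- data.frame(
  var1 = rownames(res$P)[idx[, "row"]],
  var2 = colnames(res$P)[idx[, "col"]],
  r    = res$r[idx],
  p    = res$P[idx]
)

pairs <- pairs[order(pairs$p), ]    # smallest p-value first
pairs

# Keep only the pairs that are significant at the 5% level
subset(pairs, p < 0.05)
```

Since every p-value in the matrix above exceeds 0.05, the final subset is expected to be empty for this data.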

Example 3

> z1<-runif(20,2,5)
> z2<-runif(20,2,10)
> z3<-runif(20,5,10)
> df3<-data.frame(z1,z2,z3)
> df3

Output

         z1       z2       z3
1  2.551367 4.399332 7.336909
2  3.513887 4.358521 5.377418
3  3.912958 9.211070 6.693739
4  4.878766 4.827914 9.044594
5  2.290927 5.935495 8.265392
6  3.225698 8.094953 8.095421
7  4.508908 3.864593 8.245445
8  3.418809 9.196999 8.158323
9  3.394496 2.589988 7.007051
10 3.395509 4.175238 5.704264
11 2.730546 6.833714 6.910100
12 4.147959 2.176295 6.996571
13 2.198546 6.049636 7.975485
14 2.275193 4.090590 7.933500
15 3.095163 6.409786 9.948502
16 2.388818 4.006544 9.998355
17 2.138960 5.293971 8.822274
18 2.439146 4.649725 7.313394
19 4.026674 8.068449 8.128699
20 4.436093 2.695067 6.952906

Finding the correlation matrix and the matrix of p-values for the columns of df3:

> rcorr(as.matrix(df3))

Output

      z1    z2    z3
z1  1.00 -0.08 -0.18
z2 -0.08  1.00  0.17
z3 -0.18  0.17  1.00

n= 20

P
      z1     z2     z3  
z1        0.7265 0.4435
z2 0.7265        0.4641
z3 0.4435 0.4641      

Here too, all p-values are well above 0.05, so no pair of columns in df3 is significantly correlated at the 5% level.

raja
Published on 05-Mar-2021 10:50:59