How to find the significant correlation in an R data frame?
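To find the significant correlations among the columns of an R data frame, we need the matrix of p-values from the pairwise correlation tests. The rcorr function of the Hmisc package computes it: it takes a numeric matrix and returns the correlation matrix, the number of observations and the matrix of p-values. For example, if we have a data frame called df, these can be found by using rcorr(as.matrix(df)). A correlation is usually treated as significant when its p-value is below the chosen significance level, commonly 0.05.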
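As a rough sketch of how the result of rcorr can be read, the block below pulls the correlation and p-value matrices out of the returned object and flags the pairs whose p-value falls below a threshold. The built-in mtcars data set, the three columns chosen and the 0.05 threshold are assumptions made for illustration only; they are not part of the examples in this article.

```r
library(Hmisc)  # provides rcorr()

# Assumption: mtcars (shipped with R) stands in for your own data frame.
cars <- as.matrix(mtcars[, c("mpg", "wt", "qsec")])

res   <- rcorr(cars)   # Pearson correlations by default
r_mat <- res$r         # correlation coefficients
p_mat <- res$P         # p-values; the diagonal is NA

alpha <- 0.05          # assumed significance level
sig <- p_mat < alpha   # TRUE where the p-value is below alpha
diag(sig) <- FALSE     # a column is not tested against itself

print(round(r_mat, 2))
print(sig)
```

Here sig is a logical matrix with the same row and column names as the data, so sig["mpg", "wt"] answers whether that particular pair is significant at the chosen level.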
Example 1
Consider the following data frame (the values are drawn at random, so your output will differ):
> x1<-rnorm(20)
> x2<-rnorm(20)
> x3<-rnorm(20)
> df1<-data.frame(x1,x2,x3)
> df1
Output
            x1          x2          x3
1  -0.96730523 -1.73067540 -0.01974065
2   0.08564529 -0.05200856  0.76356487
3  -0.33694783 -0.30326744 -0.04760562
4   0.54367676  2.35227967  1.43707451
5   1.12280219 -0.18757952 -1.32278427
6  -0.33947234  0.23128580 -0.05856621
7   0.44756887 -1.38533649 -1.00647630
8  -2.51192456  1.05865975  0.28503664
9  -0.29031722  1.02173256  0.15224756
10  0.36920006  0.17323515 -0.35192833
11 -0.17268384 -1.14498165  0.03180043
12 -0.20811125 -0.49241097 -0.60731423
13 -0.03852074  0.41839372  0.93668284
14  1.98958724  0.85683240 -1.80125628
15 -1.46587108 -0.72375704  0.69243074
16  1.36737574  0.09767378  0.31809893
17 -1.23625739 -1.63587272  0.67043038
18  0.12273089 -0.77565928 -1.48336472
19  0.82783551  0.82508774  0.20627496
20 -0.08917803  0.60930926 -1.92432261
Load the Hmisc package and find the correlation matrix and the p-value matrix for the columns of df1:
> library(Hmisc)
> rcorr(as.matrix(df1))
Output
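      x1   x2    x3
x1  1.00 0.25 -0.38
x2  0.25 1.00  0.16
x3 -0.38 0.16  1.00

n= 20

P
   x1     x2     x3
x1        0.2899 0.1030
x2 0.2899        0.4919
x3 0.1030 0.4919

The first matrix holds the correlation coefficients and the matrix under P holds the corresponding p-values. No p-value is below 0.05, so none of the correlations in df1 is significant at the 5% level.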
Example 2
> y1<-rpois(20,2)
> y2<-rpois(20,5)
> y3<-rpois(20,1)
> y4<-rpois(20,1)
> y5<-rpois(20,5)
> df2<-data.frame(y1,y2,y3,y4,y5)
> df2
Output
   y1 y2 y3 y4 y5
1   2  5  1  1  2
2   2  1  1  0  7
3   1  2  1  0  4
4   1  5  1  0  5
5   4  6  0  2  6
6   2  4  2  0  2
7   2  0  1  0  3
8   4  8  1  1  5
9   0  3  1  1  5
10  0  2  0  3  5
11  1  5  2  1  3
12  0  2  1  0  6
13  3  5  3  0  7
14  3  6  0  0  3
15  0  6  0  1  9
16  3  4  2  1  0
17  1  5  0  2  6
18  0  7  2  2  6
19  2  5  0  1  4
20  1  3  3  0  8
Find the correlation matrix and the p-value matrix for the columns of df2:
> rcorr(as.matrix(df2))
Output
      y1    y2    y3    y4    y5
y1  1.00  0.32  0.03 -0.16 -0.32
y2  0.32  1.00 -0.06  0.31  0.07
y3  0.03 -0.06  1.00 -0.40 -0.04
y4 -0.16  0.31 -0.40  1.00  0.06
y5 -0.32  0.07 -0.04  0.06  1.00

n= 20

P
   y1     y2     y3     y4     y5
y1        0.1667 0.8898 0.4971 0.1714
y2 0.1667        0.7915 0.1873 0.7800
y3 0.8898 0.7915        0.0795 0.8694
y4 0.4971 0.1873 0.0795        0.8066
y5 0.1714 0.7800 0.8694 0.8066

The smallest p-value is 0.0795 (y3 and y4), which is still above 0.05, so no correlation in df2 is significant at the 5% level.
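With five columns there are already ten pairs to check, and scanning the matrix by eye becomes error-prone. One possible way to handle this, sketched below in base R, is to turn the upper triangle of the p-value matrix into a table with one row per pair, sorted by p-value. The helper name pairs_by_p and its alpha argument are hypothetical, and the P matrix is typed in from the Example 2 output rather than recomputed.

```r
# P matrix copied from the Example 2 output (the diagonal is NA).
vars <- c("y1", "y2", "y3", "y4", "y5")
P <- matrix(c(
      NA, 0.1667, 0.8898, 0.4971, 0.1714,
  0.1667,     NA, 0.7915, 0.1873, 0.7800,
  0.8898, 0.7915,     NA, 0.0795, 0.8694,
  0.4971, 0.1873, 0.0795,     NA, 0.8066,
  0.1714, 0.7800, 0.8694, 0.8066,     NA
), nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# Hypothetical helper: one row per pair of columns, smallest p-value first.
pairs_by_p <- function(P, alpha = 0.05) {
  idx <- which(upper.tri(P), arr.ind = TRUE)  # each pair appears once
  out <- data.frame(
    var1 = rownames(P)[idx[, "row"]],
    var2 = colnames(P)[idx[, "col"]],
    p    = P[idx],
    stringsAsFactors = FALSE
  )
  out <- out[order(out$p), ]
  out$significant <- out$p < alpha            # flag pairs below alpha
  rownames(out) <- NULL
  out
}

tab <- pairs_by_p(P)
print(tab)
```

In practice the same helper could be given the P component of an rcorr result instead of a hand-typed matrix. Raising alpha to 0.10 would flag only the y3 and y4 pair.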
Example 3
> z1<-runif(20,2,5)
> z2<-runif(20,2,10)
> z3<-runif(20,5,10)
> df3<-data.frame(z1,z2,z3)
> df3
Output
         z1       z2       z3
1  2.551367 4.399332 7.336909
2  3.513887 4.358521 5.377418
3  3.912958 9.211070 6.693739
4  4.878766 4.827914 9.044594
5  2.290927 5.935495 8.265392
6  3.225698 8.094953 8.095421
7  4.508908 3.864593 8.245445
8  3.418809 9.196999 8.158323
9  3.394496 2.589988 7.007051
10 3.395509 4.175238 5.704264
11 2.730546 6.833714 6.910100
12 4.147959 2.176295 6.996571
13 2.198546 6.049636 7.975485
14 2.275193 4.090590 7.933500
15 3.095163 6.409786 9.948502
16 2.388818 4.006544 9.998355
17 2.138960 5.293971 8.822274
18 2.439146 4.649725 7.313394
19 4.026674 8.068449 8.128699
20 4.436093 2.695067 6.952906
Find the correlation matrix and the p-value matrix for the columns of df3:
> rcorr(as.matrix(df3))
Output
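      z1    z2    z3
z1  1.00 -0.08 -0.18
z2 -0.08  1.00  0.17
z3 -0.18  0.17  1.00

n= 20

P
   z1     z2     z3
z1        0.7265 0.4435
z2 0.7265        0.4641
z3 0.4435 0.4641

All p-values are well above 0.05, so no correlation in df3 is significant, which is consistent with the three columns having been generated independently.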
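If Hmisc is not available, a similar p-value matrix can be assembled with cor.test from base R. This is a different route from the rcorr call used in the examples, shown only as a cross-check: for complete data and Pearson correlation the p-values should agree closely with those of rcorr. The helper name cor_pvalues and the use of the built-in mtcars data are assumptions for illustration.

```r
# Hypothetical helper: pairwise Pearson p-values via stats::cor.test().
cor_pvalues <- function(df) {
  k <- ncol(df)
  P <- matrix(NA_real_, k, k, dimnames = list(names(df), names(df)))
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      p <- cor.test(df[[i]], df[[j]])$p.value
      P[i, j] <- p   # fill both triangles so the matrix is symmetric
      P[j, i] <- p
    }
  }
  P                  # the diagonal stays NA, as in the rcorr output
}

# Assumption: three columns of mtcars stand in for your own data frame.
cars <- mtcars[, c("mpg", "wt", "qsec")]
P <- cor_pvalues(cars)
print(signif(P, 3))
```

The loop runs one test per pair of columns, which is fine for a handful of columns but becomes slow for wide data frames.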