How to find the groupwise correlation matrix for an R data frame?
To create a groupwise correlation matrix for an R data frame, we can follow the steps below −
- First of all, create a data frame.
- Then, split the data frame by the categorical column and find the correlation matrix for each group.
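The two steps above can be sketched on a tiny hand-made data frame where the answer is known in advance. The names `toy`, `x`, `y`, `g` and `toy_cor` are hypothetical and chosen only for this illustration; they do not appear in the rest of the article.

```r
# Step 1: a hypothetical toy data frame with two groups.
# In group "a", y rises exactly linearly with x; in group "b", y falls.
toy <- data.frame(
  x = c(1, 2, 3, 1, 2, 3),
  y = c(2, 4, 6, 3, 2, 1),
  g = c("a", "a", "a", "b", "b", "b")
)

# Step 2: split the numeric columns by group, then apply cor() to each piece.
toy_cor <- lapply(split(toy[, c("x", "y")], toy$g), cor)

toy_cor$a["x", "y"]  # 1: perfect positive correlation in group "a"
toy_cor$b["x", "y"]  # -1: perfect negative correlation in group "b"
```

The result is a named list with one correlation matrix per group, so a single group can be pulled out by name.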
Create the data frame
Let's create a data frame as shown below −
v1 <- round(rnorm(25), 2)
v2 <- round(rnorm(25), 2)
v3 <- round(rnorm(25), 2)
v4 <- round(rnorm(25), 2)
Factor <- sample(1:5, 25, replace = TRUE)
df <- data.frame(v1, v2, v3, v4, Factor)
df
On executing, the above script generates the below output (this output will vary on your system because the values are drawn at random) −
      v1    v2    v3    v4 Factor
1   0.69  1.14 -0.32  1.04      2
2  -0.79 -0.29 -1.56 -0.71      5
3   0.78  0.16 -0.17 -0.24      1
4  -0.68  0.25 -0.30 -1.22      4
5  -0.78 -1.47  0.49 -0.93      1
6  -1.00  0.96 -0.23  0.77      2
7  -0.32 -0.55  0.86 -0.45      2
8   0.30  0.73 -0.34  0.91      2
9   0.30 -0.15 -0.25  0.12      5
10 -0.75 -1.08  1.13 -0.39      3
11 -0.74  1.49  0.76  0.03      3
12 -0.62  2.16  0.75  0.59      2
13  0.97  0.71  0.00  0.10      1
14 -0.62  2.34 -1.60  0.16      3
15  0.06  1.26  2.67 -0.98      1
16  0.67  0.42  1.27  0.22      2
17 -0.71 -0.01  1.98 -1.02      1
18  1.11 -2.03 -1.07  0.81      3
19  0.76  1.50 -0.04  1.21      4
20  0.14  0.04 -0.22 -1.53      5
21  0.69 -1.02 -0.19  0.51      4
22 -1.24 -0.37  1.04  0.06      2
23 -0.24 -1.00 -0.28  1.17      1
24 -0.82 -1.29  0.64 -0.18      2
25  0.12  0.10  0.14  0.34      3
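Because `sample()` assigns the groups at random, the group sizes are uneven, and a correlation computed from only a handful of rows is not very informative. A quick check with `table()` shows how many rows each group contributes. The sketch below retypes the Factor column exactly as printed in the output above, so it does not depend on the random draw.

```r
# Factor column copied from the printed data frame (rows 1 to 25)
Factor <- c(2, 5, 1, 4, 1, 2, 2, 2, 5, 3, 3, 2, 1,
            3, 1, 2, 1, 3, 4, 5, 4, 2, 1, 2, 3)

# Count the rows in each group
group_sizes <- table(Factor)
group_sizes
# Groups 1 to 5 contain 6, 8, 5, 3 and 3 rows respectively
```

Groups 4 and 5 have only three rows each, which helps explain the very large correlations that can show up for such groups.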
Create the groupwise correlation matrix
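Use the split function with lapply to create a correlation matrix for each level of the Factor column in df −
# Data frame from the previous step (re-running these lines draws new random values)
v1 <- round(rnorm(25), 2)
v2 <- round(rnorm(25), 2)
v3 <- round(rnorm(25), 2)
v4 <- round(rnorm(25), 2)
Factor <- sample(1:5, 25, replace = TRUE)
df <- data.frame(v1, v2, v3, v4, Factor)
# Split the four numeric columns by Factor, then apply cor() to each group
lapply(split(df[, 1:4], df$Factor), cor)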
Output
$`1`
           v1         v2         v3         v4
v1  1.0000000  0.6046591 -0.3827462  0.3293069
v2  0.6046591  1.0000000  0.4941545 -0.2506760
v3 -0.3827462  0.4941545  1.0000000 -0.7279521
v4  0.3293069 -0.2506760 -0.7279521  1.0000000

$`2`
           v1         v2         v3         v4
v1  1.0000000  0.2775834 -0.2082356  0.3783813
v2  0.2775834  1.0000000 -0.3648746  0.7787637
v3 -0.2082356 -0.3648746  1.0000000 -0.7664309
v4  0.3783813  0.7787637 -0.7664309  1.0000000

$`3`
           v1         v2         v3         v4
v1  1.0000000 -0.6659695 -0.4512594  0.9059110
v2 -0.6659695  1.0000000 -0.1832839 -0.2942610
v3 -0.4512594 -0.1832839  1.0000000 -0.6666673
v4  0.9059110 -0.2942610 -0.6666673  1.0000000

$`4`
           v1         v2        v3        v4
v1 1.00000000 0.03852878 0.8424030 0.9712250
v2 0.03852878 1.00000000 0.5709046 0.2754070
v3 0.84240302 0.57090464 1.0000000 0.9464969
v4 0.97122501 0.27540704 0.9464969 1.0000000

$`5`
          v1         v2          v3          v4
v1 1.0000000  0.7335980  0.98786495  0.13938476
v2 0.7335980  1.0000000  0.83024549 -0.57069740
v3 0.9878649  0.8302455  1.00000000 -0.01610584
v4 0.1393848 -0.5706974 -0.01610584  1.00000000
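As a sanity check, one group's matrix can be reproduced by hand. The sketch below retypes the three rows of the printed data frame that have Factor equal to 4 (rows 4, 19 and 21) and calls `cor()` on them directly. The name `g4` is hypothetical and used only for this check.

```r
# Rows 4, 19 and 21 of the printed data frame (Factor == 4)
g4 <- data.frame(
  v1 = c(-0.68, 0.76, 0.69),
  v2 = c(0.25, 1.50, -1.02),
  v3 = c(-0.30, -0.04, -0.19),
  v4 = c(-1.22, 1.21, 0.51)
)

# Correlation matrix for this single group
g4_cor <- cor(g4)

round(g4_cor["v1", "v2"], 4)  # 0.0385, matching the $`4` entry above
round(g4_cor["v1", "v4"], 4)  # 0.9712, matching the $`4` entry above
```

When the full list is stored in a variable, the same matrix can be pulled out by the group's name, for example `result[["4"]]` if the list was saved as `result`.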