How to find the correlation matrix in R using all variables of a data frame?
A correlation matrix shows the direction and strength of the linear relationship between every pair of variables at once. This makes it easier to decide which variables to use in a linear model and which ones could be dropped. To find the correlation matrix for all variables of a data frame, pass the data frame to the cor function, for example cor(df); every column must be numeric.
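When a data frame also holds non-numeric columns, cor cannot be applied to it directly. The following is a minimal sketch of one way to handle that, assuming a small made-up data frame (the names d, height, weight and group are hypothetical and not part of the original example): keep only the numeric columns, then call cor on the result.

```r
# Hypothetical data frame mixing numeric and non-numeric columns
d <- data.frame(
  height = c(150, 160, 165, 172, 180),
  weight = c(52, 58, 63, 70, 79),
  group  = c("a", "b", "a", "b", "a")
)

# Flag the numeric columns, then compute the correlation matrix on them only
num_cols <- sapply(d, is.numeric)
cm <- cor(d[, num_cols, drop = FALSE])
cm
```

The drop = FALSE argument keeps the result a data frame even if only one numeric column remains.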
Example
Consider the following data frame of continuous variables −
> set.seed(9)
> x1<-rnorm(20)
> x2<-rnorm(20,0.2)
> x3<-rnorm(20,0.5)
> x4<-rnorm(20,0.8)
> x5<-rnorm(20,1)
> df<-data.frame(x1,x2,x3,x4,x5)
> df
x1 x2 x3 x4 x5
1 -0.76679604 1.95699294 -0.30845634 1.081222227 1.11407587
2 -0.81645834 0.38225214 -1.51938169 -0.402708626 -0.05365988
3 -0.14153519 -0.06688875 -0.23872407 1.265163691 1.15599915
4 -0.27760503 1.12642163 0.88288656 1.152016386 2.30039421
5 0.43630690 -0.49333188 2.23086367 0.210143783 -0.15588645
6 -1.18687252 2.88199007 0.29691805 -0.053599959 1.21604185
7 1.19198691 0.42252448 -0.49639735 0.553267880 1.80447819
8 -0.01819034 -0.50667241 -0.80653629 2.339338571 0.26788427
9 -0.24808460 0.61721325 -0.49783160 1.346077684 -0.61809812
10 -0.36293689 0.56955678 -0.06502873 2.364961851 1.83906927
11 1.27757055 -0.71376435 2.25205784 1.049670178 0.64856205
12 -0.46889715 -0.11691475 -0.04777135 -1.162418630 0.28371561
13 0.07105410 1.24905921 -0.35852571 -0.009060223 0.05970815
14 -0.26603845 0.36811181 0.54929453 0.301314912 1.73016571
15 1.84525720 0.23144021 0.29995552 1.105121769 0.56212952
16 -0.83944966 -0.81033054 -0.60395445 0.510792758 0.75061790
17 -0.07744806 0.58275153 0.74058804 2.257714201 0.32792906
18 -2.61770553 -0.61969653 0.88111362 1.673755484 1.80101407
19 0.88788403 0.56171109 2.73045895 -0.152956042 -0.48886193
20 -0.70749145 0.29337136 1.69920239 0.768324524 1.45401160
Finding the correlation matrix for all variables in df −
> cor(df)
x1 x2 x3 x4 x5
x1 1.00000000 -0.1332350 0.25115920 -0.04210749 -0.28891754
x2 -0.13323501 1.0000000 -0.15071432 -0.15398933 0.14759671
x3 0.25115920 -0.1507143 1.00000000 -0.05268172 -0.02505888
x4 -0.04210749 -0.1539893 -0.05268172 1.00000000 0.27861734
x5 -0.28891754 0.1475967 -0.02505888 0.27861734 1.00000000
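To use a matrix like this one when choosing model variables, it can help to list each pair of variables once, ordered by the absolute value of the correlation. The sketch below is one possible way to do that; it rebuilds df with the same seed as above, and the names cm, idx and pairs_df are illustrative choices rather than anything from the original example.

```r
# Rebuild the example data frame so the block runs on its own
set.seed(9)
x1 <- rnorm(20)
x2 <- rnorm(20, 0.2)
x3 <- rnorm(20, 0.5)
x4 <- rnorm(20, 0.8)
x5 <- rnorm(20, 1)
df <- data.frame(x1, x2, x3, x4, x5)

cm <- cor(df)

# Positions of the upper triangle, so each pair appears once
# and the diagonal (always 1) is skipped
idx <- which(upper.tri(cm), arr.ind = TRUE)

pairs_df <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]
)

# Strongest linear relationships first, regardless of sign
pairs_df <- pairs_df[order(-abs(pairs_df$r)), ]
pairs_df
```

With five variables this gives ten pairs. Pairs near the top of the listing are the ones to check first for redundancy.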
Consider another data frame, this time of discrete (Poisson-distributed count) variables −
> a1<-rpois(20,2)
> a2<-rpois(20,5)
> a3<-rpois(20,8)
> a4<-rpois(20,10)
> a5<-rpois(20,15)
> df_new<-data.frame(a1,a2,a3,a4,a5)
> df_new
a1 a2 a3 a4 a5
1 2 8 9 5 13
2 1 4 7 11 16
3 2 2 5 12 11
4 1 3 12 9 15
5 1 4 8 4 14
6 0 6 9 8 14
7 2 6 12 10 9
8 7 5 13 11 20
9 0 6 6 13 19
10 4 7 10 8 12
11 0 3 14 8 20
12 3 2 10 15 13
13 2 8 7 12 14
14 2 6 10 11 14
15 2 1 5 10 21
16 2 3 12 10 14
17 3 6 7 9 17
18 0 7 6 14 16
19 2 6 6 9 15
20 2 3 7 8 12
Finding the correlation matrix for all variables in df_new −
> cor(df_new)
a1 a2 a3 a4 a5
a1 1.000000000 0.02485671 0.26409706 0.05617819 0.009229284
a2 0.024856710 1.00000000 -0.04540504 -0.10727065 -0.184062998
a3 0.264097059 -0.04540504 1.00000000 -0.17991092 -0.013487095
a4 0.056178192 -0.10727065 -0.17991092 1.00000000 0.115063107
a5 0.009229284 -0.18406300 -0.01348709 0.11506311 1.000000000
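By default cor computes Pearson correlations. For count data such as df_new, a rank-based measure is sometimes preferred, and cor supports this through its method argument. The sketch below assumes a hypothetical seed of 1, so the simulated counts (and the resulting numbers) will differ from the df_new shown above; sp is an illustrative name.

```r
# Hypothetical seed: the values will not match the df_new printed above
set.seed(1)
a1 <- rpois(20, 2)
a2 <- rpois(20, 5)
a3 <- rpois(20, 8)
a4 <- rpois(20, 10)
a5 <- rpois(20, 15)
df_new <- data.frame(a1, a2, a3, a4, a5)

# Spearman rank correlation for every pair of columns
sp <- cor(df_new, method = "spearman")

# Round for easier reading
round(sp, 3)
```

The other accepted values for method are "pearson" (the default) and "kendall".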