# How to find the correlation matrix in R using all variables of a data frame?

A correlation matrix shows the direction and strength of the linear relationship between every pair of variables at once. This makes it a useful first check when deciding which variables to include in a linear model and which could be dropped, for example when two predictors are highly correlated with each other. To find the correlation matrix for all variables of a data frame, pass the data frame to the cor function; every column must be numeric.
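
Because cor accepts only numeric input, a data frame that also holds factor or character columns needs those columns filtered out first. The following is a minimal sketch of one way to do that, assuming the built-in iris data set (not part of the examples in this article) as a stand-in for a mixed-type data frame:

```r
# iris ships with R's datasets package; its Species column is a factor,
# so calling cor(iris) directly would stop with an error.
num_cols <- sapply(iris, is.numeric)   # TRUE for numeric columns only

# Keep only the numeric columns, then compute the correlation matrix
cm_iris <- cor(iris[, num_cols])

# Rounding makes the matrix easier to scan
print(round(cm_iris, 2))
```

The same pattern should work for any data frame; only the data set name is specific to this sketch.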

## Example

Consider the following data frame of continuous variables −

> set.seed(9)
> x1<-rnorm(20)
> x2<-rnorm(20,0.2)
> x3<-rnorm(20,0.5)
> x4<-rnorm(20,0.8)
> x5<-rnorm(20,1)
> df<-data.frame(x1,x2,x3,x4,x5)
> df
            x1          x2          x3           x4          x5
1  -0.76679604  1.95699294 -0.30845634  1.081222227  1.11407587
2  -0.81645834  0.38225214 -1.51938169 -0.402708626 -0.05365988
3  -0.14153519 -0.06688875 -0.23872407  1.265163691  1.15599915
4  -0.27760503  1.12642163  0.88288656  1.152016386  2.30039421
5   0.43630690 -0.49333188  2.23086367  0.210143783 -0.15588645
6  -1.18687252  2.88199007  0.29691805 -0.053599959  1.21604185
7   1.19198691  0.42252448 -0.49639735  0.553267880  1.80447819
8  -0.01819034 -0.50667241 -0.80653629  2.339338571  0.26788427
9  -0.24808460  0.61721325 -0.49783160  1.346077684 -0.61809812
10 -0.36293689  0.56955678 -0.06502873  2.364961851  1.83906927
11  1.27757055 -0.71376435  2.25205784  1.049670178  0.64856205
12 -0.46889715 -0.11691475 -0.04777135 -1.162418630  0.28371561
13  0.07105410  1.24905921 -0.35852571 -0.009060223  0.05970815
14 -0.26603845  0.36811181  0.54929453  0.301314912  1.73016571
15  1.84525720  0.23144021  0.29995552  1.105121769  0.56212952
16 -0.83944966 -0.81033054 -0.60395445  0.510792758  0.75061790
17 -0.07744806  0.58275153  0.74058804  2.257714201  0.32792906
18 -2.61770553 -0.61969653  0.88111362  1.673755484  1.80101407
19  0.88788403  0.56171109  2.73045895 -0.152956042 -0.48886193
20 -0.70749145  0.29337136  1.69920239  0.768324524  1.45401160

Finding the correlation matrix for all variables in df −

> cor(df)
            x1         x2          x3          x4          x5
x1  1.00000000 -0.1332350  0.25115920 -0.04210749 -0.28891754
x2 -0.13323501  1.0000000 -0.15071432 -0.15398933  0.14759671
x3  0.25115920 -0.1507143  1.00000000 -0.05268172 -0.02505888
x4 -0.04210749 -0.1539893 -0.05268172  1.00000000  0.27861734
x5 -0.28891754  0.1475967 -0.02505888  0.27861734  1.00000000
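
The matrix is symmetric and has 1 on the diagonal, since each variable is perfectly correlated with itself, so only the off-diagonal entries carry information. As a rough sketch, the strongest pairwise relationship can be picked out programmatically. The names cm, off_diag, pos and strongest are introduced here for illustration only:

```r
# Recreate the data frame from the example above
set.seed(9)
x1 <- rnorm(20)
x2 <- rnorm(20, 0.2)
x3 <- rnorm(20, 0.5)
x4 <- rnorm(20, 0.8)
x5 <- rnorm(20, 1)
df <- data.frame(x1, x2, x3, x4, x5)

cm <- cor(df)
print(round(cm, 2))          # two decimals are usually enough to read

# Zero out the diagonal so the trivial self-correlations are ignored
off_diag <- cm
diag(off_diag) <- 0

# Locate the entry with the largest absolute correlation;
# the matrix is symmetric, so the first match is sufficient
pos <- which(abs(off_diag) == max(abs(off_diag)), arr.ind = TRUE)[1, ]
strongest <- c(rownames(cm)[pos["row"]], colnames(cm)[pos["col"]])

print(strongest)                          # names of the two variables
print(cm[strongest[1], strongest[2]])     # their correlation
```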

Consider the following data frame of discrete (count) variables −

> a1<-rpois(20,2)
> a2<-rpois(20,5)
> a3<-rpois(20,8)
> a4<-rpois(20,10)
> a5<-rpois(20,15)
> df_new<-data.frame(a1,a2,a3,a4,a5)
> df_new
   a1 a2 a3 a4 a5
1   2  8  9  5 13
2   1  4  7 11 16
3   2  2  5 12 11
4   1  3 12  9 15
5   1  4  8  4 14
6   0  6  9  8 14
7   2  6 12 10  9
8   7  5 13 11 20
9   0  6  6 13 19
10  4  7 10  8 12
11  0  3 14  8 20
12  3  2 10 15 13
13  2  8  7 12 14
14  2  6 10 11 14
15  2  1  5 10 21
16  2  3 12 10 14
17  3  6  7  9 17
18  0  7  6 14 16
19  2  6  6  9 15
20  2  3  7  8 12

Finding the correlation matrix for all variables in df_new −

> cor(df_new)
            a1          a2          a3          a4           a5
a1 1.000000000  0.02485671  0.26409706  0.05617819  0.009229284
a2 0.024856710  1.00000000 -0.04540504 -0.10727065 -0.184062998
a3 0.264097059 -0.04540504  1.00000000 -0.17991092 -0.013487095
a4 0.056178192 -0.10727065 -0.17991092  1.00000000  0.115063107
a5 0.009229284 -0.18406300 -0.01348709  0.11506311  1.000000000
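
By default, cor computes Pearson correlations. For count data such as this, a rank-based coefficient is sometimes preferred, and cor supports it through its method argument. The sketch below is illustrative only: the seed 123 is an assumption chosen for reproducibility, so the generated values will not match the table above.

```r
# Hypothetical seed, used only so this sketch is reproducible;
# the data will differ from the df_new printed in the article
set.seed(123)
df_new <- data.frame(
  a1 = rpois(20, 2),
  a2 = rpois(20, 5),
  a3 = rpois(20, 8),
  a4 = rpois(20, 10),
  a5 = rpois(20, 15)
)

pearson  <- cor(df_new)                       # default: method = "pearson"
spearman <- cor(df_new, method = "spearman")  # rank-based alternative

print(round(pearson, 2))
print(round(spearman, 2))
```

Both calls return a 5 x 5 matrix with the same layout, so the two can be compared entry by entry.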
