How to deal with missing values to calculate correlation matrix in R?

Data frames and matrices in R often contain missing values, and calling cor on them directly returns NA for every pair of columns that involves a missing value. One way to solve this is to pass the data through na.omit, which drops every row containing at least one NA, before calling cor to calculate the correlation matrix. The examples below show how.
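
As a minimal sketch of the problem and the fix, the small data frame below (d and its values are made up for illustration) is passed to cor first as-is and then after na.omit:

```r
# Small illustrative data frame; the values are invented for this sketch
d <- data.frame(a = c(1, 2, NA, 4, 5),
                b = c(2, 4, 6, NA, 10),
                c = c(5, 3, 4, 1, 2))

# Default behaviour (use = "everything"): any pair of columns
# that involves a missing value gets NA as its correlation
cor(d)

# Dropping the incomplete rows first gives a fully defined matrix
cor(na.omit(d))
```

Only the rows without any NA (here rows 1, 2 and 5) contribute to the second result.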

Example

Consider the following data frame:

> x1<-sample(c(1:5,NA),500,replace=TRUE)
> x2<-sample(c(rnorm(50,2,5),NA),500,replace=TRUE)
> x3<-sample(c(rpois(50,2),NA),500,replace=TRUE)
> x4<-sample(c(runif(50,2,10),NA),500,replace=TRUE)
> df<-data.frame(x1,x2,x3,x4)
> head(df,20)

Output

   x1         x2 x3       x4
1   2  2.6347839  4 2.577690
2   3  0.3082031  1 6.250998
3   1  0.3082031  3 7.786711
4   1  2.6347839  0 3.449600
5  NA  2.5107175  1 7.269619
6   4  2.4450443  4 6.250998
7  NA  1.1747742  2 3.053929
8  NA  2.4450443  3 5.860071
9   5  6.6736496  4 7.979433
10 NA  2.4450443  2 6.250998
11 NA  1.1747742  5       NA
12  2 11.1483587  1 9.498951
13  4  2.1400502 NA 9.299100
14  2 -0.8043954  3 2.883222
15  1  1.5054120  0 2.765324
16  1  0.1283554  2 7.918015
17  3  3.0337960  3 5.588130
18  1  4.5603861  2 7.979433
19  3  4.4976830  4 8.434829
20  1  9.4147186  2 3.053929

> tail(df,20)

Output

    x1         x2 x3       x4
481  2 -1.9780830  4 9.299100
482  3  2.0495769  1 9.639262
483  3 -4.5421502  2 3.374645
484 NA  2.1400502  3       NA
485  2 -4.0551622  2 5.999863
486  4  5.8547691  2 3.593138
487 NA         NA  2 9.549274
488  3  3.9160824  1 3.053929
489  1 11.1483587  5 7.786711
490  3 -2.7581511  2 9.433952
491 NA  4.8002434  1 5.824331
492  2  4.8002434  2 8.434829
493  2  1.9706702  2 3.053929
494 NA  2.5099287  2 7.979433
495  4  1.9706702  1 7.929130
496  2  4.5919890  2 9.973436
497  4  2.5099287  4 7.269619
498  4  0.3082031  3 3.053929
499  1  5.4593713  2 9.973436
500 NA -1.9780830  4 3.219703

> cor(na.omit(df))

Output

             x1          x2          x3         x4
x1  1.000000000 0.009571313 -0.06363564 0.03276244
x2  0.009571313 1.000000000  0.08123065 0.03330818
x3 -0.063635640 0.081230649  1.00000000 0.03503841
x4  0.032762439 0.033308181  0.03503841 1.00000000
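
As an alternative to wrapping the data in na.omit, cor also accepts a use argument. The sketch below runs on a smaller made-up data frame (df2 and the seed are assumptions for illustration, not part of the example above). It suggests that use = "complete.obs" should give the same matrix as cor(na.omit(...)), while use = "pairwise.complete.obs" keeps, for each pair of columns, every row where both values are present.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical data frame built the same way as df, but smaller
df2 <- data.frame(x1 = sample(c(1:5, NA), 100, replace = TRUE),
                  x2 = sample(c(rnorm(20), NA), 100, replace = TRUE),
                  x3 = sample(c(rpois(20, 2), NA), 100, replace = TRUE))

# Listwise deletion, written two ways
listwise <- cor(na.omit(df2))
complete <- cor(df2, use = "complete.obs")
all.equal(listwise, complete)  # both use only the fully complete rows

# Pairwise deletion: each entry uses the rows complete for that pair
pairwise <- cor(df2, use = "pairwise.complete.obs")
pairwise
```

Pairwise deletion discards less data, but each entry may be based on a different subset of rows, so the resulting matrix is not guaranteed to be positive semi-definite.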

The same approach works with matrix data:

Example

> M<-matrix(sample(c(rpois(10,2),NA),36,replace=TRUE),nrow=6)
> M

Output

   [,1] [,2] [,3] [,4] [,5] [,6]
[1,] 2    2    2    2    NA   3
[2,] 3    2    4    1    4    3
[3,] 3    NA   1    1    1    NA
[4,] 3    NA   3    2    2    1
[5,] 1    4    3    2    2    2
[6,] 1    2    1    3    1    1

> cor(na.omit(M))

Output

           [,1]       [,2]       [,3]       [,4]       [,5]       [,6]
[1,]  1.0000000 -0.5000000  0.7559289 -0.8660254  0.9449112  0.8660254
[2,] -0.5000000  1.0000000  0.1889822  0.0000000 -0.1889822  0.0000000
[3,]  0.7559289  0.1889822  1.0000000 -0.9819805  0.9285714  0.9819805
[4,] -0.8660254  0.0000000 -0.9819805  1.0000000 -0.9819805 -1.0000000
[5,]  0.9449112 -0.1889822  0.9285714 -0.9819805  1.0000000  0.9819805
[6,]  0.8660254  0.0000000  0.9819805 -1.0000000  0.9819805  1.0000000
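
Because na.omit removes whole rows, it is worth checking how many rows are left before trusting the result. The sketch below rebuilds the matrix M exactly as printed above (the name m is only for illustration) and counts the complete rows with complete.cases:

```r
# The matrix M as printed above, entered column by column
m <- matrix(c(2, 3, 3, 3, 1, 1,
              2, 2, NA, NA, 4, 2,
              2, 4, 1, 3, 3, 1,
              2, 1, 1, 2, 2, 3,
              NA, 4, 1, 2, 2, 1,
              3, 3, NA, 1, 2, 1), nrow = 6)

# Rows without any NA; these are the only rows cor(na.omit(m)) uses
sum(complete.cases(m))   # 3
nrow(na.omit(m))         # 3

# Non-missing values available in each column
colSums(!is.na(m))
```

With only three of the six rows remaining, each correlation is estimated from three points, which is why several entries in the output above sit at or near 1 and -1.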

Nizamuddin Siddiqui

Updated on: 08-Sep-2020
