How to deal with missing values to calculate correlation matrix in R?
Data frames and matrices in R often contain missing values. Calling cor on such data directly returns NA for the correlations that involve any column with a missing value, so the correlation matrix is of little use. One way to solve this is to pass the data through na.omit before calling cor: na.omit drops every row that contains at least one NA, and the correlation matrix is then computed from the complete rows only. The examples below show this.
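To make the problem and the fix concrete, here is a minimal sketch on a small made-up data frame. The object names (d, r_default, r_omit, r_complete) and the values are assumptions for illustration only and do not come from the examples in this article. The sketch also uses the use = "complete.obs" argument of cor, which requests the same row-wise deletion without calling na.omit first.

```r
# Hypothetical data frame with missing values (illustrative values only)
d <- data.frame(a = c(1, 2, NA, 4, 5, 6),
                b = c(2, 1, 4, NA, 6, 5),
                c = c(6, 5, 4, 3, 2, 1.5))

# Default behaviour (use = "everything"): NA propagates into the result
r_default <- cor(d)

# Drop every row that has at least one NA, then correlate
r_omit <- cor(na.omit(d))

# Same row-wise deletion, requested through cor's own `use` argument
r_complete <- cor(d, use = "complete.obs")

print(r_default)
print(r_omit)
print(all.equal(r_omit, r_complete))
```

Both approaches compute the correlations from the same complete rows, so which one to use is mostly a matter of style.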
Example
Consider the below data frame −
> x1<-sample(c(1:5,NA),500,replace=TRUE)
> x2<-sample(c(rnorm(50,2,5),NA),500,replace=TRUE)
> x3<-sample(c(rpois(50,2),NA),500,replace=TRUE)
> x4<-sample(c(runif(50,2,10),NA),500,replace=TRUE)
> df<-data.frame(x1,x2,x3,x4)
> head(df,20)
Output
   x1         x2 x3       x4
1   2  2.6347839  4 2.577690
2   3  0.3082031  1 6.250998
3   1  0.3082031  3 7.786711
4   1  2.6347839  0 3.449600
5  NA  2.5107175  1 7.269619
6   4  2.4450443  4 6.250998
7  NA  1.1747742  2 3.053929
8  NA  2.4450443  3 5.860071
9   5  6.6736496  4 7.979433
10 NA  2.4450443  2 6.250998
11 NA  1.1747742  5       NA
12  2 11.1483587  1 9.498951
13  4  2.1400502 NA 9.299100
14  2 -0.8043954  3 2.883222
15  1  1.5054120  0 2.765324
16  1  0.1283554  2 7.918015
17  3  3.0337960  3 5.588130
18  1  4.5603861  2 7.979433
19  3  4.4976830  4 8.434829
20  1  9.4147186  2 3.053929
> tail(df,20)
Output
    x1         x2 x3       x4
481  2 -1.9780830  4 9.299100
482  3  2.0495769  1 9.639262
483  3 -4.5421502  2 3.374645
484 NA  2.1400502  3       NA
485  2 -4.0551622  2 5.999863
486  4  5.8547691  2 3.593138
487 NA         NA  2 9.549274
488  3  3.9160824  1 3.053929
489  1 11.1483587  5 7.786711
490  3 -2.7581511  2 9.433952
491 NA  4.8002434  1 5.824331
492  2  4.8002434  2 8.434829
493  2  1.9706702  2 3.053929
494 NA  2.5099287  2 7.979433
495  4  1.9706702  1 7.929130
496  2  4.5919890  2 9.973436
497  4  2.5099287  4 7.269619
498  4  0.3082031  3 3.053929
499  1  5.4593713  2 9.973436
500 NA -1.9780830  4 3.219703
> cor(na.omit(df))
Output
             x1          x2          x3         x4
x1  1.000000000 0.009571313 -0.06363564 0.03276244
x2  0.009571313 1.000000000  0.08123065 0.03330818
x3 -0.063635640 0.081230649  1.00000000 0.03503841
x4  0.032762439 0.033308181  0.03503841 1.00000000
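Since na.omit removes a whole row when any single column is missing, it can be worth checking how many rows are actually left before trusting the result. The sketch below rebuilds a data frame in the same way as above. The set.seed(42) call is an assumption added only to make the sketch reproducible, so its numbers will not match the output printed in this article. It also shows use = "pairwise.complete.obs", which is a different technique from na.omit: each correlation is computed from the rows that are complete for that pair of columns.

```r
set.seed(42)  # assumed seed, not part of the original example

x1 <- sample(c(1:5, NA), 500, replace = TRUE)
x2 <- sample(c(rnorm(50, 2, 5), NA), 500, replace = TRUE)
x3 <- sample(c(rpois(50, 2), NA), 500, replace = TRUE)
x4 <- sample(c(runif(50, 2, 10), NA), 500, replace = TRUE)
df <- data.frame(x1, x2, x3, x4)

n_total     <- nrow(df)                 # all rows
n_complete  <- sum(complete.cases(df))  # rows that na.omit(df) keeps
n_by_column <- colSums(!is.na(df))      # non-missing values per column

# Row-wise deletion (the approach used in this article)
r_listwise <- cor(na.omit(df))

# Pairwise deletion: each pair of columns uses its own complete rows
r_pairwise <- cor(df, use = "pairwise.complete.obs")

print(c(total = n_total, complete = n_complete))
print(n_by_column)
print(round(r_listwise, 3))
print(round(r_pairwise, 3))
```

Pairwise deletion keeps more data, but because each entry may be based on a different set of rows, the resulting matrix is not guaranteed to be positive semi-definite.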
Let’s have a look at an example with matrix data −
Example
> M<-matrix(sample(c(rpois(10,2),NA),36,replace=TRUE),nrow=6)
> M
Output
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    2    2    2    2   NA    3
[2,]    3    2    4    1    4    3
[3,]    3   NA    1    1    1   NA
[4,]    3   NA    3    2    2    1
[5,]    1    4    3    2    2    2
[6,]    1    2    1    3    1    1
> cor(na.omit(M))
Output
           [,1]       [,2]       [,3]       [,4]       [,5]       [,6]
[1,]  1.0000000 -0.5000000  0.7559289 -0.8660254  0.9449112  0.8660254
[2,] -0.5000000  1.0000000  0.1889822  0.0000000 -0.1889822  0.0000000
[3,]  0.7559289  0.1889822  1.0000000 -0.9819805  0.9285714  0.9819805
[4,] -0.8660254  0.0000000 -0.9819805  1.0000000 -0.9819805 -1.0000000
[5,]  0.9449112 -0.1889822  0.9285714 -0.9819805  1.0000000  0.9819805
[6,]  0.8660254  0.0000000  0.9819805 -1.0000000  0.9819805  1.0000000
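With a small matrix, na.omit can leave very few rows. As a rough check, the sketch below re-enters the 6 x 6 matrix printed above by hand (instead of sampling it again) and counts the rows that survive. The names kept, n_kept and r are assumptions chosen for this sketch.

```r
# The matrix printed above, typed in row by row
M <- matrix(c(2,  2, 2, 2, NA,  3,
              3,  2, 4, 1,  4,  3,
              3, NA, 1, 1,  1, NA,
              3, NA, 3, 2,  2,  1,
              1,  4, 3, 2,  2,  2,
              1,  2, 1, 3,  1,  1),
            nrow = 6, byrow = TRUE)

kept   <- complete.cases(M)  # TRUE for rows without any NA
n_kept <- sum(kept)          # number of rows that na.omit(M) retains

# Correlation matrix from the complete rows only
r <- cor(na.omit(M))

print(which(kept))
print(n_kept)
print(r)
```

Only three of the six rows are complete here, so every correlation is estimated from three observations. Values such as the -1 between columns 4 and 6 should therefore be read with caution.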