How to find the correlation matrix for a data frame that contains missing values in R?
To find the correlation matrix for a data frame, we can pass the data frame to the cor function. If the data frame contains missing values, however, this is not so straightforward: by default, cor returns NA for every pair of columns in which a missing value occurs. In such situations, we can set the use argument of cor to "complete.obs", which drops every row that contains at least one missing value before the correlation coefficients are calculated.
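As a minimal sketch of the difference, the snippet below compares the default behaviour of cor with use="complete.obs". The data frame toy and its columns a, b, and c are hypothetical values made up for illustration; they are not taken from the examples in this article.

```r
# Hypothetical toy data frame (illustrative values only) with two missing entries
toy <- data.frame(
  a = c(1, 2, NA, 4, 5, 6),
  b = c(2, 3, 6, NA, 9, 12),
  c = c(6, 5, 4, 3, 2, 1)
)

# Default (use = "everything"): an NA in either column makes that pair's result NA
default_cor <- cor(toy)

# use = "complete.obs": rows 3 and 4 are dropped because each contains an NA,
# and the coefficients are computed from the remaining rows
complete_cor <- cor(toy, use = "complete.obs")

print(default_cor)
print(complete_cor)
```

Because whole rows are removed, every coefficient in the second matrix is based on the same set of observations.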
Example 1
Consider the following data frame:
> x1<-sample(c(NA,24,5,7,8),20,replace=TRUE)
> x2<-sample(c(NA,2,3,1,4,7),20,replace=TRUE)
> x3<-sample(c(NA,512,520,530),20,replace=TRUE)
> df1<-data.frame(x1,x2,x3)
> df1
Output
   x1 x2  x3
1  NA  3 512
2   8  7 512
3   5  2 520
4  NA  1  NA
5  NA  2 512
6  NA  4  NA
7   5 NA 530
8  NA NA 530
9  24  3  NA
10 NA  1 512
11  5  2 530
12 NA  7 520
13  5  1  NA
14  8  3 530
15  7  1  NA
16  7  4 530
17  7  3 512
18  5  2 530
19  7  3 530
20 NA  1 512
Finding the correlation matrix for df1:
Example
> cor(df1,use="complete.obs",method="pearson")
Output
           x1         x2         x3
x1  1.0000000  0.7190925 -0.2756960
x2  0.7190925  1.0000000 -0.5200868
x3 -0.2756960 -0.5200868  1.0000000
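One way to check what "complete.obs" is doing is to remove the incomplete rows yourself with na.omit and call cor on what remains. The sketch below re-enters the df1 values from the printed output above (the original was produced by sample without a seed, so it cannot be regenerated) and compares the two routes; the names via_use and via_omit are introduced here only for illustration.

```r
# df1 re-entered by hand from the values printed in the output above
df1 <- data.frame(
  x1 = c(NA, 8, 5, NA, NA, NA, 5, NA, 24, NA, 5, NA, 5, 8, 7, 7, 7, 5, 7, NA),
  x2 = c(3, 7, 2, 1, 2, 4, NA, NA, 3, 1, 2, 7, 1, 3, 1, 4, 3, 2, 3, 1),
  x3 = c(512, 512, 520, NA, 512, NA, 530, 530, NA, 512,
         530, 520, NA, 530, NA, 530, 512, 530, 530, 512)
)

# Number of rows that have no missing value in any column
n_complete <- sum(complete.cases(df1))

# Route 1: let cor drop the incomplete rows
via_use <- cor(df1, use = "complete.obs", method = "pearson")

# Route 2: drop the incomplete rows explicitly, then correlate
via_omit <- cor(na.omit(df1), method = "pearson")

print(n_complete)
print(all.equal(via_use, via_omit))
```

Only 8 of the 20 rows of df1 are complete, so the coefficients above rest on a fairly small sample; it is worth checking this count before interpreting the matrix.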
Example 2
> y1<-sample(c(NA,rnorm(5,5,1)),20,replace=TRUE)
> y2<-sample(c(NA,rnorm(5,2,1)),20,replace=TRUE)
> y3<-sample(c(NA,rnorm(10,10,1)),20,replace=TRUE)
> y4<-sample(c(NA,rnorm(10,5,2.5)),20,replace=TRUE)
> df2<-data.frame(y1,y2,y3,y4)
> df2
Output
         y1       y2        y3        y4
1        NA 2.955947        NA 2.8623715
2        NA 3.087940  9.099791 4.5996351
3        NA 3.087940  9.589898 5.6097088
4  3.500343 1.150117 10.985979        NA
5  4.831364 3.087940 10.107124        NA
6  7.041597 1.840461  9.416738 2.8601661
7        NA 2.212388 10.453622 5.0717510
8  4.831364 3.087940 10.928925 6.3030777
9  7.041597       NA  9.099791 5.2709332
10 4.831364 2.212388        NA 2.6219274
11 4.831364 2.212388 10.928925 6.3030777
12 3.500343       NA  8.779948 6.3030777
13 4.772150 1.840461  9.589898 5.2709332
14 7.041597 2.955947 10.453622 5.5989568
15       NA 2.955947  9.827149 5.5989568
16 7.041597 1.840461  9.099791 5.5989568
17 3.500343 2.212388  8.779948 4.5996351
18 4.772150 2.212388 10.985979        NA
19       NA 2.955947 10.453622 0.3151969
20 4.772150 1.150117  9.099791 6.3030777
Finding the correlation matrix for df2:
Example
> cor(df2,use="complete.obs",method="pearson")
Output
            y1         y2         y3         y4
y1  1.00000000 0.07343574 0.06408734 -0.3103069
y2  0.07343574 1.00000000 0.70344970  0.1674528
y3  0.06408734 0.70344970 1.00000000  0.4544444
y4 -0.31030689 0.16745277 0.45444435  1.0000000
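When many rows contain at least one NA, "complete.obs" can discard a large share of the data. The use argument of cor also accepts "pairwise.complete.obs", which computes each coefficient from the rows where that particular pair of columns is present. The sketch below contrasts the two options on a hypothetical data frame toy (columns u, v, and w are made-up values, not data from this article).

```r
# Hypothetical toy data frame (illustrative values only), one NA per column
toy <- data.frame(
  u = c(1, 2, 3, 4, NA, 6),
  v = c(2, 1, 4, 3, 5, NA),
  w = c(NA, 3, 1, 2, 6, 5)
)

# Listwise deletion: only rows 2, 3 and 4 are complete, so every
# coefficient is computed from those three rows
listwise <- cor(toy, use = "complete.obs")

# Pairwise deletion: each pair of columns uses every row where both are present
pairwise <- cor(toy, use = "pairwise.complete.obs")

# Number of rows available to each pair of columns
present <- !is.na(toy)            # logical matrix, TRUE where a value exists
pair_n <- crossprod(present * 1)  # counts of jointly observed rows

print(listwise)
print(pairwise)
print(pair_n)
```

The trade-off is that pairwise coefficients are based on different subsets of rows, so the resulting matrix is not guaranteed to be positive semi-definite, whereas "complete.obs" keeps all coefficients on a common sample.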