How to find the correlation for a data frame having numeric and non-numeric columns in R?


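To find the correlation for a data frame having numeric and non-numeric columns, we can select only the numeric columns with sapply and is.numeric and pass them to the cor function. Setting use="complete.obs" drops any rows that contain missing values, and method="pearson" requests the Pearson correlation coefficient. For example, if we have a data frame called df then we can use the below command to find the correlation matrix of its numeric columns −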

cor(df[,sapply(df,is.numeric)],use="complete.obs",method="pearson")
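The pieces of this command can be looked at one step at a time. The following is a minimal sketch, assuming a small hypothetical data frame (the column names g, u and v and their values are made up for illustration and do not come from the examples in this article). It also adds drop=FALSE, which is an optional safeguard so that the subset stays a data frame even if only one numeric column is present.

```r
# Hypothetical data frame for illustration only
df <- data.frame(
  g = c("a", "b", "a", "b", "a"),   # non-numeric column
  u = c(1, 2, 3, 4, 5),
  v = c(2, 4, 5, 4, NA)             # contains one missing value
)

# sapply() calls is.numeric() on every column and returns a named logical vector
num_cols <- sapply(df, is.numeric)
print(num_cols)

# Keep only the numeric columns; drop = FALSE keeps the result a data frame
num_df <- df[, num_cols, drop = FALSE]

# use = "complete.obs" removes rows with any NA before the correlation is computed
res <- cor(num_df, use = "complete.obs", method = "pearson")
print(res)
```

Because the fifth row has a missing value in v, the correlation here is computed from the first four rows only.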

Example 1

Consider the below data frame −

> x1<-sample(LETTERS[1:4],20,replace=TRUE)
> x2<-rpois(20,5)
> x3<-rpois(20,1)
> df1<-data.frame(x1,x2,x3)
> df1

Output

   x1 x2 x3
1   C 11  2
2   A  3  1
3   C  4  0
4   D 10  2
5   A  1  0
6   A  4  1
7   D  4  0
8   B  2  0
9   C  6  1
10  C  4  2
11  A  7  1
12  C  5  0
13  B  5  0
14  D  5  2
15  C  8  1
16  A  7  0
17  B  2  0
18  B  5  0
19  B  4  2
20  A  8  1

Finding correlation among numerical columns of df1 −

> cor(df1[,sapply(df1,is.numeric)],use="complete.obs",method="pearson")

Output

          x2        x3
x2 1.0000000 0.4832695
x3 0.4832695 1.0000000
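Since sample and rpois draw random values and no seed is set, running the commands above again will give a different data frame. As a sketch, the values printed for df1 can be entered by hand to check the result; the name df1_fixed is an assumption introduced here so that the original df1 is not overwritten.

```r
# Values copied from the df1 output printed above
x1 <- c("C", "A", "C", "D", "A", "A", "D", "B", "C", "C",
        "A", "C", "B", "D", "C", "A", "B", "B", "B", "A")
x2 <- c(11, 3, 4, 10, 1, 4, 4, 2, 6, 4, 7, 5, 5, 5, 8, 7, 2, 5, 4, 8)
x3 <- c(2, 1, 0, 2, 0, 1, 0, 0, 1, 2, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1)

# df1_fixed is a hypothetical name for the hand-entered copy of df1
df1_fixed <- data.frame(x1, x2, x3)

# Same command as in the example, applied to the fixed data
r <- cor(df1_fixed[, sapply(df1_fixed, is.numeric)],
         use = "complete.obs", method = "pearson")
print(r)
```

With these values the off-diagonal entry should agree with the coefficient shown in the output above (about 0.483).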

Example 2

> y1<-rnorm(20)
> y2<-rnorm(20)
> y3<-sample(c("Hot","Cold"),20,replace=TRUE)
> y4<-sample(c("Male","Female"),20,replace=TRUE)
> y5<-rpois(20,2)
> df2<-data.frame(y1,y2,y3,y4,y5)
> df2

Output

            y1          y2   y3     y4 y5
1   1.51725168 -0.52762451 Cold   Male  3
2   0.84772773 -0.43382197  Hot Female  2
3  -1.73640048  0.74754602 Cold Female  2
4   0.72972822 -0.07814968  Hot   Male  1
5   1.69906347  0.56659629  Hot   Male  1
6  -0.01761764  0.13790528  Hot   Male  5
7  -2.06662444  0.84961541 Cold   Male  2
8  -1.09416818  0.90565331  Hot Female  3
9  -1.33657153  0.80483709  Hot   Male  1
10  1.97558526  1.24105635 Cold Female  0
11 -0.21074711  0.13355731  Hot Female  2
12  1.02177951 -0.59891452 Cold Female  4
13  1.73358364  0.11105171 Cold   Male  1
14  0.37426668  0.68837549  Hot   Male  1
15  1.74025264 -0.15972807  Hot Female  0
16  0.30275475  0.20629397 Cold Female  1
17 -0.28661576  1.01552432  Hot   Male  3
18 -0.42663944 -1.30746381  Hot   Male  3
19 -0.23888520  1.36409027 Cold Female  1
20  0.32587990  0.38175578 Cold   Male  0
 

Finding correlation among numerical columns of df2 −

> cor(df2[,sapply(df2,is.numeric)],use="complete.obs",method="pearson")

Output

           y1         y2         y5
y1  1.0000000 -0.3038048 -0.2803100
y2 -0.3038048  1.0000000 -0.3424033
y5 -0.2803100 -0.3424033  1.0000000
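If the same command is needed for several data frames, it can be wrapped in a small function. The sketch below assumes a hypothetical helper named numeric_cor (it is not part of base R) and an arbitrary seed of 42 so that the random data can be regenerated; the numbers it produces will therefore differ from the df2 output above.

```r
# Hypothetical helper: correlation matrix of the numeric columns of a data frame
numeric_cor <- function(data, method = "pearson") {
  num_cols <- sapply(data, is.numeric)
  # A correlation matrix needs at least two numeric columns to be informative
  if (sum(num_cols) < 2) {
    stop("need at least two numeric columns")
  }
  cor(data[, num_cols, drop = FALSE], use = "complete.obs", method = method)
}

set.seed(42)  # arbitrary seed, chosen only to make the data reproducible
df2 <- data.frame(
  y1 = rnorm(20),
  y2 = rnorm(20),
  y3 = sample(c("Hot", "Cold"), 20, replace = TRUE),
  y4 = sample(c("Male", "Female"), 20, replace = TRUE),
  y5 = rpois(20, 2)
)

m <- numeric_cor(df2)
print(m)
```

The method argument is passed straight to cor, so numeric_cor(df2, method = "spearman") would give rank correlations for the same columns.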

Updated on: 06-Mar-2021
