How to find the group-wise correlation coefficient in R?


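If an R data frame has two continuous columns and one categorical column, we can find the correlation coefficient between the two continuous columns separately for each category of the categorical column. For this purpose, we can use the by function. It splits the data frame by the categorical column and passes each subset to the function we supply, so that function should call cor with method="spearman" on the columns of the subset it receives, as shown in the examples below.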
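If a plain named vector is more convenient than the printed by object, the same per-category computation can be sketched with split() and vapply() instead of by(). This is a swapped-in alternative, not the method used in the examples: the helper name group_cor and the toy data frame are hypothetical and only for illustration, and column names are passed as strings.

```r
# Hypothetical helper (not part of base R): correlation of two numeric
# columns computed separately within each level of a grouping column.
group_cor <- function(data, group, a, b, method = "spearman") {
  # split() returns a list with one data frame per level of the group column
  pieces <- split(data, data[[group]])
  # vapply() runs cor() on each piece and returns a named numeric vector
  vapply(pieces,
         function(d) cor(d[[a]], d[[b]], method = method),
         numeric(1))
}

# Toy data: y rises with x in group "g1" and falls with x in group "g2"
toy <- data.frame(
  g = rep(c("g1", "g2"), each = 4),
  x = c(1, 2, 3, 4, 1, 2, 3, 4),
  y = c(10, 20, 30, 40, 40, 30, 20, 10)
)

res <- group_cor(toy, "g", "x", "y")
print(res)   # returns c(g1 = 1, g2 = -1)
```

Because the result is an ordinary named vector, it can be sorted, indexed by category name, or added to a summary table directly.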

Example 1

Consider the below data frame:

> x1<-sample(c("A","B","C"),20,replace=TRUE)
> y1<-rnorm(20,1,0.24)
> z1<-rpois(20,2)
> df1<-data.frame(x1,y1,z1)
> df1

Output

   x1        y1 z1
1   A 1.1155324  2
2   C 0.9801564  3
3   B 0.9116162  1
4   A 0.8406772  3
5   C 0.8009355  2
6   A 0.9331637  2
7   B 1.0642089  1
8   B 1.1633515  0
9   B 1.1599037  5
10  B 1.0509981  2
11  B 0.7574267  1
12  B 0.8456225  1
13  B 0.8926751  2
14  B 0.6074419  3
15  C 0.7999792  0
16  A 1.0685236  2
17  B 0.9756677  3
18  A 0.9495342  0
19  C 1.0109747  2
20  A 0.9090985  4

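Finding the Spearman correlation between y1 and z1 for each category in x1. Inside the function, x is the subset of df1 for one category, so the columns are taken from x (using df1$y1 and df1$z1 here would return the overall correlation for every category):

Example

> by(df1,df1$x1,FUN=function(x) cor(x$y1,x$z1,method="spearman"))

Output

df1$x1: A
[1] -0.6375356

df1$x1: B
[1] -0.1069944

df1$x1: C
[1] 0.6324555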
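Spearman's coefficient is the Pearson correlation of the ranks, where tied values share their average rank. A small check, assuming the six group A rows from the df1 table are typed in by hand (rerunning Example 1 draws new random values because no seed is set), could look like this:

```r
# Group A rows (1, 4, 6, 16, 18, 20) copied from the df1 table above
y_A <- c(1.1155324, 0.8406772, 0.9331637, 1.0685236, 0.9495342, 0.9090985)
z_A <- c(2, 3, 2, 2, 0, 4)

# Spearman's coefficient as computed by cor()
rho_builtin <- cor(y_A, z_A, method = "spearman")

# The same value by hand: Pearson correlation of the ranks; by default
# rank() gives tied values (the three 2s in z_A) their average rank
rho_by_hand <- cor(rank(y_A), rank(z_A))

print(c(builtin = rho_builtin, by_hand = rho_by_hand))
# both are about -0.6375356
```

The ties matter here because z1 holds Poisson counts; the average-rank rule is what keeps the two computations identical.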

Example 2

> x2<-sample(c("India","China","France"),20,replace=TRUE)
> y2<-rexp(20,0.335)
> z2<-runif(20,2,10)
> df2<-data.frame(x2,y2,z2)
> df2

Output

       x2          y2       z2
1  France  2.31790394 2.649538
2   China 10.61012173 8.340615
3  France  5.00085220 6.602884
4  France  1.67707140 7.722530
5   India  9.60663732 9.837268
6  France  1.46030289 5.370930
7  France 10.44614704 9.035748
8   India  0.39506766 6.318701
9   China  1.83071453 7.282782
10  China  0.23080001 7.210144
11  India  2.27763766 9.233019
12  China 18.21276888 9.928614
13 France  1.72085517 9.176826
14  India  4.77786071 8.899026
15  China  8.55501571 7.240147
16  China  0.19832026 5.641800
17  India  0.03113389 6.928705
18  China  0.56958471 3.496314
19  China  0.72728737 6.903436
20  India  8.73571474 5.286486

Finding the Spearman correlation between y2 and z2 for each category in x2, again taking the columns from the subset x that by passes to the function:

Example

> by(df2,df2$x2,FUN=function(x) cor(x$y2,x$z2,method="spearman"))

Output

df2$x2: China
[1] 0.8571429

df2$x2: France
[1] 0.2

df2$x2: India
[1] 0.3142857
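When neither column contains ties, as with the continuous rexp() and runif() values in Example 2, the coefficient can also be computed with the textbook formula 1 - 6Σd²/(n(n² - 1)), where d is the rank difference for each row. A sketch using the six France rows typed in from the df2 table:

```r
# France rows (1, 3, 4, 6, 7, 13) copied from the df2 table above
y_F <- c(2.31790394, 5.00085220, 1.67707140, 1.46030289, 10.44614704, 1.72085517)
z_F <- c(2.649538, 6.602884, 7.722530, 5.370930, 9.035748, 9.176826)

n <- length(y_F)
d <- rank(y_F) - rank(z_F)                          # rank difference per row
rho_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))   # exact only without ties

rho_builtin <- cor(y_F, z_F, method = "spearman")
print(c(formula = rho_formula, builtin = rho_builtin))
# both are 0.2: sum(d^2) is 28, so 1 - 168/210
```

With tied values, such as the Poisson counts in Example 1, this shortcut formula no longer matches cor(), so the rank-based Pearson computation is the safer general definition.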

Updated on: 23-Nov-2020
