How to find the mean of a numerical column by two categorical columns in an R data frame?


If an R data frame has a numerical column and two categorical columns, we can find the mean of the numerical column for each combination of the categorical columns with the aggregate function. For example, if a data frame df contains a numerical column X and two categorical columns C1 and C2, the mean of X for each combination of C1 and C2 can be found with the following command −

aggregate(X~C1+C2,data=df,FUN="mean")
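As a minimal sketch of this command, the toy data frame below uses hypothetical values (not taken from the examples that follow) chosen so that each group mean can be checked by hand. The name res is an illustrative assumption.

```r
# Hypothetical toy data: two categorical columns and one numerical column
df <- data.frame(
  C1 = c("A", "A", "B", "B", "A", "B"),
  C2 = c("u", "u", "u", "v", "v", "v"),
  X  = c(1, 3, 10, 20, 5, 30)
)

# One row per observed (C1, C2) combination, with the mean of X in each
res <- aggregate(X ~ C1 + C2, data = df, FUN = mean)
res

# Group means by hand:
#   A,u: (1 + 3) / 2   = 2
#   B,u: 10            = 10
#   A,v: 5             = 5
#   B,v: (20 + 30) / 2 = 25
```

Passing the function itself (FUN = mean) and the quoted name (FUN = "mean") both work here, since aggregate resolves a quoted name to the function.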

Example

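Consider the following data frame. Because sample() and rnorm() draw random values and no seed is set, rerunning this code will produce different numbers from those shown −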

C1<-sample(LETTERS[1:4],20,replace=TRUE)
C2<-factor(sample(1:2,20,replace=TRUE))
X<-rnorm(20,30,2.87)
df1<-data.frame(C1,C2,X)
df1

Output

   C1 C2        X
1   A  2 30.56001
2   D  2 32.18580
3   A  1 36.63182
4   B  1 32.35519
5   A  1 30.40990
6   B  2 31.57616
7   B  1 28.53280
8   D  1 32.35574
9   B  1 30.53733
10  A  1 27.79314
11  C  2 29.54564
12  A  2 27.64586
13  D  1 27.27475
14  D  2 33.99874
15  D  1 30.41017
16  C  1 27.66988
17  A  1 30.69182
18  A  2 34.12661
19  C  2 34.07609
20  C  1 32.29219

Finding the mean of X for the combinations of C1 and C2 −

Example

aggregate(X~C1+C2,data=df1,FUN="mean")

Output

  C1 C2        X
1  A  1 31.38167
2  B  1 30.47510
3  C  1 29.98104
4  D  1 30.01355
5  A  2 30.77749
6  B  2 31.57616
7  C  2 31.81087
8  D  2 33.09227
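One group in this result can be checked by hand. In the printed df1, rows 3, 5, 10 and 17 are the ones with C1 equal to A and C2 equal to 1. The sketch below copies their X values from the printout and averages them. The names x_A1 and cell_mean are illustrative assumptions.

```r
# X values of rows 3, 5, 10 and 17 of the printed df1 (C1 == "A", C2 == 1)
x_A1 <- c(36.63182, 30.40990, 27.79314, 30.69182)

# Their plain average is what aggregate reports for the (A, 1) group
cell_mean <- mean(x_A1)
round(cell_mean, 5)  # 31.38167, matching row 1 of the aggregate output
```

With the full data frame available, the same check would be mean(df1$X[df1$C1 == "A" & df1$C2 == 1]).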

Example

C1<-sample(c("Hot","Cold"),20,replace=TRUE)
C2<-sample(0:1,20,replace=TRUE)
Y<-rpois(20,5)
df2<-data.frame(C1,C2,Y)
df2

Output

     C1 C2  Y
1  Cold  1  7
2   Hot  1  5
3  Cold  0  5
4   Hot  1  3
5   Hot  0  5
6  Cold  1  6
7  Cold  1 10
8  Cold  0  2
9   Hot  1  7
10  Hot  1  4
11 Cold  1  7
12  Hot  0  4
13 Cold  0  4
14  Hot  1  3
15  Hot  1  4
16 Cold  0  5
17 Cold  0  8
18 Cold  0  5
19 Cold  0  3
20  Hot  1  7

Finding the mean of Y for the combinations of C1 and C2 −

Example

aggregate(Y~C1+C2,data=df2,FUN="mean")

Output

    C1 C2        Y
1 Cold  0 4.571429
2  Hot  0 4.500000
3 Cold  1 7.500000
4  Hot  1 4.714286
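Since df2 was built from random draws, rerunning the code that creates it will give different values. As a sketch of a reproducible version, the block below enters the printed df2 values directly and repeats the aggregation. The name res2 is an illustrative assumption. Note that C2 is a plain numeric column here, and aggregate still treats it as a grouping variable.

```r
# df2 rebuilt from the values printed in the output above
df2 <- data.frame(
  C1 = c("Cold", "Hot", "Cold", "Hot", "Hot", "Cold", "Cold", "Cold", "Hot", "Hot",
         "Cold", "Hot", "Cold", "Hot", "Hot", "Cold", "Cold", "Cold", "Cold", "Hot"),
  C2 = c(1, 1, 0, 1, 0, 1, 1, 0, 1, 1,
         1, 0, 0, 1, 1, 0, 0, 0, 0, 1),
  Y  = c(7, 5, 5, 3, 5, 6, 10, 2, 7, 4,
         7, 4, 4, 3, 4, 5, 8, 5, 3, 7)
)

# Mean of Y for each (C1, C2) combination
res2 <- aggregate(Y ~ C1 + C2, data = df2, FUN = mean)
res2

# Group means by hand:
#   Cold,0: 32 / 7 = 4.571429
#   Hot,0:   9 / 2 = 4.5
#   Cold,1: 30 / 4 = 7.5
#   Hot,1:  33 / 7 = 4.714286
```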

Updated on: 08-Dec-2020
