How to find the mean of a column for each level of a factor column in an R data frame?


If a data frame has a column that represents a factor, we often want the mean of the values in the other column(s) for each factor level. This makes it easy to compare the levels of the factor. In R, we can compute these group means with the aggregate function. The examples below show how it is done.

Example 1

Consider the data frame below. The values are randomly generated, so your output will differ:

> x1 <- sample(LETTERS[1:4], 20, replace = TRUE)
> y1 <- rnorm(20, 5, 1)
> df1 <- data.frame(x1, y1)
> df1

Output

  x1   y1
1 D 5.801197
2 B 3.432060
3 B 6.154168
4 A 5.466655
5 D 5.171689
6 C 5.175170
7 B 5.353469
8 D 4.840470
9 C 4.158980
10 B 4.711343
11 D 4.348326
12 A 5.933382
13 A 3.484782
14 A 2.004760
15 C 4.963307
16 D 4.728794
17 B 3.606417
18 B 6.234446
19 C 4.625489
20 B 6.569928

Finding the mean of y1 for each level of x1:

Example

> aggregate(. ~ x1, data = df1, FUN = mean)

Output

  x1   y1
1 A 4.222395
2 B 5.151690
3 C 4.730736
4 D 4.978095
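
The dot in the formula `. ~ x1` stands for every column other than x1. As a minimal sketch, assuming a small hand-made data frame (the names df_demo, grp and val are illustrative and not part of the example above), the same kind of group mean can be written with an explicit formula and cross-checked with tapply:

```r
# Illustrative data frame; df_demo and its values are assumptions for this sketch
df_demo <- data.frame(
  grp = c("A", "B", "A", "B", "C"),
  val = c(2, 4, 6, 8, 10)
)

# Explicit formula: mean of val for each level of grp
means_formula <- aggregate(val ~ grp, data = df_demo, FUN = mean)

# Same computation with tapply, which returns a named array instead of a data frame
means_tapply <- tapply(df_demo$val, df_demo$grp, mean)

print(means_formula)
print(means_tapply)
```

aggregate returns a data frame, which is convenient for further processing, while tapply is a compact option when only the numbers are needed.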

Example 2

> x2 <- sample(0:1, 20, replace = TRUE)
> y2 <- rpois(20, 5)
> df2 <- data.frame(x2, y2)
> df2

Output

 x2 y2
1 1 6
2 0 5
3 1 3
4 0 3
5 1 4
6 0 7
7 0 5
8 0 3
9 0 5
10 0 4
11 0 4
12 0 7
13 0 4
14 0 6
15 0 2
16 1 7
17 0 9
18 1 2
19 0 6
20 0 5

Finding the mean of y2 for each value of x2:

Example

> aggregate(. ~ x2, data = df2, FUN = mean)

Output

 x2 y2
1 0 5.0
2 1 4.4
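
Here x2 is a numeric 0/1 column rather than a factor, and aggregate still treats each distinct value as a group. The following sketch, which assumes an illustrative data frame df_multi with two value columns (none of these names come from the example above), shows that the dot formula averages every remaining column at once:

```r
# Illustrative data; df_multi and its columns are assumptions for this sketch
df_multi <- data.frame(
  g = c(0, 1, 0, 1),
  y = c(1, 2, 3, 4),
  z = c(10, 20, 30, 40)
)

# The dot expands to every column except g, so both y and z are averaged per group
res <- aggregate(. ~ g, data = df_multi, FUN = mean)
print(res)
```

Note that the formula interface of aggregate drops rows containing NA by default (na.action = na.omit), so incomplete rows do not contribute to the group means.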

Updated on: 23-Nov-2020
