How to find the mean of multiple columns based on a character column in R?


If we have a character column that means we are more likely to have duplicated values in that column hence finding the mean of numerical columns based on the values in character column cannot be done directly. For this purpose, we can use aggregate function as shown in the below examples.

Example1

Consider the below data frame −

 Live Demo

set.seed(214)
x1<−sample(c("A","B","C"),20,replace=TRUE)
x2<−rnorm(20,2,0.25)
x3<−rnorm(20,524,32.14)
x4<−rnorm(20,0.4,0.007)
df1<−data.frame(x1,x2,x3,x4)
df1

Output

  x1    x2      x3        x4
1 C 1.798039 483.1139 0.4105840
2 A 2.044579 574.3082 0.3970706
3 B 1.563161 517.0169 0.3983528
4 A 1.722058 546.6778 0.4049876
5 C 2.279675 527.9437 0.3959229
6 C 1.721763 549.1909 0.4096398
7 A 1.903552 486.4005 0.4091737
8 C 2.268382 502.0860 0.3998178
9 B 1.637324 540.1949 0.3970408
10 B 2.204911 433.7537 0.4003291
11 A 2.124006 521.2464 0.3935380
12 C 1.830028 525.4325 0.3937420
13 C 1.353421 553.7063 0.3974116
14 B 2.067984 547.9531 0.3958582
15 C 1.849775 537.0034 0.3986610
16 C 1.763957 531.4417 0.4015034
17 A 1.932413 545.7784 0.4215530
18 B 2.147503 491.7046 0.3918060
19 C 2.222639 563.4852 0.3902110
20 A 1.979280 483.2847 0.3820964

Finding the column means based on values in x1 −

Example

aggregate(.~x1,data=df1,mean)

Output

  x1    x2      x3       x4
1 A 1.950981 526.2827 0.4014032
2 B 1.924176 506.1246 0.3966774
3 C 1.898631 530.3782 0.3997215

Example2

 Live Demo

y1<−sample(c("Male","Female"),20,replace=TRUE)
y2<−rpois(20,2)
y3<−rpois(20,5)
y4<−rpois(20,3)
y5<−rpois(20,2)
y6<−rpois(20,1)
y7<−rpois(20,4)
df2<−data.frame(y1,y2,y3,y4,y5,y6,y7)
df2

Output

   y1     y2 y3 y4 y5 y6 y7
1  Male   1  6  1  1  1  4
2  Male   4  2  1  4  0 5
3  Male   2  6  5  3  1 2
4  Male   4  4  0  3  2  0
5  Male   1  2  6  2  2 2
6  Female 5  4  2  4  0 5
7  Male   2  4  3  1  2 3
8  Male   3  8  1  2  1 5
9  Male   0  3  2  1  2 5
10 Female 3  5  1  2  1 3
11 Female 2  5  3  2  1 2
12 Female 3  8  1  0  1 6
13 Female 1  2  5  2  1 3
14 Male   2  7  1  1  0 3
15 Male   3  5  2  4  2 4
16 Female 5  6  2  4  1 0
17 Female 2  3  1  0  1 3
18 Male   2  6  3  0  1 6
19 Female 2  6  5  1  0 0
20 Female 2  6  4  5  0 3

Finding the column means based on values in y1 −

aggregate(.~y1,data=df2,mean)

Output

    y1      y2      y3       y4       y5        y6 y7
1 Female 2.777778 5.000000 2.666667 2.222222 0.6666667 2.777778
2 Male 2.181818 4.818182 2.272727 2.000000 1.2727273 3.545455

Updated on: 05-Feb-2021

229 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements