How to find the groupwise cumulative mean in R?



To find the groupwise cumulative mean, we can use cummean function of dplyr package.

For example, if we have a data frame called df that contains a categorical column say Group and a numerical column say Response then the groupwise cumulative mean can be found by using the command given below −

df%>%group_by(Group)%>%mutate(CM=cummean(Response))

Example 1

Following snippet creates a sample data frame −

Group<-sample(LETTERS[1:4],20,replace=TRUE)
Score<-rpois(20,50)
df1<-data.frame(Group,Score)
df1

Output

The following dataframe is created −

Group Score
1  B  48
2  D  54
3  B  44
4  C  50
5  B  50
6  C  48
7  A  58
8  B  44
9  B  44
10 C  51
11 C  50
12 B  48
13 B  63
14 D  43
15 C  53
16 C  52
17 C  59
18 B  46
19 A  51
20 C  58

In order to load the dplyr package and find the groupwise cumulative mean for Score in df1, add the following code to the above snippet −

library(dplyr)
df1 %>% group_by(Group) %>% mutate(Cum_Mean = cummean(Score))
# A tibble: 20 x 3
# Groups: Group [4]

Output

If you execute all the above given snippets as a single program, it generates the following output −

 Group Score Cum_Mean
 <chr> <int> <dbl>
1  B   48    48
2  D   54    54
3  B   44    46
4  C   50    50
5  B   50    47.3
6  C   48    49
7  A   58    58
8  B   44    46.5
9  B   44    46
10 C   51    49.7
11 C   50    49.8
12 B   48    46.3
13 B   63    48.7
14 D   43    48.5
15 C   53    50.4
16 C   52    50.7
17 C   59    51.9
18 B   46    48.4
19 A   51    54.5
20 C   58    52.6

Example 2

Following snippet creates a sample data frame −

Department<-sample(c("Finance","Marketing","HR"),20,replace=TRUE)
Salary<-sample(20000:50000,20)
df2<-data.frame(Department,Salary)
df2

The following dataframe is created −

 Department   Salary
1   HR        20179
2   Finance   30870
3   Marketing 48467
4   HR        22429
5   Marketing 26829
6   Finance   34267
7   HR        27498
8   Marketing 38346
9   Finance   31771
10  HR        30438
11  HR        26913
12  HR        30034
13  Finance   26277
14  HR        43676
15  HR        45958
16  Finance   37371
17  HR        41023
18  HR        34838
19  Finance   44879
20  HR        20964

To find the groupwise cumulative mean for Salary in df2, add the following code to the above snippet −

df2 %>% group_by(Department) %>% mutate(Cum_Mean=cummean(Salary))
# A tibble: 20 x 3
# Groups: Department [3]

Output

If you execute all the above given snippets as a single program, it generates the following output −

  Department Salary  Cum_Mean
  <chr>      <int>   <dbl>
1  HR        20179   20179
2  Finance   30870   30870
3  Marketing 48467   48467
4  HR        22429   21304
5  Marketing 26829   37648
6  Finance   34267   32568.
7  HR        27498   23369.
8  Marketing 38346   37881.
9  Finance   31771   32303.
10 HR        30438   25136
11 HR        26913   25491.
12 HR        30034   26248.
13 Finance   26277   30796.
14 HR        43676   28738.
15 HR        45958   30891.
16 Finance   37371   32111.
17 HR        41023   32016.
18 HR        34838   32299.
19 Finance   44879   34239.
20 HR        20964   31268.
Updated on: 2021-11-05T08:23:29+05:30

332 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements