How to find the cumulative sums by using two factor columns in an R data frame?


Generally, cumulative sums are calculated for a single variable and in some cases based on a single categorical variable, there are very few situations when we want to do it for two categorical variables. If we want to find it for two categorical variables then we need to convert the data frame to a data.table object and use the cumsum function to define the column with cumulative sums.

Example

Consider the below data frame:

Live Demo

> set.seed(1361)
> Factor1<-as.factor(sample(LETTERS[1:4],20,replace=TRUE))
> Factor2<-as.factor(sample(c("T1","T2","T3","T4"),20,replace=TRUE))
> Response<-rpois(20,5)
> df1<-data.frame(Factor1,Factor2,Response)
> df1

Output

Factor1 Factor2 Response
1 A T2 9
2 B T1 8
3 B T1 2
4 A T2 3
5 B T1 7
6 B T2 7
7 D T2 7
8 D T4 7
9 C T4 6
10 B T1 6
11 A T2 4
12 A T2 4
13 C T1 7
14 B T3 1
15 A T3 6
16 D T1 3
17 B T1 8
18 D T4 5
19 D T2 3
20 C T1 4

Loading data.table package:

> library(data.table)

Converting data frame df1 to data.table object:

> dt1<-data.table(df1)

Creating a column CumulativeSums with cumulative sums based on Factor1 and Factor2:

Example

> dt1[,CumulativeSums:=cumsum(Response),by=list(Factor1,Factor2)]
> dt1

Output

Factor1 Factor2 Response CumulativeSums
1: A T2 9 9
2: B T1 8 8
3: B T1 2 10
4: A T2 3 12
5: B T1 7 17
6: B T2 7 7
7: D T2 7 7
8: D T4 7 7
9: C T4 6 6
10: B T1 6 23
11: A T2 4 16
12: A T2 4 20
13: C T1 7 7
14: B T3 1 1
15: A T3 6 6
16: D T1 3 3
17: B T1 8 31
18: D T4 5 12
19: D T2 3 10
20: C T1 4 11

Let’s have a look at another example:

Example

Live Demo

> G1<-as.factor(sample(c("Hot","Cold"),20,replace=TRUE))
> G2<-as.factor(sample(c("Low","Medium","Large"),20,replace=TRUE))
> Y<-sample(1:100,20)
> df2<-data.frame(G1,G2,Y)
> df2

Output

G1 G2 Y
1 Hot Medium 60
2 Cold Low 94
3 Hot Low 22
4 Cold Medium 90
5 Hot Medium 16
6 Hot Large 32
7 Cold Low 44
8 Hot Low 73
9 Hot Medium 99
10 Hot Medium 68
11 Cold Medium 41
12 Cold Large 77
13 Cold Large 48
14 Cold Medium 20
15 Cold Medium 18
16 Cold Low 12
17 Cold Low 30
18 Hot Low 23
19 Cold Medium 26
20 Cold Medium 4

Example

> dt2<-data.table(df2)
> dt2[,CumulativeSums:=cumsum(Y),by=list(G1,G2)]
> dt2

Output

G1 G2 Y CumulativeSums
1: Hot Medium 60 60
2: Cold Low 94 94
3: Hot Low 22 22
4: Cold Medium 90 90
5: Hot Medium 16 76
6: Hot Large 32 32
7: Cold Low 44 138
8: Hot Low 73 95
9: Hot Medium 99 175
10: Hot Medium 68 243
11: Cold Medium 41 131
12: Cold Large 77 77
13: Cold Large 48 125
14: Cold Medium 20 151
15: Cold Medium 18 169
16: Cold Low 12 150
17: Cold Low 30 180
18: Hot Low 23 118
19: Cold Medium 26 195
20: Cold Medium 4 199

Updated on: 07-Nov-2020

240 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements