How to find the cumulative sum for factor levels in an R data frame?
Cumulative sums are mostly used in descriptive analysis of data, and occasionally, though rarely, in time series analysis when working with moving sums. If an R data frame has a factor column, it usually does not make sense to compute one cumulative sum across all factor levels together; instead, the cumulative sum should be computed separately within each level. This can be done easily with the ave function, passing cumsum as its FUN argument.
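The difference between an overall cumulative sum and a per-level one can be seen on a tiny vector. This is a minimal sketch; the names grp, val, overall and by_level and the values are made up for illustration and do not come from the examples in this article.

```r
# Hypothetical toy data (not from the article's data frames)
grp <- factor(c("A", "B", "A", "B", "A"))
val <- c(1, 10, 2, 20, 3)

# Running total over all rows, ignoring the factor
overall <- cumsum(val)

# Running total that restarts within each level of grp;
# the result keeps the original row order
by_level <- ave(val, grp, FUN = cumsum)

print(overall)   # 1 11 13 33 36
print(by_level)  # 1 10 3 30 6
```

Note that ave returns a vector of the same length as its input, so the result can be assigned directly as a new column.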
Example
Consider the below data frame −
set.seed(15)
x1<-as.factor(sample(LETTERS[1:3],20,replace=TRUE))
x2<-rpois(20,5)
df1<-data.frame(x1,x2)
df1
Output
   x1 x2
1   A  5
2   C  6
3   B  2
4   B  3
5   A  8
6   B  5
7   A  5
8   A  3
9   C  5
10  A  2
11  C  5
12  B  6
13  A  2
14  A  5
15  A  4
16  C  6
17  B  8
18  A  5
19  B  8
20  B  3
Finding cumulative sum of x2 values for different values of x1 −
Example
df1$CumSum_x2_based_on_x1<-ave(df1$x2,df1$x1,FUN=cumsum)
df1
Output
   x1 x2 CumSum_x2_based_on_x1
1   A  5                     5
2   C  6                     6
3   B  2                     2
4   B  3                     5
5   A  8                    13
6   B  5                    10
7   A  5                    18
8   A  3                    21
9   C  5                    11
10  A  2                    23
11  C  5                    16
12  B  6                    16
13  A  2                    25
14  A  5                    30
15  A  4                    34
16  C  6                    22
17  B  8                    24
18  A  5                    39
19  B  8                    32
20  B  3                    35
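One way to sanity-check a per-level cumulative sum is to confirm that the last running total in each level equals that level's plain total. The sketch below rebuilds the data frame from the values printed in the output above (rather than from set.seed, whose results can differ between R versions); the names last_cumsum and level_totals are introduced here for illustration only.

```r
# Values copied from the printed df1 output above
x1 <- factor(c("A","C","B","B","A","B","A","A","C","A",
               "C","B","A","A","A","C","B","A","B","B"))
x2 <- c(5, 6, 2, 3, 8, 5, 5, 3, 5, 2, 5, 6, 2, 5, 4, 6, 8, 5, 8, 3)
df1 <- data.frame(x1, x2)
df1$CumSum_x2_based_on_x1 <- ave(df1$x2, df1$x1, FUN = cumsum)

# Last running total within each level of x1
last_cumsum <- tapply(df1$CumSum_x2_based_on_x1, df1$x1,
                      function(v) tail(v, 1))

# Plain sum of x2 within each level of x1
level_totals <- tapply(df1$x2, df1$x1, sum)

print(level_totals)  # A = 39, B = 35, C = 22

# The two should agree for every level
stopifnot(all(last_cumsum == level_totals))
```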
Let’s have a look at another example −
Example
Group<-sample(c("GRP1","GRP2","GRP3","GRP4"),20,replace=TRUE)
Response<-sample(1:10,20,replace=TRUE)
df2<-data.frame(Group,Response)
df2
Output
   Group Response
1   GRP2        1
2   GRP3        1
3   GRP2        8
4   GRP2        1
5   GRP2        4
6   GRP1        7
7   GRP1        8
8   GRP1        2
9   GRP1        1
10  GRP1        1
11  GRP4        3
12  GRP3        9
13  GRP4        4
14  GRP1        9
15  GRP4        5
16  GRP2        8
17  GRP2       10
18  GRP3        5
19  GRP3        8
20  GRP3        8
Finding cumulative sum of Response values for different values of Group −
Example
df2$CumSum_of_GroupLevels<-ave(df2$Response,df2$Group,FUN=cumsum)
df2
Output
   Group Response CumSum_of_GroupLevels
1   GRP2        1                     1
2   GRP3        1                     1
3   GRP2        8                     9
4   GRP2        1                    10
5   GRP2        4                    14
6   GRP1        7                     7
7   GRP1        8                    15
8   GRP1        2                    17
9   GRP1        1                    18
10  GRP1        1                    19
11  GRP4        3                     3
12  GRP3        9                    10
13  GRP4        4                     7
14  GRP1        9                    28
15  GRP4        5                    12
16  GRP2        8                    22
17  GRP2       10                    32
18  GRP3        5                    15
19  GRP3        8                    23
20  GRP3        8                    31
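In this second example the Group column is built from a character vector, and depending on the R version data.frame may keep it as character rather than converting it to a factor. As far as the grouping is concerned, ave treats both the same way. The sketch below illustrates this on the first six rows of the output above; the names from_character and from_factor are introduced here for illustration only.

```r
# First six rows of the printed df2 output above
Group <- c("GRP2", "GRP3", "GRP2", "GRP2", "GRP2", "GRP1")
Response <- c(1, 1, 8, 1, 4, 7)
df2 <- data.frame(Group, Response, stringsAsFactors = FALSE)

# Grouping by a character column
from_character <- ave(df2$Response, df2$Group, FUN = cumsum)

# Grouping by the same column converted to a factor
from_factor <- ave(df2$Response, factor(df2$Group), FUN = cumsum)

print(from_character)  # 1 1 9 10 14 7

# Both groupings give the same running totals
stopifnot(identical(from_character, from_factor))
```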