How to find the sum by two factor columns in R?
To find the sum by two factor columns in R, we can use the aggregate function. This is most often needed when we have frequency or count data classified by two factors. For example, if a data frame called df contains two factor columns, f1 and f2, and one numerical column, Count, then the sum of Count for each combination of f1 and f2 can be calculated with the command aggregate(Count~f1+f2,data=df,sum).
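As a minimal sketch of the command described above, the following uses a small hand-written data frame so the result can be checked by hand. The values in df are hypothetical and chosen only for illustration; the column names f1, f2 and Count follow the description above.

```r
# Hypothetical data: two grouping columns and one numerical column.
df <- data.frame(
  f1 = c("A", "A", "B", "B", "A", "B"),
  f2 = c("x", "y", "x", "y", "x", "x"),
  Count = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = TRUE
)

# Sum Count within every combination of f1 and f2 that occurs in df.
result <- aggregate(Count ~ f1 + f2, data = df, FUN = sum)
print(result)
```

Here the sums are 6 for (A, x), 9 for (B, x), 2 for (A, y) and 4 for (B, y), which together add up to the total of the Count column, 21.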
Example
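Consider the data frame below. Because sample() and rpois() draw random values, running this code will produce different data unless a seed is set with set.seed():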
x1<-sample(LETTERS[1:4],20,replace=TRUE)
x2<-sample(letters[1:2],20,replace=TRUE)
freq<-rpois(20,5)
df1<-data.frame(x1,x2,freq)
df1
Output
   x1 x2 freq
1   D  b    4
2   C  b    6
3   C  b    7
4   A  b    4
5   C  b    4
6   C  a    6
7   C  b    2
8   C  a    7
9   D  b    4
10  D  b    7
11  B  a    3
12  B  a    6
13  A  a    5
14  B  b    5
15  C  a    2
16  B  a    5
17  A  b    6
18  B  a    8
19  C  a    2
20  D  a    5
Find the sum of freq for each combination of x1 and x2:
Example
aggregate(freq~x1+x2,data=df1,sum)
Output
  x1 x2 freq
1  A  a    5
2  B  a   22
3  C  a   17
4  D  a    5
5  A  b   10
6  B  b    5
7  C  b   19
8  D  b   15
Example
y1<-sample(c("hot","cold"),20,replace=TRUE)
y2<-sample(c("Asia","America","Africa","Europe"),20,replace=TRUE)
count<-sample(1:10,20,replace=TRUE)
df2<-data.frame(y1,y2,count)
df2
Output
     y1      y2 count
1   hot  Africa     8
2  cold America     8
3   hot America     4
4   hot  Africa     5
5   hot America     3
6   hot  Europe     9
7  cold    Asia    10
8   hot America     4
9   hot America     6
10 cold  Africa     5
11 cold America     5
12 cold America    10
13 cold  Africa     3
14 cold America     5
15  hot    Asia     9
16 cold    Asia     4
17 cold America     6
18 cold  Europe     4
19 cold  Africa     9
20  hot  Africa     8
Find the sum of count for each combination of y1 and y2:
Example
aggregate(count~y1+y2,data=df2,sum)
Output
    y1      y2 count
1 cold  Africa    17
2  hot  Africa    21
3 cold America    34
4  hot America    17
5 cold    Asia    14
6  hot    Asia     9
7 cold  Europe     4
8  hot  Europe     9
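In both examples every combination of the two grouping columns happens to occur in the data. When a combination is absent, aggregate simply returns no row for it. The sketch below illustrates this with a hypothetical data frame (the names df, f1, f2 and Count are made up for illustration) and, as one possible alternative rather than part of the method above, uses base R's xtabs to show the same sums as a two-way table in which absent combinations appear as 0.

```r
# Hypothetical data in which the combination ("B", "y") never occurs.
df <- data.frame(
  f1 = c("A", "A", "B", "A"),
  f2 = c("x", "y", "x", "x"),
  Count = c(2, 3, 4, 1)
)

# aggregate() returns one row per combination that is present in the data,
# so there is no row for ("B", "y").
sums <- aggregate(Count ~ f1 + f2, data = df, FUN = sum)
print(sums)

# xtabs() arranges the same sums as a two-way table; absent pairs show as 0.
tab <- xtabs(Count ~ f1 + f2, data = df)
print(tab)
```

The long format returned by aggregate is usually easier to merge or plot, while the table form makes missing combinations visible at a glance.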