How to collapse data frame rows in R by summing using dplyr?
To collapse data frame rows by summing within groups, group the data with group_by() and then apply the summarise_all() function of the dplyr package, which sums every non-grouping column. For example, if a data frame called df has a categorical column Group and one numerical column, the rows can be collapsed by summing with the command:
df %>% group_by(Group) %>% summarise_all(sum)
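As a hedged aside, recent dplyr releases (1.0.0 and later) mark summarise_all() as superseded in favour of across(). The sketch below shows the same collapse written that way; the data frame toy and its columns x and y are made up for illustration and do not come from the examples in this article.

```r
library(dplyr)

# Hypothetical toy data: one grouping column and two numeric columns
toy <- data.frame(
  Group = c("A", "B", "A", "B", "C"),
  x = c(1, 2, 3, 4, 5),
  y = c(10, 20, 30, 40, 50)
)

# across(everything(), sum) sums every non-grouping column within each group
collapsed <- toy %>%
  group_by(Group) %>%
  summarise(across(everything(), sum))

print(collapsed)
```

With this toy data, group A collapses to x = 4 and y = 40, group B to x = 6 and y = 60, and group C keeps its single row.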
Consider the following data frame:
Example
Group <- sample(LETTERS[1:6], 25, replace = TRUE)
Response <- rnorm(25, 3, 0.24)
df1 <- data.frame(Group, Response)
df1
Output
   Group Response
1      F 2.920793
2      C 2.898450
3      C 3.347825
4      A 3.174100
5      B 3.089882
6      C 2.918084
7      D 3.274836
8      F 2.709450
9      F 3.349442
10     F 2.995712
11     C 3.081089
12     A 3.123781
13     C 2.947828
14     C 3.096281
15     E 2.990183
16     E 3.061462
17     C 3.279717
18     C 2.932549
19     F 2.772635
20     B 2.993549
21     F 2.956203
22     E 2.920117
23     F 3.244469
24     F 3.376968
25     B 3.072305
Load the dplyr package and sum the rows of df1 by the Group column:
Example
library(dplyr)
df1 %>% group_by(Group) %>% summarise_all(sum)
Output
# A tibble: 6 x 2
  Group Response
  <chr>    <dbl>
1 A         6.30
2 B         9.16
3 C        24.5
4 D         3.27
5 E         8.97
6 F        24.3
Example
Region <- sample(c("Asia", "Oceania", "Africa", "America"), 25, replace = TRUE)
Y <- rpois(25, 5)
df2 <- data.frame(Region, Y)
df2
Output
    Region Y
1     Asia 2
2  America 4
3   Africa 4
4  Oceania 5
5  America 6
6     Asia 5
7  Oceania 9
8     Asia 4
9     Asia 4
10 America 5
11 Oceania 2
12  Africa 6
13    Asia 4
14 Oceania 7
15  Africa 7
16 America 2
17    Asia 5
18  Africa 2
19 America 3
20 Oceania 3
21 Oceania 4
22  Africa 5
23  Africa 5
24  Africa 5
25 America 6
Sum the rows of df2 by the Region column:
Example
df2 %>% group_by(Region) %>% summarise_all(sum)
Output
# A tibble: 4 x 2
  Region      Y
  <chr>   <int>
1 Africa     34
2 America    26
3 Asia       24
4 Oceania    30
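Because df2 is drawn with sample() and rpois() without a seed, rerunning the example gives different numbers. One possible way to check the group sums is to rebuild the data from the printed output and compare dplyr against base R's aggregate(). This is only a sketch; the names df2_fixed, by_dplyr and by_base are introduced here for illustration.

```r
library(dplyr)

# df2 rebuilt by hand from the printed output so the result is reproducible
Region <- c("Asia", "America", "Africa", "Oceania", "America",
            "Asia", "Oceania", "Asia", "Asia", "America",
            "Oceania", "Africa", "Asia", "Oceania", "Africa",
            "America", "Asia", "Africa", "America", "Oceania",
            "Oceania", "Africa", "Africa", "Africa", "America")
Y <- c(2L, 4L, 4L, 5L, 6L, 5L, 9L, 4L, 4L, 5L, 2L, 6L, 4L,
       7L, 7L, 2L, 5L, 2L, 3L, 3L, 4L, 5L, 5L, 5L, 6L)
df2_fixed <- data.frame(Region, Y)

# Grouped sum with dplyr
by_dplyr <- df2_fixed %>%
  group_by(Region) %>%
  summarise(Y = sum(Y))

# The same collapse with base R, as a cross-check
by_base <- aggregate(Y ~ Region, data = df2_fixed, FUN = sum)

print(by_dplyr)
print(by_base)
```

Both approaches return one row per region with the sums 34, 26, 24 and 30 for Africa, America, Asia and Oceania, matching the tibble shown in the output.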