How to find the sum by distinct column for factor levels in an R data frame?
If a data frame contains a factor column and some numerical columns, we might want to find the sum of each numerical column for every level of the factor. For this purpose, we can use the aggregate function. For example, if a data frame df contains a factor column named Group and some numerical columns, the sum of every numerical column for each level of Group can be calculated with aggregate(.~Group,data=df,sum).
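In the formula . ~ Group, the dot on the left-hand side stands for every column of data that does not appear on the right-hand side. The minimal sketch below shows this on a small made-up data frame. The name toy and its values are illustrative only. It checks that the dot form gives the same result as naming the numeric columns explicitly with cbind().

```r
# Hypothetical toy data: one factor column and two numeric columns
toy <- data.frame(
  Group     = factor(c("A", "B", "A", "C", "B")),
  frequency = c(2, 5, 3, 1, 4),
  cost      = c(10.5, 20, 4.5, 7, 1)
)

# "." on the left-hand side means "all columns not used on the right-hand side"
by_dot <- aggregate(. ~ Group, data = toy, FUN = sum)

# Equivalent call that lists the numeric columns explicitly
by_cbind <- aggregate(cbind(frequency, cost) ~ Group, data = toy, FUN = sum)

print(by_dot)
print(isTRUE(all.equal(by_dot, by_cbind)))
```

Both calls return one row per level (A, B, C). The frequency sums are 5, 9 and 1, and the cost sums are 15, 21 and 7.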
Example 1
Consider the following data frame −
Group <- factor(sample(c("A","B","C"), 20, replace = TRUE))
frequency <- sample(1:10, 20, replace = TRUE)
cost <- round(rnorm(20, 25, 6), 2)
df1 <- data.frame(Group, frequency, cost)
df1
Output
   Group frequency  cost
1      A         6 21.69
2      C         5 34.94
3      C         3 17.32
4      B         3 16.84
5      A        10 23.10
6      C         3 30.30
7      B         8 19.84
8      A         1 25.41
9      C         2 27.55
10     A        10 26.31
11     B         7 33.05
12     A        10 32.09
13     B         1 27.36
14     A         9 19.70
15     A         5 26.44
16     A        10 28.28
17     C         6 25.67
18     A         9 24.06
19     C         3 22.25
20     A         5 24.93
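sample() and rnorm() draw random numbers, so each run of the code above produces a different data frame and different sums. A common practice is to call set.seed() before generating the data. In the sketch below, the helper make_df1 and the seed 2021 are hypothetical choices. The seed will not reproduce the exact numbers printed above. It only makes repeated runs agree with each other.

```r
# Hypothetical helper wrapping the data generation from Example 1
make_df1 <- function(seed) {
  set.seed(seed)  # fix the state of the random number generator
  Group     <- factor(sample(c("A", "B", "C"), 20, replace = TRUE))
  frequency <- sample(1:10, 20, replace = TRUE)
  cost      <- round(rnorm(20, 25, 6), 2)
  data.frame(Group, frequency, cost)
}

run1 <- make_df1(2021)
run2 <- make_df1(2021)

# Same seed, so the same data and the same group sums
print(identical(run1, run2))
print(aggregate(. ~ Group, data = run1, FUN = sum))
```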
Finding the sum of frequency and cost for each level of Group −
Example
aggregate(.~Group,data=df1,sum)
Output
  Group frequency   cost
1     A        75 252.01
2     B        19  97.09
3     C        22 158.03
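To confirm these sums, the sketch below types df1 back in by hand from the printed output and recomputes the group totals. It also cross-checks them with base R's rowsum(). rowsum() is a different function from aggregate(), but it also sums rows by group. The variable names res and chk are illustrative.

```r
# df1 re-entered from the printed output of Example 1
df1 <- data.frame(
  Group = factor(c("A", "C", "C", "B", "A", "C", "B", "A", "C", "A",
                   "B", "A", "B", "A", "A", "A", "C", "A", "C", "A")),
  frequency = as.integer(c(6, 5, 3, 3, 10, 3, 8, 1, 2, 10,
                           7, 10, 1, 9, 5, 10, 6, 9, 3, 5)),
  cost = c(21.69, 34.94, 17.32, 16.84, 23.10, 30.30, 19.84, 25.41, 27.55, 26.31,
           33.05, 32.09, 27.36, 19.70, 26.44, 28.28, 25.67, 24.06, 22.25, 24.93)
)

# Sum of every numeric column for each level of Group
res <- aggregate(. ~ Group, data = df1, FUN = sum)
print(res)

# Cross-check: rowsum() sums the rows of a data frame by group,
# returning one row per group (sorted, named by the group value)
chk <- rowsum(df1[, c("frequency", "cost")], as.character(df1$Group))
print(chk)
```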
Example 2
Class <- sample(c("First","Second","Third"), 20, replace = TRUE)
Price <- sample(2000:5000, 20)
Seats <- sample(0:9, 20, replace = TRUE)
df2 <- data.frame(Class, Price, Seats)
df2
Output
    Class Price Seats
1   Third  2218     4
2  Second  3064     4
3   Third  4074     2
4   First  4394     4
5   First  2321     3
6   Third  4998     1
7   First  3520     2
8   First  4133     1
9   Third  4832     9
10 Second  2856     0
11  Third  3145     7
12  Third  4604     6
13 Second  4691     9
14  First  4994     4
15  Third  2252     2
16  First  3491     0
17 Second  4125     7
18 Second  2597     2
19  Third  3720     3
20 Second  2995     0
Finding the sum of Price and Seats for each level of Class −
Example
aggregate(.~Class,data=df2,sum)
Output
   Class Price Seats
1  First 22853    14
2 Second 20328    22
3  Third 29843    34
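Since R 4.0.0, data.frame() no longer converts strings to factors by default. As a result, Class in df2 is a character column rather than a factor, and aggregate() groups by it in the same way. The sketch below types df2 back in from the printed output. It then shows that converting Class to a factor gives the same sums. The names by_chr, df2_fct and by_fct are illustrative.

```r
# df2 re-entered from the printed output of Example 2
df2 <- data.frame(
  Class = c("Third", "Second", "Third", "First", "First", "Third", "First",
            "First", "Third", "Second", "Third", "Third", "Second", "First",
            "Third", "First", "Second", "Second", "Third", "Second"),
  Price = c(2218, 3064, 4074, 4394, 2321, 4998, 3520, 4133, 4832, 2856,
            3145, 4604, 4691, 4994, 2252, 3491, 4125, 2597, 3720, 2995),
  Seats = c(4, 4, 2, 4, 3, 1, 2, 1, 9, 0, 7, 6, 9, 4, 2, 0, 7, 2, 3, 0)
)

# Grouping by the character column
by_chr <- aggregate(. ~ Class, data = df2, FUN = sum)

# Same data with Class stored as a factor (default, alphabetical levels)
df2_fct <- df2
df2_fct$Class <- factor(df2_fct$Class)
by_fct <- aggregate(. ~ Class, data = df2_fct, FUN = sum)

print(by_chr)
print(by_fct)
```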