- Trending Categories
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
Physics
Chemistry
Biology
Mathematics
English
Economics
Psychology
Social Studies
Fashion Studies
Legal Studies
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
How to find the sum based on a categorical variable in an R data frame?
Finding group-wise mean is a common thing but if we go for step-by-step analysis then sum of values are also required when we have a categorical variable in our data set. This can be easily done with the help of group_by and summarise_each function of dplyr package.
Example
Consider the below data frame:
> Group<-sample(LETTERS[1:4],20,replace=TRUE) > Salary<-sample(21000:50000,20) > Emp<-data.frame(Group,Salary) > Emp
Output
Group Salary 1 D 28256 2 B 31092 3 A 23147 4 C 28209 5 B 37676 6 C 33374 7 D 44864 8 B 40152 9 A 25843 10 A 40946 11 D 23321 12 A 42854 13 C 36960 14 A 35285 15 B 44478 16 B 36173 17 C 35077 18 A 39319 19 D 49204 20 A 41597
Loading dplyr package:
> library(dplyr)
Finding the sum of salaries in each group:
Example
> Emp %>% group_by(Group) %>% summarise_each(funs(sum)) # A tibble: 4 x 2
Output
Group Salary 1 A 248991 2 B 189571 3 C 133620 4 D 145645 Warning message: `...` is not empty.
We detected these problematic arguments:
* `needs_dots`
These dots only exist to allow future extensions and should be empty.
Did you misspecify an argument?
Do not worry about the warning message here, it occurred due to the version of dplyr. Our output is correct. If you want to ignore this warning then we need to install the latest version of tibble from CRAN. We are showing this here because you are likely to face the same problem but not necessarily.
Let’s have a look at another example:
Example
> x<-sample(c("India","USA","China","Canada"),20,replace=TRUE) > y<-rpois(20,5) > df<-data.frame(x,y) > df
Output
x y 1 India 5 2 India 3 3 India 4 4 China 5 5 Canada 3 6 USA 5 7 Canada 7 8 China 6 9 China 4 10 USA 7 11 China 5 12 India 8 13 Canada 3 14 India 6 15 Canada 5 16 Canada 3 17 Canada 7 18 USA 12 19 China 2 20 India 5
Example
> df %>% group_by(x) %>% summarise_each(funs(sum)) # A tibble: 4 x 2
Output
x y 1 Canada 28 2 China 22 3 India 31 4 USA 24 Warning message: `...` is not empty.
We detected these problematic arguments:
* `needs_dots`
These dots only exist to allow future extensions and should be empty.
Did you misspecify an argument?
- Related Articles
- How to subset an R data frame based on numerical and categorical column?
- How to filter data frame by categorical variable in R?
- How to create a frequency column for categorical variable in an R data frame?
- How to create a categorical variable using a data frame column in R?
- How to create a column with binary variable based on a condition of other variable in an R data frame?
- How to divide row values of a numerical column based on categorical column values in an R data frame?
- How to subset an R data frame with condition based on only one value from categorical column?
- How to find the sum of values based on key in other column of an R data frame?
- How to find the frequency based on intervals in R data frame?
- How to find the mean based on single group value in an R data frame?
- How to find the sum of rows of a column based on multiple columns in R data frame?
- How to convert all words of a string or categorical variable in an R data frame to uppercase?
- How to subset an R data frame based on small letters?
- How to find the counts of categories in categorical columns in an R data frame?
- How to find the cumulative sum of columns if some columns are categorical in R data frame?
