How to find the sum based on a categorical variable in an R data frame?


Finding group-wise mean is a common thing but if we go for step-by-step analysis then sum of values are also required when we have a categorical variable in our data set. This can be easily done with the help of group_by and summarise_each function of dplyr package.

Example

Consider the below data frame:

Live Demo

> Group<-sample(LETTERS[1:4],20,replace=TRUE)
> Salary<-sample(21000:50000,20)
> Emp<-data.frame(Group,Salary)
> Emp

Output

Group Salary
1 D 28256
2 B 31092
3 A 23147
4 C 28209
5 B 37676
6 C 33374
7 D 44864
8 B 40152
9 A 25843
10 A 40946
11 D 23321
12 A 42854
13 C 36960
14 A 35285
15 B 44478
16 B 36173
17 C 35077
18 A 39319
19 D 49204
20 A 41597

Loading dplyr package:

> library(dplyr)

Finding the sum of salaries in each group:

Example

> Emp %>% group_by(Group) %>% summarise_each(funs(sum))
# A tibble: 4 x 2

Output

Group Salary
1 A 248991
2 B 189571
3 C 133620
4 D 145645
Warning message:
`...` is not empty.

We detected these problematic arguments:

* `needs_dots`

These dots only exist to allow future extensions and should be empty.

Did you misspecify an argument?

Do not worry about the warning message here, it occurred due to the version of dplyr. Our output is correct. If you want to ignore this warning then we need to install the latest version of tibble from CRAN. We are showing this here because you are likely to face the same problem but not necessarily.

Let’s have a look at another example:

Example

Live Demo

> x<-sample(c("India","USA","China","Canada"),20,replace=TRUE)
> y<-rpois(20,5)
> df<-data.frame(x,y)
> df

Output

x y
1 India 5
2 India 3
3 India 4
4 China 5
5 Canada 3
6 USA 5
7 Canada 7
8 China 6
9 China 4
10 USA 7
11 China 5
12 India 8
13 Canada 3
14 India 6
15 Canada 5
16 Canada 3
17 Canada 7
18 USA 12
19 China 2
20 India 5

Example

> df %>% group_by(x) %>% summarise_each(funs(sum))
# A tibble: 4 x 2

Output

x y
1 Canada 28
2 China 22
3 India 31
4 USA 24
Warning message:
`...` is not empty.

We detected these problematic arguments:

* `needs_dots`

These dots only exist to allow future extensions and should be empty.

Did you misspecify an argument?

Updated on: 19-Nov-2020

2K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements