How to create a data frame of the maximum value for each group in an R data frame using dplyr?


Sometimes subsetting of group wise maximum values is required while doing the data analysis and this subset of the data frame is used for comparative analysis. The main objective is to compare these maximums with each other or with a threshold value. In R, we can find the group wise maximum value by using group_by and slice functions in dplyr package.

Example

Consider the below data frame −

> x<-rep(c("S1","S2","S3","S4"),times=c(6,12,12,10))
> y<-1:40
> df<-data.frame(x,y)
> head(df,20)
    x  y
1  S1  1
2  S1  2
3  S1  3
4  S1  4
5  S1  5
6  S1  6
7  S2  7
8  S2  8
9  S2  9
10 S2 10
11 S2 11
12 S2 12
13 S2 13
14 S2 14
15 S2 15
16 S2 16
17 S2 17
18 S2 18
19 S3 19
20 S3 20

Loading dplyr package −

> library(dplyr)

Finding the data frame of maximum values using group_by and slice −

> Group_Maximum_df<-df%>%group_by(x)%>%slice(which.max(y))
> Group_Maximum_df
# A tibble: 4 x 2
# Groups: x [4]
x y
<fct> <int>
1 S1  6
2 S2 18
3 S3 30
4 S4 40

In the above example,, we have a factor variable. Now let’s consider that instead of a factor variable, our group variable is represented with integer values. In this case as well, the procedure to find the data frame of maximum values will be same −

> a<-rep(c(1,2,3,4,5),times=c(5,10,10,5,10))
> b<-rep(c(25,23,21,24),times=c(10,10,10,10))
> df2<-data.frame(a,b)
> head(df2,20)
   a  b
1  1 25
2  1 25
3  1 25
4  1 25
5  1 25
6  2 25
7  2 25
8  2 25
9  2 25
10 2 25
11 2 23
12 2 23
13 2 23
14 2 23
15 2 23
16 3 23
17 3 23
18 3 23
19 3 23
20 3 23
> Group_Maximum_df2<-df2%>%group_by(a)%>%slice(which.max(b))
> Group_Maximum_df2
# A tibble: 5 x 2
# Groups: a [5]
a b
<dbl> <dbl>
1   1 25
2   2 25
3   3 23
4   4 21
5   5 24

Updated on: 10-Aug-2020

348 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements