How to create a data frame of the maximum value for each group in an R data frame using dplyr?

R ProgrammingServer Side ProgrammingProgramming

Sometimes subsetting of group wise maximum values is required while doing the data analysis and this subset of the data frame is used for comparative analysis. The main objective is to compare these maximums with each other or with a threshold value. In R, we can find the group wise maximum value by using group_by and slice functions in dplyr package.

Example

Consider the below data frame −

> x<-rep(c("S1","S2","S3","S4"),times=c(6,12,12,10))
> y<-1:40
> df<-data.frame(x,y)
> head(df,20)
    x  y
1  S1  1
2  S1  2
3  S1  3
4  S1  4
5  S1  5
6  S1  6
7  S2  7
8  S2  8
9  S2  9
10 S2 10
11 S2 11
12 S2 12
13 S2 13
14 S2 14
15 S2 15
16 S2 16
17 S2 17
18 S2 18
19 S3 19
20 S3 20

Loading dplyr package −

> library(dplyr)

Finding the data frame of maximum values using group_by and slice −

> Group_Maximum_df<-df%>%group_by(x)%>%slice(which.max(y))
> Group_Maximum_df
# A tibble: 4 x 2
# Groups: x [4]
x y
<fct> <int>
1 S1  6
2 S2 18
3 S3 30
4 S4 40

In the above example,, we have a factor variable. Now let’s consider that instead of a factor variable, our group variable is represented with integer values. In this case as well, the procedure to find the data frame of maximum values will be same −

> a<-rep(c(1,2,3,4,5),times=c(5,10,10,5,10))
> b<-rep(c(25,23,21,24),times=c(10,10,10,10))
> df2<-data.frame(a,b)
> head(df2,20)
   a  b
1  1 25
2  1 25
3  1 25
4  1 25
5  1 25
6  2 25
7  2 25
8  2 25
9  2 25
10 2 25
11 2 23
12 2 23
13 2 23
14 2 23
15 2 23
16 3 23
17 3 23
18 3 23
19 3 23
20 3 23
> Group_Maximum_df2<-df2%>%group_by(a)%>%slice(which.max(b))
> Group_Maximum_df2
# A tibble: 5 x 2
# Groups: a [5]
a b
<dbl> <dbl>
1   1 25
2   2 25
3   3 23
4   4 21
5   5 24
raja
Published on 10-Aug-2020 12:17:56
Advertisements