How to select rows with group wise minimum or maximum values of a variable in an R data frame using dplyr?

R ProgrammingServer Side ProgrammingProgramming

If an R data frame contains a group variable that has many group levels then finding the minimum and maximum values of a discrete or continuous variable based on the group levels becomes difficult. But this can be done with slice function in dplyr package.

Consider the below data frame that has one group variable and continuous as well as discrete variables −

> set.seed(2)
> x1<-sample(1:100,20,replace=TRUE)
> x2<-sample(1:10,20,replace=TRUE)
> x3<-rpois(20,10)
> x4<-rpois(20,5)
> x5<-rpois(20,5)
> x6<-runif(20,2,5)
> x7<-sample(200:1000,20,replace=TRUE)
> Group<-rep(c(1,2,3,4),times=5)
> df<-data.frame(x1,x2,x3,x4,x5,x6,x7,Group)
> df
x1 x2 x3 x4 x5 x6 x7 Group
1 85 8 14 7 8 2.900301 749 1
2 79 7 12 4 3 3.331022 200 2
3 70 1 17 5 6 4.190603 883 3
4 6 6 11 8 5 4.004491 649 4
5 32 9 13 5 4 2.934971 641 1
6 8 4 7 3 6 3.435734 699 2
7 17 6 9 3 4 2.874230 679 3
8 93 9 7 1 3 2.546523 642 4
9 81 8 8 6 8 3.082288 496 1
10 76 6 7 4 1 4.711400 570 2
11 41 3 8 5 9 3.182143 847 3
12 50 9 9 7 4 4.339642 707 4
13 75 7 8 7 6 2.852477 805 1
14 65 8 16 8 5 4.561162 233 2
15 3 6 8 2 7 2.516727 783 3
16 80 2 9 9 3 2.237793 788 4
17 96 7 10 5 4 2.876195 792 1
18 50 2 7 1 3 4.521114 375 2
19 55 3 6 3 4 4.835804 942 3
20 63 4 9 10 6 2.134896 228 4

Loading the dplyr package −

> library(dplyr)

Finding the group wise rows for minimum and maximum values of a particular variable −

> df %>% group_by(Group) %>% slice(which.min(x5))
# A tibble: 4 x 8
# Groups: Group [4]
 x1     x2    x3    x4     x5   x6    x7   Group
 <int> <int> <int> <int> <int> <dbl> <int> <dbl>
1 32    9     13     5     4    2.93  641    1
2 76    6      7     4     1    4.71  570    2
3 17    6      9     3     4    2.87  679    3
4 93    9      7     1     3    2.55  642    4
> df %>% group_by(Group) %>% slice(which.max(x5))
# A tibble: 4 x 8
# Groups: Group [4]
   x1     x2    x3    x4    x5   x6     x7  Group
  <int> <int> <int> <int> <int> <dbl> <int> <dbl>
1 85      8     14    7     8    2.90  749    1
2  8      4      7    3     6    3.44  699    2
3 41      3      8    5     9    3.18  847    3
4 63      4      9   10     6    2.13  228    4
> df %>% group_by(Group) %>% slice(which.max(x7))
# A tibble: 4 x 8
# Groups: Group [4]
   x1    x2    x3    x4   x5     x6    x7  Group
 <int> <int> <int> <int> <int> <dbl> <int> <dbl>
1 75     7     8     7     6    2.85  805   1
2  8     4     7     3     6    3.44  699   2
3 55     3     6     3     4    4.84  942   3
4 80     2     9     9     3    2.24  788   4
> df %>% group_by(Group) %>% slice(which.min(x1))
# A tibble: 4 x 8
# Groups: Group [4]
  x1     x2    x3    x4    x5   x6     x7   Group
 <int> <int> <int> <int> <int> <dbl> <int> <dbl>
1 32     9    13     5    4    2.93   641   1
2  8     4     7     3    6    3.44   699   2
3  3     6     8     2    7    2.52   783   3
4  6     6    11     8    5    4.00   649   4
> df %>% group_by(Group) %>% slice(which.max(x1))
# A tibble: 4 x 8
# Groups: Group [4]
  x1   x2    x3     x4    x5    x6    x7   Group
<int> <int> <int> <int> <int> <dbl> <int> <dbl>
1 96    7    10     5     4    2.88   792  1
2 79    7    12     4     3    3.33   200  2
3 70    1    17     5     6    4.19   883  3
4 93    9     7     1     3    2.55   642  4

> df %>% group_by(Group) %>% slice(which.max(x6))
# A tibble: 4 x 8
# Groups: Group [4]
  x1    x2    x3    x4    x5   x6     x7   Group
<int> <int> <int> <int> <int> <dbl> <int> <dbl>
1 81    8     8     6     8    3.08  496  1
2 76    6     7     4     1    4.71  570  2
3 55    3     6     3     4    4.84  942  3
4 50    9     9     7     4    4.34  707  4
> df %>% group_by(Group) %>% slice(which.min(x6))
# A tibble: 4 x 8
# Groups: Group [4]
x1 x2 x3 x4 x5 x6 x7 Group
<int> <int> <int> <int> <int> <dbl> <int> <dbl>
1 75 7 8 7 6 2.85 805 1
2 79 7 12 4 3 3.33 200 2
3 3 6 8 2 7 2.52 783 3
4 63 4 9 10 6 2.13 228 4
raja
Updated on 11-Aug-2020 13:15:01

Advertisements