How to subset rows that contain the maximum depending on another column in an R data frame?

To subset rows that contain the maximum depending on another column in an R data frame, we can follow the steps below −

  • First, create a data frame with one numerical and one categorical column.
  • Then, use the tapply function with the max function to find the maximum of the numerical column within each group defined by the categorical column.
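
Note that tapply returns the group maxima themselves, not the rows that hold them. To keep the full rows, one option is base R's ave function, which returns each row's group maximum as a vector aligned with the data frame. The following is a minimal sketch; the data frame scores and its column names are made up for illustration and do not come from the examples in this article.

```r
# Hypothetical data frame, used only for illustration
scores <- data.frame(
  value = c(3, 9, 4, 7, 1, 8),
  group = c("a", "a", "b", "b", "c", "c")
)

# Maximum of value within each group (a named vector)
group_max <- tapply(scores$value, scores$group, max)

# ave() repeats each group's maximum on every row of that group,
# so the comparison keeps only the rows that attain the maximum
max_rows <- scores[scores$value == ave(scores$value, scores$group, FUN = max), ]
max_rows
```

If several rows in a group share the maximum, this comparison keeps all of them.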

Example 1

Create the data frame

Let's create a data frame as shown below −

x<-rnorm(20)
factor1<-sample(LETTERS[1:4],20,replace=TRUE)
df1<-data.frame(x,factor1)
df1

Executing the above script generates the output below (the output will vary on your system because the data are generated randomly) −

             x factor1
1  -1.21231516       A
2  -0.01576519       B
3   0.59032593       D
4  -0.41583339       C
5  -0.38508102       A
6  -0.61177209       C
7  -0.52961795       C
8   0.30561837       A
9  -0.58067776       A
10  0.62246173       C
11 -0.58479709       C
12  0.09817433       B
13  1.11240042       C
14  0.29007306       B
15 -0.66345792       B
16 -1.80789902       A
17  0.33419804       C
18 -0.15665767       A
19  1.56775923       C
20  1.49345799       B

Find the maximum based on another column

Use the tapply function to find the maximum of column x for each level of the factor1 column in df1 −

# Group maxima of x for each level of factor1, using the df1 created above
tapply(df1$x,df1$factor1,max)

Output

        A         B         C         D
0.3056184 1.4934580 1.5677592 0.5903259
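
The output above lists only the maxima. As one possible way to recover the rows of df1 that hold them, the sketch below looks up the row index of the largest x within each group. It rebuilds df1 from the values printed earlier, since rnorm and sample would produce different data on each run; the name idx is introduced here for illustration only.

```r
# Rebuild df1 from the values printed in the output above
x <- c(-1.21231516, -0.01576519, 0.59032593, -0.41583339, -0.38508102,
       -0.61177209, -0.52961795, 0.30561837, -0.58067776, 0.62246173,
       -0.58479709, 0.09817433, 1.11240042, 0.29007306, -0.66345792,
       -1.80789902, 0.33419804, -0.15665767, 1.56775923, 1.49345799)
factor1 <- c("A", "B", "D", "C", "A", "C", "C", "A", "A", "C",
             "C", "B", "C", "B", "B", "A", "C", "A", "C", "B")
df1 <- data.frame(x, factor1)

# Split the row numbers by group, then pick, within each group,
# the row number of the first largest x
idx <- sapply(split(seq_len(nrow(df1)), df1$factor1), function(i) {
  i[which.max(df1$x[i])]
})

# Subset the original data frame with those row numbers
df1[idx, ]
```

For this data the selected rows are 8, 20, 19 and 3, whose x values match the tapply output for A, B, C and D. Because which.max returns the first maximum, this approach keeps a single row per group even when there are ties.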

Example 2

Create the data frame

Let's create a data frame as shown below −

y<-sample(1:50,20)
factor2<-sample(c("Low","Medium","High"),20,replace=TRUE)
df2<-data.frame(y,factor2)
df2

Executing the above script generates the output below (the output will vary on your system because the data are generated randomly) −

    y factor2
1  45     Low
2   2  Medium
3   5    High
4  33     Low
5  28    High
6  37  Medium
7   7    High
8  21    High
9  48     Low
10 18    High
11 15    High
12 38    High
13 20  Medium
14  4     Low
15 22  Medium
16 34     Low
17 32     Low
18 29     Low
19 24    High
20 17  Medium

Find the maximum based on another column

Use the tapply function to find the maximum of column y for each level of the factor2 column in df2 −

tapply(df2$y,df2$factor2,max)

Output

  High    Low Medium
    38     48     37
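
To return the matching rows of df2 rather than only the maxima, another option is to sort by group and by descending y, then keep the first row of each group. This is a sketch under the assumption that one row per group is wanted; df2 is rebuilt from the values printed above so that the result is reproducible, and the names sorted and top are introduced for illustration.

```r
# Rebuild df2 from the values printed in the output above
y <- c(45, 2, 5, 33, 28, 37, 7, 21, 48, 18,
       15, 38, 20, 4, 22, 34, 32, 29, 24, 17)
factor2 <- c("Low", "Medium", "High", "Low", "High", "Medium", "High",
             "High", "Low", "High", "High", "High", "Medium", "Low",
             "Medium", "Low", "Low", "Low", "High", "Medium")
df2 <- data.frame(y, factor2)

# Sort by group, and within each group by y from largest to smallest
sorted <- df2[order(df2$factor2, -df2$y), ]

# The first row of each group now holds that group's maximum
top <- sorted[!duplicated(sorted$factor2), ]
top
```

For this data the result contains rows 12, 9 and 6, with y equal to 38, 48 and 37 for High, Low and Medium.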
raja
Published on 04-Aug-2021 12:11:46