# How to subset rows that contain the maximum depending on another column in an R data frame?

To find the maximum of one column within each group defined by another column of an R data frame, we can follow the below steps −

• First of all, create a data frame with one numerical and one categorical column.
• Then, use the tapply function with the max function to find the maximum of the numerical column for each level of the categorical column.

## Example 1

Create the data frame

Let's create a data frame as shown below −

x<-rnorm(20)
factor1<-sample(LETTERS[1:4],20,replace=TRUE)
df1<-data.frame(x,factor1)
df1

On executing, the above script generates the below output (this output will vary on your system due to randomization) −

             x factor1
1  -1.21231516       A
2  -0.01576519       B
3   0.59032593       D
4  -0.41583339       C
5  -0.38508102       A
6  -0.61177209       C
7  -0.52961795       C
8   0.30561837       A
9  -0.58067776       A
10  0.62246173       C
11 -0.58479709       C
12  0.09817433       B
13  1.11240042       C
14  0.29007306       B
15 -0.66345792       B
16 -1.80789902       A
17  0.33419804       C
18 -0.15665767       A
19  1.56775923       C
20  1.49345799       B

## Find the rows that contain the maximum based on another column

Using the tapply function to find the maximum of column x for each level of the factor1 column in df1 −

tapply(df1$x,df1$factor1,max)

### Output

        A         B         C         D
0.3056184 1.4934580 1.5677592 0.5903259
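Note that tapply returns only the maximum values, not the rows that hold them. A minimal sketch of one way to get the full rows, assuming base R's ave function is acceptable here, is shown below. The data frame is rebuilt from the values printed above so the result is reproducible; the names group_max and max_rows are illustrative and not part of the original example.

```r
# Rebuild df1 from the values printed above (instead of calling rnorm/sample again)
df1 <- data.frame(
  x = c(-1.21231516, -0.01576519, 0.59032593, -0.41583339, -0.38508102,
        -0.61177209, -0.52961795, 0.30561837, -0.58067776, 0.62246173,
        -0.58479709, 0.09817433, 1.11240042, 0.29007306, -0.66345792,
        -1.80789902, 0.33419804, -0.15665767, 1.56775923, 1.49345799),
  factor1 = c("A", "B", "D", "C", "A", "C", "C", "A", "A", "C",
              "C", "B", "C", "B", "B", "A", "C", "A", "C", "B")
)

# ave() returns, for every row, the maximum of x within that row's group
group_max <- ave(df1$x, df1$factor1, FUN = max)

# Keep only the rows whose x equals the maximum of their group
max_rows <- df1[df1$x == group_max, ]
max_rows
```

With this data the subset keeps rows 3, 8, 19 and 20, one per level of factor1. If a group had tied maxima, this approach would keep every tied row.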

## Example 2

Create the data frame

Let's create a data frame as shown below −

y<-sample(1:50,20)
factor2<-sample(c("Low","Medium","High"),20,replace=TRUE)
df2<-data.frame(y,factor2)
df2

On executing, the above script generates the below output (this output will vary on your system due to randomization) −

    y factor2
1  45     Low
2   2  Medium
3   5    High
4  33     Low
5  28    High
6  37  Medium
7   7    High
8  21    High
9  48     Low
10 18    High
11 15    High
12 38    High
13 20  Medium
14  4     Low
15 22  Medium
16 34     Low
17 32     Low
18 29     Low
19 24    High
20 17  Medium

## Find the rows that contain the maximum based on another column

Using the tapply function to find the maximum of column y for each level of the factor2 column in df2 −

tapply(df2$y,df2$factor2,max)

### Output

  High    Low Medium
    38     48     37
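To return the complete row for each maximum rather than just the value, one possible approach is to split the data frame by group and pick the row at which.max. The sketch below is an illustration under that assumption, not the only way to do it. The data frame is rebuilt from the values printed above, and the name top_rows is hypothetical.

```r
# Rebuild df2 from the values printed above (instead of calling sample again)
df2 <- data.frame(
  y = c(45, 2, 5, 33, 28, 37, 7, 21, 48, 18,
        15, 38, 20, 4, 22, 34, 32, 29, 24, 17),
  factor2 = c("Low", "Medium", "High", "Low", "High", "Medium", "High",
              "High", "Low", "High", "High", "High", "Medium", "Low",
              "Medium", "Low", "Low", "Low", "High", "Medium")
)

# Split df2 into one data frame per level of factor2
groups <- split(df2, df2$factor2)

# From each group keep the single row with the largest y
# (which.max returns the first position if there are ties)
top_rows <- do.call(rbind, lapply(groups, function(g) g[which.max(g$y), ]))
top_rows
```

Unlike a comparison against the group maximum, which.max keeps exactly one row per group even when the maximum is tied.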