How to subset an R data frame with a condition based on only one value from a categorical column?


To subset an R data frame with a condition based on only one value from a categorical column, we can follow the steps below:

  • First, create a data frame.
  • Then, subset the data frame with the condition using the filter function of the dplyr package.

Create the data frame

Let's create a data frame as shown below:

Class <- sample(c("First", "Second", "Third", "Fourth"), 25, replace = TRUE)
x <- sample(1:10, 25, replace = TRUE)
y <- sample(1:10, 25, replace = TRUE)
z <- sample(1:10, 25, replace = TRUE)
df <- data.frame(Class, x, y, z)
df

On execution, the above script generates the output below (this output will vary on your system because sample() draws values at random):

    Class  x  y  z
1  Fourth 10  6  7
2   First 10  1  5
3   Third  3  5  9
4   First  2  8  5
5   Third  4  9  9
6   First  2  5  3
7  Second  2  7  7
8   Third  6  4  4
9   First  2  9  3
10  First 10  7  4
11 Fourth  1  9  3
12  First  8  7  8
13  First  7  5  3
14  First 10  4  2
15  First  8  9  2
16  First  9  9 10
17  Third  1  1 10
18  Third  5  9  6
19  First  3  2  9
20  Third  8  5  4
21  Third  9  2  7
22 Second  5  9  3
23  Third 10  3  6
24  First 10  6  9
25  Third  1 10  4
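Since the values come from sample(), each run produces a different data frame. A minimal sketch of one way to make the example repeatable is to call set.seed() before sampling. The seed value 123 is an arbitrary choice for illustration and is not the seed used to produce the output above, so the values will still differ from those shown.

```r
# Fix the random seed so that sample() returns the same values on every run.
# The seed 123 is an arbitrary assumption, not taken from the original script.
set.seed(123)
Class <- sample(c("First", "Second", "Third", "Fourth"), 25, replace = TRUE)
x <- sample(1:10, 25, replace = TRUE)
y <- sample(1:10, 25, replace = TRUE)
z <- sample(1:10, 25, replace = TRUE)
df <- data.frame(Class, x, y, z)

# Show the first few rows; they are the same each time the script is run
head(df)
```

Running the script twice with the same seed should give the same df, which makes the filtered results further down reproducible as well.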

Subset the data frame with condition based on a categorical column

Use the filter function to keep the rows of df where x is greater than 5 and Class is First:

# df is the data frame created above; calling sample() again here would
# draw new random values and the result would not match the output shown
library(dplyr)
# group_by() is not needed for the filter itself; it only makes the result a grouped tibble
df %>% group_by(Class) %>% filter(x > 5 & Class == "First")

Output

# A tibble: 8 x 4
# Groups:   Class [1]
  Class     x     y     z
  <chr> <int> <int> <int>
1 First    10     1     5
2 First    10     7     4
3 First     8     7     8
4 First     7     5     3
5 First    10     4     2
6 First     8     9     2
7 First     9     9    10
8 First    10     6     9
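The same kind of subset can also be taken without dplyr. The sketch below uses base R logical indexing and subset() on a small hand-typed data frame; df_small is a hypothetical name, and its values are copied from the first six rows of the output shown earlier so that the result does not depend on random sampling.

```r
# Small fixed data frame (first six rows of the printed df), so results are repeatable
df_small <- data.frame(
  Class = c("Fourth", "First", "Third", "First", "Third", "First"),
  x = c(10L, 10L, 3L, 2L, 4L, 2L),
  y = c(6L, 1L, 5L, 8L, 9L, 5L),
  z = c(7L, 5L, 9L, 5L, 9L, 3L)
)

# Logical indexing: keep rows where both conditions are TRUE, keep all columns
res_base <- df_small[df_small$x > 5 & df_small$Class == "First", ]

# subset() evaluates the condition inside the data frame, so no df_small$ prefix is needed
res_subset <- subset(df_small, x > 5 & Class == "First")

res_base
res_subset
```

Both calls return an ordinary data frame and keep the original row names, whereas the dplyr pipeline above returns a grouped tibble.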

Subset the data frame with condition based on a categorical column

Use the filter function to keep the rows of df where y is greater than 5 and Class is First:

# df is the data frame created above; calling sample() again here would
# draw new random values and the result would not match the output shown
library(dplyr)
# group_by() is not needed for the filter itself; it only makes the result a grouped tibble
df %>% group_by(Class) %>% filter(y > 5 & Class == "First")

Output

# A tibble: 7 x 4
# Groups:   Class [1]
  Class     x     y     z
  <chr> <int> <int> <int>
1 First     2     8     5
2 First     2     9     3
3 First    10     7     4
4 First     8     7     8
5 First     8     9     2
6 First     9     9    10
7 First    10     6     9
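Grouping is optional when the condition already names the category. As a minimal sketch, filter() can be called directly on the data frame, with the conditions separated by commas instead of &. The data frame df_small is a hypothetical hand-typed sample (the first six rows of the output shown earlier), used only to keep the example repeatable.

```r
library(dplyr)

# Small fixed data frame (first six rows of the printed df)
df_small <- data.frame(
  Class = c("Fourth", "First", "Third", "First", "Third", "First"),
  x = c(10L, 10L, 3L, 2L, 4L, 2L),
  y = c(6L, 1L, 5L, 8L, 9L, 5L),
  z = c(7L, 5L, 9L, 5L, 9L, 3L)
)

# Conditions separated by commas are combined with a logical AND
res <- df_small %>% filter(y > 5, Class == "First")

res
class(res)  # a plain data.frame, because no group_by() was applied
```

Without group_by(), the result keeps the class of the input, so the "Groups" header does not appear in the printed output.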

Subset the data frame with condition based on a categorical column

Use the filter function to keep the rows of df where z is greater than 5 and Class is First:

# df is the data frame created above; calling sample() again here would
# draw new random values and the result would not match the output shown
library(dplyr)
# group_by() is not needed for the filter itself; it only makes the result a grouped tibble
df %>% group_by(Class) %>% filter(z > 5 & Class == "First")

Output

# A tibble: 4 x 4
# Groups:   Class [1]
  Class     x     y     z
  <chr> <int> <int> <int>
1 First     8     7     8
2 First     9     9    10
3 First     3     2     9
4 First    10     6     9
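The three examples differ only in the numeric column being tested, so the pattern can be wrapped in a small function. The sketch below is one possible way to do this, using the .data pronoun that dplyr provides for referring to a column by its name as a string. The function name filter_first, its arguments, and the data frame df_small (the first twelve rows of the output shown earlier) are hypothetical and introduced only for illustration.

```r
library(dplyr)

# Small fixed data frame (first twelve rows of the printed df)
df_small <- data.frame(
  Class = c("Fourth", "First", "Third", "First", "Third", "First",
            "Second", "Third", "First", "First", "Fourth", "First"),
  x = c(10L, 10L, 3L, 2L, 4L, 2L, 2L, 6L, 2L, 10L, 1L, 8L),
  y = c(6L, 1L, 5L, 8L, 9L, 5L, 7L, 4L, 9L, 7L, 9L, 7L),
  z = c(7L, 5L, 9L, 5L, 9L, 3L, 7L, 4L, 3L, 4L, 3L, 8L)
)

# Hypothetical helper: keep rows where the column named by `col` exceeds
# `threshold` and Class equals `class_value`
filter_first <- function(data, col, threshold = 5, class_value = "First") {
  data %>% filter(.data[[col]] > threshold, Class == class_value)
}

filter_first(df_small, "x")

# Number of matching rows for each numeric column
counts <- sapply(c("x", "y", "z"), function(col) nrow(filter_first(df_small, col)))
counts
```

Passing the column name as a string keeps the call sites short, and changing class_value selects a different category without rewriting the condition.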

Updated on: 14-Aug-2021
