Subset groups that occur greater than or equal to n times in an R data frame.


To subset groups that occur at least n times (greater than or equal to n) in an R data frame, we can use the filter function of the dplyr package together with group_by and n().

For example, if we have a data frame called df that contains a grouping column, say Group, then we can subset groups that occur at least 4 times by using the following command −

df%>%group_by(Group)%>%filter(n()>=4)
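
As a minimal sketch of this pattern, assume a small hand-built data frame. The column Value and the group labels are illustrative only and are not taken from a real data set.

```r
library(dplyr)

# Hypothetical data: group "a" occurs 4 times, "b" 2 times, "c" 5 times
df <- data.frame(
  Group = c("a", "a", "b", "a", "c", "c", "b", "c", "a", "c", "c"),
  Value = 1:11
)

# n() returns the size of the current group, so the condition is
# evaluated per group and all rows of a group are kept or dropped together
kept <- df %>%
  group_by(Group) %>%
  filter(n() >= 4)

unique(kept$Group)  # "a" "c"
```

Group "b" is dropped because it has only 2 rows, while "a" (exactly 4 rows) is kept because the comparison is >= and not >.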

Example 1

The following snippet creates a sample data frame −

Grp<-sample(LETTERS[1:3],20,replace=TRUE)
Response<-rpois(20,10)
df1<-data.frame(Grp,Response)
df1

The following data frame is created −

  Grp Response
 1 B  7
 2 A 12
 3 A  9
 4 C 11
 5 B  9
 6 B  7
 7 A  5
 8 C  5 
 9 A  6
10 A 12
11 A  4
12 A 11
13 C 13
14 A 17
15 A 12
16 B  9
17 C  4
18 B 11
19 A  7
20 B 10
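
Before filtering, it can help to check how often each group occurs. The sketch below uses base R's table() on the Grp values exactly as printed above (typed in by hand, since sample() would draw new values on each run).

```r
# Grp values copied from the printed data frame above
Grp <- c("B", "A", "A", "C", "B", "B", "A", "C", "A", "A",
         "A", "A", "C", "A", "A", "B", "C", "B", "A", "B")

# Frequency of each group
counts <- table(Grp)
counts

# Groups that would survive a threshold of 6
names(counts[counts >= 6])  # "A" "B"
```

Here A occurs 10 times, B 6 times and C 4 times, so a threshold of 6 keeps A and B.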

To load the dplyr package and subset df1 to the groups of column Grp that occur at least 6 times, add the following code to the above snippet −

Grp<-sample(LETTERS[1:3],20,replace=TRUE)
Response<-rpois(20,10)
df1<-data.frame(Grp,Response)
library(dplyr)
df1%>%group_by(Grp)%>%filter(n()>=6)
# A tibble: 16 x 2
# Groups: Grp [2]

Output

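Because sample() and rpois() draw random values, the exact rows differ between runs unless a seed is set with set.seed(). If you execute all the above snippets as a single program with the data shown above, it generates the following output −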

 Grp  Response
 <chr> <int>
 1 B  7
 2 A 12
 3 A  9
 4 B  9
 5 B  7
 6 A  5
 7 A  6
 8 A 12
 9 A  4
10 A 11
11 A 17
12 A 12
13 B  9
14 B 11
15 A  7
16 B 10
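
The result above is still a grouped tibble (note the "# Groups: Grp [2]" line). A possible follow-up, sketched here with the data typed in from the printed data frame so that it is reproducible, is to call ungroup() once the filtering is done.

```r
library(dplyr)

# Values copied from the printed df1 above
df1 <- data.frame(
  Grp = c("B", "A", "A", "C", "B", "B", "A", "C", "A", "A",
          "A", "A", "C", "A", "A", "B", "C", "B", "A", "B"),
  Response = c(7, 12, 9, 11, 9, 7, 5, 5, 6, 12,
               4, 11, 13, 17, 12, 9, 4, 11, 7, 10)
)

# Keep groups with at least 6 rows, then drop the grouping
res <- df1 %>%
  group_by(Grp) %>%
  filter(n() >= 6) %>%
  ungroup()

nrow(res)            # 16
is_grouped_df(res)   # FALSE
```

Dropping the grouping avoids later verbs such as summarise() or mutate() silently operating per group.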

Example 2

The following snippet creates a sample data frame −

Class<-sample(c("First","Second","Third"),20,replace=TRUE)
Price<-sample(20:50,20)
df2<-data.frame(Class,Price)
df2

The following data frame is created −

  Class  Price
 1 First  45
 2 Third  41
 3 First  42
 4 Second 30
 5 First  31
 6 Second 28
 7 Third  24
 8 Third  39
 9 Third  44
10 Second 38
11 Third  37
12 Second 49
13 Third  23
14 Third  33
15 First  20
16 Second 36
17 Second 27
18 First  21
19 First  47
20 Third  34

To subset df2 to the groups of column Class that occur at least 8 times, add the following code to the above snippet −

Class<-sample(c("First","Second","Third"),20,replace=TRUE)
Price<-sample(20:50,20)
df2<-data.frame(Class,Price)
df2%>%group_by(Class)%>%filter(n()>=8)
# A tibble: 8 x 2
# Groups: Class [1]

Output

If you execute all the above snippets as a single program with the data shown above, it generates the following output −

  Class Price
  <chr> <int>
1 Third 41
2 Third 24
3 Third 39
4 Third 44
5 Third 37
6 Third 23
7 Third 33
8 Third 34
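
An alternative to group_by() plus filter(n() >= 8), shown here only as a sketch and not as the approach used in this article, is dplyr's add_count(), which stores each group's size in a column n that can then be filtered. The data is typed in from the printed df2 above.

```r
library(dplyr)

# Values copied from the printed df2 above
df2 <- data.frame(
  Class = c("First", "Third", "First", "Second", "First", "Second", "Third",
            "Third", "Third", "Second", "Third", "Second", "Third", "Third",
            "First", "Second", "Second", "First", "First", "Third"),
  Price = c(45, 41, 42, 30, 31, 28, 24, 39, 44, 38,
            37, 49, 23, 33, 20, 36, 27, 21, 47, 34)
)

# add_count() appends a column n holding the number of rows per Class
res2 <- df2 %>%
  add_count(Class) %>%
  filter(n >= 8)

nrow(res2)  # 8
```

This variant leaves the group size visible in the result, which can be handy when checking why a group was kept.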

Updated on: 08-Nov-2021
