How to remove rows whose combination of categorical column values occurs three or fewer times in an R data frame?

In data analysis, we sometimes trim a data set so that only sufficiently large groups remain. One such case is removing the rows whose combination of categorical column values occurs three or fewer times. This can be done with the filter function of the dplyr package after grouping with the group_by function: filter(n() >= 4) keeps only the groups that contain at least four rows.
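Before filtering, it can help to check how many rows each combination has. The following is a minimal sketch on a small hand-made data frame; the names df, g1, g2 and val are hypothetical and chosen only for illustration. It assumes the dplyr package is installed.

```r
library(dplyr)

# Small hand-made data frame (hypothetical values, used only for illustration)
df <- data.frame(
  g1  = c("A", "A", "A", "A", "B", "B", "B", "C"),
  g2  = c("x", "x", "x", "x", "y", "y", "y", "x"),
  val = 1:8
)

# Number of rows for every (g1, g2) combination, stored in column n
counts <- df %>% count(g1, g2)
counts

# Keep only the rows whose combination occurs at least four times
kept <- df %>% group_by(g1, g2) %>% filter(n() >= 4) %>% ungroup()
kept
```

Here only the combination A/x has four rows, so the rows for B/y (three rows) and C/x (one row) are dropped. The final ungroup() is optional; it simply returns a result that is no longer grouped.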

Example

y1 <- sample(c("S1","S2","S3","S4","S5","S6"), 20, replace=TRUE)
y2 <- sample(c("Winter","Summer"), 20, replace=TRUE)
y3 <- rnorm(20, 3)
df2 <- data.frame(y1, y2, y3)
df2

Output

y1 y2 y3
1 S1 Winter 2.683082
2 S4 Summer 1.141916
3 S6 Winter 3.371681
4 S2 Winter 3.191187
5 S3 Summer 2.195504
6 S5 Summer 2.631736
7 S3 Winter 3.303605
8 S6 Summer 3.074344
9 S5 Summer 2.663724
10 S5 Winter 2.281991
11 S6 Summer 4.174418
12 S4 Winter 6.081246
13 S4 Summer 3.202913
14 S2 Winter 5.557243
15 S2 Winter 3.747462
16 S2 Winter 2.621571
17 S2 Summer 3.909743
18 S5 Winter 2.325663
19 S5 Summer 3.749852
20 S5 Winter 2.331191

Example

library(dplyr)
df2 %>% group_by(y1, y2) %>% filter(n() >= 4)

Output

# A tibble: 4 x 3
# Groups: y1, y2 [1]
y1 y2 y3
<chr> <chr> <dbl>
1 S2 Winter 3.19
2 S2 Winter 5.56
3 S2 Winter 3.75
4 S2 Winter 2.62
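If dplyr is not available, a similar result can be sketched in base R with ave(), which returns one value per row computed within each group. This is an alternative approach, not the method used in the example above, and the data frame and its column names (df, g1, g2, val) are hypothetical.

```r
# Small hand-made data frame (hypothetical values, used only for illustration)
df <- data.frame(
  g1  = c("A", "A", "A", "A", "B", "B", "B", "C"),
  g2  = c("x", "x", "x", "x", "y", "y", "y", "x"),
  val = 1:8
)

# For every row, the size of the (g1, g2) group it belongs to
group_size <- ave(seq_len(nrow(df)), df$g1, df$g2, FUN = length)

# Keep only the rows whose combination occurs at least four times
kept <- df[group_size >= 4, ]
kept
```

Unlike the dplyr version, this returns a plain data frame with the original row names rather than a grouped tibble.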
Updated on: 2026-03-11T22:50:55+05:30
