How to remove only the first duplicate row by group in an R data frame?
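To remove only the first duplicate row by group, that is, the first occurrence of each group value, we can use the filter function of the dplyr package together with the duplicated function. Within a group, duplicated returns FALSE for the first row and TRUE for every later row, so filtering on it drops only the first row. The extra condition n()==1 keeps groups that contain a single row, which would otherwise disappear entirely.
For example, if we have a data frame called df that contains a grouping column, say Grp, then the first duplicate row of each group can be removed with the following command −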
df %>% group_by(Grp) %>% filter(duplicated(Grp) | n() == 1)
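To see why this condition drops exactly one row per group, it may help to run it on a small fixed data frame. The sketch below is illustrative only: the data frame toy and its columns Grp and Val are made up for this purpose and do not appear in the examples that follow.

```r
library(dplyr)

# Hypothetical toy data: group "A" has three rows, "B" has two, "C" has one
toy <- data.frame(Grp = c("A", "B", "A", "A", "C", "B"),
                  Val = 1:6)

# duplicated() is FALSE at the first occurrence of a value and TRUE afterwards
duplicated(toy$Grp)
# FALSE FALSE  TRUE  TRUE FALSE  TRUE

# Keep the later occurrences, plus any group that has only one row
result <- toy %>%
  group_by(Grp) %>%
  filter(duplicated(Grp) | n() == 1) %>%
  ungroup()

result$Val
# 3 4 5 6
```

The first rows of groups A and B (Val 1 and 2) are removed, while the single row of group C (Val 5) survives because of the n() == 1 condition.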
Example 1
The following snippet creates a sample data frame. Because sample and rpois draw random values, your data will differ unless a seed is set −
Group <- sample(LETTERS[1:4], 20, replace = TRUE)
Response <- rpois(20, 5)
df1 <- data.frame(Group, Response)
df1
Output
The following data frame is created −
   Group Response
1      D        9
2      A        3
3      B        4
4      A        5
5      B        8
6      B        8
7      D        2
8      D        5
9      B        4
10     C        4
11     D        7
12     D        5
13     C        5
14     A        2
15     B        5
16     A        9
17     B        6
18     C        8
19     D        3
20     A        7
To load the dplyr package and remove only the first duplicate row from each group in df1, add the following code to the above snippet −
library(dplyr)
df1 %>% group_by(Group) %>% filter(duplicated(Group) | n() == 1)
Output
If you execute all of the above code as a single program, it generates the following output −
# A tibble: 16 x 2
# Groups:   Group [4]
   Group Response
   <chr>    <int>
 1 A            5
 2 B            8
 3 B            8
 4 D            2
 5 D            5
 6 B            4
 7 D            7
 8 D            5
 9 C            5
10 A            2
11 B            5
12 A            9
13 B            6
14 C            8
15 D            3
16 A            7
Example 2
The following snippet creates a sample data frame −
Category <- sample(c("First", "Second", "Third"), 20, replace = TRUE)
Rank <- sample(1:10, 20, replace = TRUE)
df2 <- data.frame(Category, Rank)
df2
Output
The following data frame is created −
   Category Rank
1    Second   10
2    Second    5
3    Second    4
4     Third    3
5    Second    5
6    Second    9
7     First    6
8    Second   10
9     First    9
10    Third    1
11    First    8
12   Second    3
13   Second    5
14    Third    1
15    Third    2
16   Second    4
17   Second    6
18    Third    6
19   Second    2
20   Second    9
To remove only the first duplicate row from each group in df2, add the following code to the above snippet −
df2 %>% group_by(Category) %>% filter(duplicated(Category) | n() == 1)
Output
If you execute all of the above code as a single program, it generates the following output −
# A tibble: 17 x 2
# Groups:   Category [3]
   Category  Rank
   <chr>    <int>
 1 Second       5
 2 Second       4
 3 Second       5
 4 Second       9
 5 Second      10
 6 First        9
 7 Third        1
 8 First        8
 9 Second       3
10 Second       5
11 Third        1
12 Third        2
13 Second       4
14 Second       6
15 Third        6
16 Second       2
17 Second       9
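The same filtering can also be approximated without dplyr. The following is a minimal base R sketch, not the method used in the examples above; the data frame toy and the helper variables group_size and keep are hypothetical names chosen for illustration. It relies on the fact that the first row of each group is also the first occurrence of that value in the whole column, so duplicated can be applied without grouping.

```r
# Hypothetical toy data: group "A" has three rows, "B" has two, "C" has one
toy <- data.frame(Grp = c("A", "B", "A", "A", "C", "B"),
                  Val = 1:6)

# Number of rows in each row's group, aligned with the original row order
group_size <- ave(seq_along(toy$Grp), toy$Grp, FUN = length)

# Keep later occurrences of a group value, plus rows from single-row groups
keep <- duplicated(toy$Grp) | group_size == 1

base_result <- toy[keep, ]
base_result$Val
# 3 4 5 6
```

Unlike the dplyr version, this returns a plain data frame that keeps the original row names rather than a grouped tibble.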