How to find the unique rows in an R data frame?

A unique row in an R data frame is a row whose combination of values does not occur in any other row of that data frame. For example, if a data frame called df has 3 columns and 5 rows, a row is unique when no other row holds the same three values. Finding such rows is useful when a data set contains many duplicate rows. To do this, we can use the group_by_all function of the dplyr package together with count, which lists every distinct row along with the number of times it occurs (n); the rows with n equal to 1 are the unique ones. The examples below show this.
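
Before the dplyr examples, the difference between "each distinct row kept once" and "rows that occur exactly once" can be sketched in base R. This is only an illustration under stated assumptions: the data frame df and its columns a and b are hypothetical and do not come from the examples below.

```r
# Hypothetical data frame: rows 1 and 3 hold the same values
df <- data.frame(a = c(1, 2, 1, 3), b = c("x", "y", "x", "z"))

# Distinct rows: every combination of values is kept once
distinct_rows <- unique(df)

# Rows that occur exactly once: flag a row if it has a duplicate
# either earlier or later in the data frame, then drop the flagged rows
has_duplicate <- duplicated(df) | duplicated(df, fromLast = TRUE)
once_rows <- df[!has_duplicate, ]
```

Here distinct_rows keeps three rows (the repeated combination appears once), while once_rows keeps only the two rows that have no duplicate at all.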

Example1

Consider the following data frame −

> x1<-rpois(20,1)
> x2<-rpois(20,1)
> x3<-rpois(20,1)
> df1<-data.frame(x1,x2,x3)
> df1

Output

Loading the dplyr package and counting how often each distinct row occurs in df1 (rows with n equal to 1 appear exactly once) −

> library(dplyr)
> df1%>%group_by_all%>%count

Output

# A tibble: 14 x 4
# Groups: x1, x2, x3 [14]
    x1    x2    x3    n
  <int> <int> <int> <int>
1   0     0     0     2
2   0     0     1     1
3   0     0     2     1
4   0     1     0     2
5   0     1     1     2
6   1     0     1     3
7   1     0     2     1
8   1     1     1     2
9   1     2     1     1
10  2     0     2     1
11  2     1     2     1
12  2     2     0     1
13  2     2     2     1
14  4     2     0     1
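
To keep only the rows that occur exactly once, the counted result can be filtered on n. The following is a minimal sketch, assuming dplyr is installed; the seed value 123 is an arbitrary choice added here only so that the random data is reproducible, and the name once_only is hypothetical.

```r
library(dplyr)

set.seed(123)  # arbitrary seed (assumption), used only for reproducibility
df1 <- data.frame(x1 = rpois(20, 1), x2 = rpois(20, 1), x3 = rpois(20, 1))

# Count every distinct row, then keep the combinations that occur once
once_only <- df1 %>%
  group_by_all() %>%
  count() %>%
  filter(n == 1) %>%
  ungroup()
```

Because rpois draws random values, the rows in once_only depend on the seed and will not match the output shown above.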

Example2

> y1<-sample(c("Yes","No"),20,replace=TRUE)
> y2<-sample(c("Yes","No"),20,replace=TRUE)
> df2<-data.frame(y1,y2)
> df2

Output

  y1   y2
1 No   Yes
2 No   Yes
3 No   No
4 Yes  No
5 No   No
6 Yes  Yes
7 No   No
8 Yes  Yes
9 No   No
10 No  No
11 No  Yes
12 No  Yes
13 Yes No
14 No  Yes
15 No  No
16 Yes No
17 Yes No
18 No  Yes
19 No  Yes
20 Yes No

Counting how often each distinct row occurs in df2 −

> df2%>%group_by_all%>%count

Output

# A tibble: 4 x 3
# Groups: y1, y2 [4]
   y1      y2     n
  <chr>  <chr> <int>
1  No     No     6
2  No     Yes    7
3  Yes    No     5
4  Yes    Yes    2
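
The same counts can be reproduced without random sampling by typing in the 20 rows of df2 printed above and passing the column names directly to count. This is a sketch, assuming dplyr is installed; the name counts is hypothetical.

```r
library(dplyr)

# The 20 rows of df2 as printed above, entered by hand
y1 <- c("No", "No", "No", "Yes", "No", "Yes", "No", "Yes", "No", "No",
        "No", "No", "Yes", "No", "No", "Yes", "Yes", "No", "No", "Yes")
y2 <- c("Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "No",
        "Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes", "Yes", "No")
df2 <- data.frame(y1, y2)

# count() with explicit columns groups, counts and ungroups in one step
counts <- count(df2, y1, y2)
```

The n column of counts holds 6, 7, 5 and 2, matching the table above. No combination has n equal to 1, so df2 contains no row that occurs exactly once.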

Example3

> z1<-sample(1:4,20,replace=TRUE)
> z2<-sample(1:4,20,replace=TRUE)
> df3<-data.frame(z1,z2)
> df3

Output

Counting how often each distinct row occurs in df3 −

> df3%>%group_by_all%>%count

Output

# A tibble: 10 x 3
# Groups: z1, z2 [10]
    z1    z2    n
  <int> <int> <int>
1   1     3     4
2   1     4     3
3   2     1     2
4   2     3     2
5   3     2     1
6   3     3     1
7   4     1     2
8   4     2     2
9   4     3     1
10  4     4     2
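
In dplyr 1.0.0 and later, group_by_all is marked as superseded in favour of across. The sketch below shows the equivalent call, along with distinct for the case where only the deduplicated rows are needed and not their counts. It assumes dplyr 1.0.0 or later is installed; the data frame df and the names counts and distinct_rows are hypothetical.

```r
library(dplyr)

# Hypothetical data frame: the first two rows are identical
df <- data.frame(z1 = c(1, 1, 2, 4, 4), z2 = c(3, 3, 1, 4, 2))

# Group by every column with across(everything()), then count each group
counts <- df %>%
  group_by(across(everything())) %>%
  count() %>%
  ungroup()

# Keep one copy of each distinct row, without counts
distinct_rows <- distinct(df)
```

Both results have four rows here; counts additionally reports that the combination z1 = 1, z2 = 3 occurs twice.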

Nizamuddin Siddiqui

Updated on: 04-Mar-2021
