How to subset non-duplicate values from an R data frame column?


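By default, the duplicated function flags only the second and later occurrences of a value, so the first occurrence is treated as unique. That first occurrence is just as much a duplicate as the copies that follow it, so to keep only the values that appear exactly once we need to exclude it as well.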

This can be done by combining two calls to the duplicated function, one scanning from the start and one scanning from the end with fromLast=TRUE, and negating the combined result, as shown in the examples below.
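Before turning to random data, it may help to see the idea on a small fixed vector. The following is a minimal sketch; the vector v and its values are made up for illustration and are not part of the examples below.

```r
# Small fixed vector so the result is reproducible (illustrative values)
v <- c(3, 1, 3, 2, 1, 4)

# TRUE for every repeat after the first occurrence of a value
dup_forward <- duplicated(v)

# TRUE for every occurrence that is repeated later in the vector
dup_backward <- duplicated(v, fromLast = TRUE)

# A value occurs exactly once only if neither scan flags it
only_once <- v[!(dup_forward | dup_backward)]
print(only_once)
```

Here the forward scan flags the second 3 and the second 1, the backward scan flags the first 3 and the first 1, and only 2 and 4 survive the negation.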

Example 1

The following snippet creates a sample data frame:

x <- rpois(20, 10)
df1 <- data.frame(x)
df1

The following data frame is created (the values are random, so they will differ from run to run unless a seed is set with set.seed):

    x
1  16
2   5
3  17
4   7
5   6
6   7
7  14
8  10
9   7
10 13
11 11
12 15
13  4
14 10
15 16
16 11
17 10
18 11
19  9
20 11

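To keep only the values of x that occur exactly once, excluding the first occurrence of each duplicated value as well, add the following code to the above snippet. Do not recreate x or df1 first, because that would draw a new random sample:

# Keep the values that neither the forward nor the backward scan flags
df1$x[!(duplicated(df1$x) | duplicated(df1$x, fromLast = TRUE))]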

Output

Running the above snippets together as a single program produces the following output:

[1] 5 17 6 14 13 15 4 9
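If the surrounding rows are needed rather than a bare vector, the same logical mask can be used as a row index. The sketch below is an assumption-laden illustration: it rebuilds df1 from the values printed above instead of drawing a random sample, and the names keep and unique_rows are introduced here for illustration only.

```r
# Values copied from the data frame printed in Example 1
df1 <- data.frame(x = c(16, 5, 17, 7, 6, 7, 14, 10, 7, 13,
                        11, 15, 4, 10, 16, 11, 10, 11, 9, 11))

# Logical mask: TRUE where the value occurs exactly once in the column
keep <- !(duplicated(df1$x) | duplicated(df1$x, fromLast = TRUE))

# drop = FALSE keeps the result a data frame even with a single column
unique_rows <- df1[keep, , drop = FALSE]
print(unique_rows)
```

The original row names are preserved, which makes it easy to see where each unique value sat in the full data frame.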

Example 2

The following snippet creates a sample data frame:

y <- sample(1:10, 20, replace = TRUE)
df2 <- data.frame(y)
df2

The following data frame is created:

    y
1   8
2  10
3   1
4   5
5   5
6   2
7   1
8   2
9   6
10  7
11 10
12  5
13  7
14  4
15  2
16  1
17  6
18  5
19 10
20  7

To keep only the values of y that occur exactly once, excluding the first occurrence of each duplicated value as well, add the following code to the above snippet. Do not recreate y or df2 first, because that would draw a new random sample:

# Keep the values that neither the forward nor the backward scan flags
df2$y[!(duplicated(df2$y) | duplicated(df2$y, fromLast = TRUE))]

Output

Running the above snippets together as a single program produces the following output:

[1] 8 4
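One way to cross-check this result is to count how often each value occurs and keep the values whose count is 1. The sketch below uses the base R function ave for the counting; it is an alternative to the duplicated approach, not the method used in this example. It also rebuilds df2 from the values printed above, and the names n_occurrences and result are illustrative.

```r
# Values copied from the data frame printed in Example 2
df2 <- data.frame(y = c(8, 10, 1, 5, 5, 2, 1, 2, 6, 7,
                        10, 5, 7, 4, 2, 1, 6, 5, 10, 7))

# For every element, the number of times its value appears in the column
n_occurrences <- ave(df2$y, df2$y, FUN = length)

# Keep only the elements whose value appears exactly once
result <- df2$y[n_occurrences == 1]
print(result)
```

Both approaches keep the surviving values in their original order, so the two results can be compared directly with identical.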

Example 3

The following snippet creates a sample data frame:

z <- sample(501:510, 20, replace = TRUE)
df3 <- data.frame(z)
df3

The following data frame is created:

     z
1  509
2  507
3  504
4  508
5  502
6  510
7  508
8  506
9  503
10 508
11 507
12 508
13 502
14 508
15 506
16 510
17 505
18 510
19 510
20 505

To keep only the values of z that occur exactly once, excluding the first occurrence of each duplicated value as well, add the following code to the above snippet. Do not recreate z or df3 first, because that would draw a new random sample:

# Keep the values that neither the forward nor the backward scan flags
df3$z[!(duplicated(df3$z) | duplicated(df3$z, fromLast = TRUE))]

Output

Running the above snippets together as a single program produces the following output:

[1] 509 504 503
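Since all three examples use the same expression, it could be wrapped in a small function. The sketch below assumes a helper named values_occurring_once, which is hypothetical and not part of base R, and rebuilds df3 from the values printed above.

```r
# Hypothetical helper (not part of base R): returns the elements of v
# that occur exactly once, in their original order
values_occurring_once <- function(v) {
  v[!(duplicated(v) | duplicated(v, fromLast = TRUE))]
}

# Values copied from the data frame printed in Example 3
df3 <- data.frame(z = c(509, 507, 504, 508, 502, 510, 508, 506, 503, 508,
                        507, 508, 502, 508, 506, 510, 505, 510, 510, 505))

result <- values_occurring_once(df3$z)
print(result)
```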

Updated on: 01-Nov-2021
