How to subset rows of an R data frame based on duplicate values in a particular column?

Duplicate values are a common problem in data analysis. To find the rows of an R data frame that repeat a value in a particular column, use the duplicated function inside the subset function. Because duplicated flags a value only from its second occurrence onward, the output contains the repeated rows but not the first row in which each value appears.
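
To make this behaviour concrete, here is a minimal sketch on a small vector. The values in v and the demo data frame are made up for illustration and do not come from the examples in this article. The last step shows one possible way to flag every occurrence of a repeated value, first one included, by combining duplicated with its fromLast argument.

```r
# Hypothetical values chosen only to illustrate duplicated()
v <- c(3, 5, 3, 7, 5, 3)

# TRUE marks a value that has already appeared earlier in the vector
dup_flags <- duplicated(v)
print(dup_flags)
# [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE

# The same logical vector drives subset(): only rows 3, 5 and 6 are kept
demo <- data.frame(id = 1:6, v = v)
dup_rows <- subset(demo, duplicated(v))
print(dup_rows)

# Scanning from both ends flags every occurrence of a repeated value
all_flags <- duplicated(v) | duplicated(v, fromLast = TRUE)
print(all_flags)
# [1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
```

Only the value 7 appears once, so it is the only element left unflagged in the last result.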

Example

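Consider the following data frame (x2 is generated at random by rpois, so your values will differ from those shown unless you set a seed):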
x1<-1:20
x2<-rpois(20,4)
df1<-data.frame(x1,x2)
df1

Output

  x1 x2
1  1 7
2  2 6
3  3 2
4  4 6
5  5 1
6  6 7
7  7 5
8  8 2
9  9 2
10 10 2
11 11 3
12 12 2
13 13 1
14 14 3
15 15 3
16 16 3
17 17 5
18 18 5
19 19 7
20 20 3

Subset the rows of df1 that have duplicate values in column x2 −

Example

subset(df1,duplicated(x2))

Output

  x1 x2
4   4 6
6   6 7
8   8 2
9   9 2
10 10 2
12 12 2
13 13 1
14 14 3
15 15 3
16 16 3
17 17 5
18 18 5
19 19 7
20 20 3
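
The six rows missing from this output are the first occurrences of each x2 value. As a rough sketch, negating duplicated returns exactly those rows. The data frame is rebuilt here from the x2 values printed above so that the result is reproducible; the names first_rows and dup_rows are introduced only for this illustration.

```r
# Rebuild df1 with the x2 values printed above instead of calling rpois() again
x1 <- 1:20
x2 <- c(7, 6, 2, 6, 1, 7, 5, 2, 2, 2, 3, 2, 1, 3, 3, 3, 5, 5, 7, 3)
df1 <- data.frame(x1, x2)

# Rows where each x2 value appears for the first time
first_rows <- subset(df1, !duplicated(x2))
print(first_rows$x1)
# [1]  1  2  3  5  7 11

# The duplicate rows and the first-occurrence rows together cover all of df1
dup_rows <- subset(df1, duplicated(x2))
print(nrow(first_rows) + nrow(dup_rows))
# [1] 20
```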

Example

y1<-LETTERS[1:20]
y2<-sample(0:5,20,replace=TRUE)
df2<-data.frame(y1,y2)
df2

Output

  y1 y2
1  A 5
2  B 4
3  C 1
4  D 2
5  E 3
6  F 4
7  G 1
8  H 4
9  I 3
10 J 1
11 K 5
12 L 5
13 M 0
14 N 3
15 O 5
16 P 0
17 Q 1
18 R 4
19 S 2
20 T 3

Subset the rows of df2 that have duplicate values in column y2 −

Example

subset(df2,duplicated(y2))

Output

  y1 y2
6  F 4
7  G 1
8  H 4
9  I 3
10 J 1
11 K 5
12 L 5
14 N 3
15 O 5
16 P 0
17 Q 1
18 R 4
19 S 2
20 T 3
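
The same selection can also be written with bracket indexing instead of subset. The sketch below rebuilds df2 from the y2 values printed above, so that it does not depend on sample, and checks that both forms select the same rows. The names by_subset and by_index are introduced only for this comparison.

```r
# Rebuild df2 with the y2 values printed above instead of calling sample() again
y1 <- LETTERS[1:20]
y2 <- c(5, 4, 1, 2, 3, 4, 1, 4, 3, 1, 5, 5, 0, 3, 5, 0, 1, 4, 2, 3)
df2 <- data.frame(y1, y2)

# subset() looks up y2 inside df2; bracket indexing needs the explicit df2$y2
by_subset <- subset(df2, duplicated(y2))
by_index <- df2[duplicated(df2$y2), ]

# Both approaches keep the same rows
print(rownames(by_index))
print(isTRUE(all.equal(by_subset, by_index)))
```

One difference to keep in mind: subset drops rows where the condition is NA, while bracket indexing with a logical NA returns a row of NA values. duplicated never returns NA, so the two forms agree here.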
raja
Published on 05-Dec-2020 13:04:37