How to select the first row for each level of a factor variable in an R data frame?


Comparing rows is a common part of data analysis: we may compare one variable with another, one value with another, one case (row) with another, or even one complete data set with another. Such comparisons help check the accuracy and consistency of the data. To make them, we first need to select the relevant rows and columns. To select the first row for each level of a factor variable, we can negate the duplicated function with the ! operator. duplicated marks every value that has already appeared earlier in the vector, so !duplicated is TRUE only at the first occurrence of each value.
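As a minimal sketch of how this works, the snippet below applies duplicated to a small vector before any data frame is involved. The vector g and the names dup and first_idx are made up for this illustration only.

```r
# A small illustrative vector (values chosen for this sketch only)
g <- c("a", "a", "b", "a", "b", "c")

# duplicated() flags every element that has already appeared earlier
dup <- duplicated(g)
dup
# FALSE  TRUE FALSE  TRUE  TRUE FALSE

# Negating it leaves TRUE only at the first occurrence of each value;
# which() turns those TRUE positions into row indices
first_idx <- which(!dup)
first_idx
# 1 3 6
```

Using the negated logical vector as a row index of a data frame therefore keeps exactly one row per distinct value: the first one in the current row order.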

Example

Consider the following data frame:

> x1<-rep(c(1,2,3,4,5,6,7,8,9,10),each=5)
> x2<-1:50
> x3<-rep(c(LETTERS[1:10]),times=5)
> df<-data.frame(x1,x2,x3)
> head(df,20)
   x1 x2 x3
1   1  1  A
2   1  2  B
3   1  3  C
4   1  4  D
5   1  5  E
6   2  6  F
7   2  7  G
8   2  8  H
9   2  9  I
10  2 10  J
11  3 11  A
12  3 12  B
13  3 13  C
14  3 14  D
15  3 15  E
16  4 16  F
17  4 17  G
18  4 18  H
19  4 19  I
20  4 20  J
> tail(df,20)
   x1 x2 x3
31  7 31  A
32  7 32  B
33  7 33  C
34  7 34  D
35  7 35  E
36  8 36  F
37  8 37  G
38  8 38  H
39  8 39  I
40  8 40  J
41  9 41  A
42  9 42  B
43  9 43  C
44  9 44  D
45  9 45  E
46 10 46  F
47 10 47  G
48 10 48  H
49 10 49  I
50 10 50  J

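Selecting the first row for each distinct value of x1 (x1 is stored as a numeric column here, but duplicated treats its values the same way it would treat factor levels):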

> df[!duplicated(df$x1),]
   x1 x2 x3
1   1  1  A
6   2  6  F
11  3 11  A
16  4 16  F
21  5 21  A
26  6 26  F
31  7 31  A
36  8 36  F
41  9 41  A
46 10 46  F

Selecting the first row for each level of x3:

> df[!duplicated(df$x3),]
   x1 x2 x3
1   1  1  A
2   1  2  B
3   1  3  C
4   1  4  D
5   1  5  E
6   2  6  F
7   2  7  G
8   2  8  H
9   2  9  I
10  2 10  J
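One way to sanity-check such a result is to compare it with a second base R approach. The sketch below rebuilds the same data frame and looks up the first position of each distinct value with match and unique. The names first_by_dup and first_by_match are introduced here for illustration, and x1 is built with 1:10 instead of c(1,2,...,10), which gives the same values stored as integers.

```r
# Rebuild the example data frame so the sketch is self-contained
df <- data.frame(x1 = rep(1:10, each = 5),
                 x2 = 1:50,
                 x3 = rep(LETTERS[1:10], times = 5))

# Approach used above: keep rows whose x3 value has not been seen before
first_by_dup <- df[!duplicated(df$x3), ]

# Cross-check: match() returns the first position of each distinct value
first_by_match <- df[match(unique(df$x3), df$x3), ]

# Both approaches should select the same rows
all.equal(first_by_dup, first_by_match)

# There should be exactly one row per distinct value of x3
nrow(first_by_dup) == length(unique(df$x3))
```

Both approaches depend on the current row order, so if "first" should mean something else (for example the smallest x2 within each group), the data frame would need to be sorted accordingly beforehand.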

Updated on: 11-Aug-2020
