# How to select the first row for each level of a factor variable in an R data frame?


Comparing rows is an important part of data analysis. Sometimes we compare one variable with another, one value with another, one case (row) with another, or even a complete data set with another data set. These comparisons help check the accuracy and consistency of the data values, and they start with selecting the rows, columns, and so on that we need. To select the first row for each level of a factor variable, we can negate the `duplicated()` function with `!`. `duplicated()` flags every value that has already appeared earlier, so `!duplicated()` keeps only the first occurrence of each level.
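To see why this works, it helps to look at what `duplicated()` returns for a plain vector. It gives a logical vector that is `TRUE` wherever a value has already appeared earlier. Negating it with `!` leaves `TRUE` only at the first occurrence of each value. The short factor `f` below is a hypothetical vector, made up only to illustrate the mechanism.

```r
# Hypothetical example factor, used only to illustrate the mechanism
f <- factor(c("b", "a", "b", "c", "a"))

# TRUE where a value has already been seen earlier in the vector
dup <- duplicated(f)
print(dup)            # -> FALSE FALSE TRUE FALSE TRUE

# Negating gives TRUE only at the first occurrence of each level
first <- !dup
print(which(first))   # -> 1 2 4
```

Used as a row index, such as `df[first, ]`, this logical vector keeps exactly one row per level: the first one.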

## Example

Consider the following data frame:

```
> x1 <- rep(c(1,2,3,4,5,6,7,8,9,10), each = 5)
> x2 <- 1:50
> x3 <- rep(LETTERS[1:10], times = 5)
> df <- data.frame(x1, x2, x3)
> head(df, 20)
   x1 x2 x3
1   1  1  A
2   1  2  B
3   1  3  C
4   1  4  D
5   1  5  E
6   2  6  F
7   2  7  G
8   2  8  H
9   2  9  I
10  2 10  J
11  3 11  A
12  3 12  B
13  3 13  C
14  3 14  D
15  3 15  E
16  4 16  F
17  4 17  G
18  4 18  H
19  4 19  I
20  4 20  J
> tail(df, 20)
   x1 x2 x3
31  7 31  A
32  7 32  B
33  7 33  C
34  7 34  D
35  7 35  E
36  8 36  F
37  8 37  G
38  8 38  H
39  8 39  I
40  8 40  J
41  9 41  A
42  9 42  B
43  9 43  C
44  9 44  D
45  9 45  E
46 10 46  F
47 10 47  G
48 10 48  H
49 10 49  I
50 10 50  J
```
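Strictly speaking, `x1` in this example is numeric. Since R 4.0.0, `data.frame()` also keeps `x3` as character, because `stringsAsFactors` now defaults to `FALSE`. `duplicated()` treats numeric, character, and factor vectors the same way, so converting the columns to factors does not change the selection. The following is a minimal sketch of that conversion on the same data. The names `first_x1` and `first_x3` are only illustrative.

```r
# Rebuild the example data frame (1:10 gives the same values as c(1,2,...,10))
x1 <- rep(1:10, each = 5)
x2 <- 1:50
x3 <- rep(LETTERS[1:10], times = 5)
df <- data.frame(x1, x2, x3)

# Make both grouping columns explicit factors
df$x1 <- factor(df$x1)
df$x3 <- factor(df$x3)
str(df)

# The !duplicated() selection works the same way on factor columns
first_x1 <- df[!duplicated(df$x1), ]
first_x3 <- df[!duplicated(df$x3), ]
```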

Selecting the first row for each level of x1:

```
> df[!duplicated(df$x1), ]
   x1 x2 x3
1   1  1  A
6   2  6  F
11  3 11  A
16  4 16  F
21  5 21  A
26  6 26  F
31  7 31  A
36  8 36  F
41  9 41  A
46 10 46  F
```

Selecting the first row for each level of x3:

```
> df[!duplicated(df$x3), ]
   x1 x2 x3
1   1  1  A
2   1  2  B
3   1  3  C
4   1  4  D
5   1  5  E
6   2  6  F
7   2  7  G
8   2  8  H
9   2  9  I
10  2 10  J
```
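Two related variations may be useful. The sketch below runs both on the same data. First, `duplicated()` has a `fromLast` argument that scans from the end, so `!duplicated(..., fromLast = TRUE)` selects the last row of each level. Second, base R can do the selection without `duplicated()` at all: `split()` the data frame by the grouping column and keep the first row of each piece with `head()`. The object names are only illustrative.

```r
x1 <- rep(1:10, each = 5)
x2 <- 1:50
x3 <- rep(LETTERS[1:10], times = 5)
df <- data.frame(x1, x2, x3)

# Last row for each level of x1: scan from the end instead of the start
last_x1 <- df[!duplicated(df$x1, fromLast = TRUE), ]

# Alternative without duplicated(): split by x3, keep the first row of each group,
# then bind the one-row pieces back together
first_by_split <- do.call(rbind, lapply(split(df, df$x3), head, n = 1))

print(last_x1)
print(first_by_split)
```

In this data the two approaches for x3 return the same rows, because the levels first appear in alphabetical order. In general, `split()` returns the groups in level order, while `!duplicated()` keeps the rows in their original order of appearance.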
Published on 11-Aug-2020 12:37:02