# How to select the first and last rows based on a group column in an R data frame?

Extracting data is a routine part of data analysis because it lets us keep the parts of a data set that matter for the question at hand. Sometimes those parts are the first and the last row of each group, for example when we want to compare the initial and final values across groups. We can select the first and last row of each group defined by a group column with the `slice` function of the dplyr package.

## Example


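Consider the following data frame:

```r
x1 <- rep(1:4, each = 10)  # group column: 1 to 4, ten rows each
x2 <- rpois(40, 5)         # random Poisson values; they change on every run
df1 <- data.frame(x1, x2)
head(df1, 12)
```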

## Output

```
   x1 x2
1   1  3
2   1  4
3   1  6
4   1  6
5   1  3
6   1  4
7   1  7
8   1  8
9   1  7
10  1  2
11  2  8
12  2  7
```
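Because `x2` comes from `rpois()`, its values differ on every run. A minimal sketch for making the example repeatable is to call `set.seed()` before drawing, assuming any fixed seed is acceptable (the value `123` is an arbitrary choice, not from the original). The numbers will then be stable across runs, but they will not match the output shown above.

```r
# Assumption: the seed 123 is arbitrary; any fixed seed makes the random
# draws repeatable, but the values will differ from the output shown above.
set.seed(123)
x1 <- rep(1:4, each = 10)  # group column: 1 to 4, ten rows each
x2 <- rpois(40, 5)         # Poisson counts with mean 5, now reproducible
df1 <- data.frame(x1, x2)
head(df1, 12)
```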

## Example

```r
tail(df1, 12)
```

## Output

```
   x1 x2
29  3  4
30  3  5
31  4  4
32  4  6
33  4  7
34  4  5
35  4  5
36  4  4
37  4  9
38  4  4
39  4  3
40  4  6
```

```r
library(dplyr)
```

```
Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union
```

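Select the first and last row of each group defined by column x1. Inside `slice()`, `n()` returns the number of rows in the current group, so `c(1, n())` gives the positions of the first and the last row of every group: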

## Example

```r
df1 %>% group_by(x1) %>% slice(c(1, n()))
```

## Output

```
# A tibble: 8 x 2
# Groups:   x1 [4]
     x1    x2
  <int> <int>
1     1     3
2     1     2
3     2     8
4     2     4
5     3     5
6     3     5
7     4     4
8     4     6
```
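`slice(c(1, n()))` works on positions, so "first" and "last" mean the first and last rows in the current row order. If they should instead mean the earliest and latest rows by some other column, sort within each group before slicing. A sketch, using a small hypothetical data frame `df3` with a `day` column that is not part of the original example:

```r
library(dplyr)

# Hypothetical data: within each group, rows are NOT stored in day order.
df3 <- data.frame(
  grp   = c("a", "a", "a", "b", "b"),
  day   = c(3, 1, 2, 2, 1),
  value = c(30, 10, 20, 200, 100)
)

first_last <- df3 %>%
  group_by(grp) %>%
  arrange(day, .by_group = TRUE) %>%  # sort by day within each group
  slice(c(1, n())) %>%                # earliest and latest day per group
  ungroup()

first_last
```

`.by_group = TRUE` makes `arrange()` sort by the grouping column first, so the rows stay organised by group before `slice()` picks positions.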

Let's look at another example, this time with a group column of letters:

## Example


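```r
y1 <- rep(c("A", "B", "C"), each = 10)  # group column: A, B, C, ten rows each
y2 <- rnorm(30)                         # random normal values; they change on every run
df2 <- data.frame(y1, y2)
head(df2, 12)
```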

## Output

```
   y1         y2
1   A -1.1640927
2   A  0.3146504
3   A -1.5213974
4   A -1.3728970
5   A -0.9964678
6   A -0.5022738
7   A -0.4225463
8   A -0.3501037
9   A  0.3043838
10  A -1.5216102
11  B -0.2425732
12  B  0.5554217
```

## Example

```r
tail(df2, 12)
```

## Output

```
   y1          y2
19  B  0.30172320
20  B  1.68341427
21  C  0.55127997
22  C -1.77840803
23  C  0.03001296
24  C -1.19246335
25  C  0.03612258
26  C -0.35468216
27  C -0.63579743
28  C -1.90074403
29  C  0.50072577
30  C  0.31911138
```

## Example

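```r
df2 %>% group_by(y1) %>% slice(c(1, n()))
```

## Output

```
# A tibble: 6 x 2
# Groups:   y1 [3]
  y1        y2
  <fct>  <dbl>
1 A     -1.16 
2 A     -1.52 
3 B     -0.243
4 B      1.68 
5 C      0.551
6 C      0.319
```

This output was produced with a version of R older than 4.0.0. Since R 4.0.0, `data.frame()` no longer converts character vectors to factors by default, so `y1` prints as `<chr>` instead of `<fct>`.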
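If dplyr is not available, a base R sketch can make the same selection with `duplicated()`: a row is kept when it is the first occurrence of its group label or the last one (counting from the end with `fromLast = TRUE`). The helper `first_last_rows()` and the data frame `df4` are hypothetical names used only for illustration. Unlike the grouped tibble above, the result is a plain data frame that keeps the original row names, and a group with a single row appears only once.

```r
# Hypothetical helper: keep the first and last row of each group in `df`,
# where `group_col` is the name of the grouping column.
first_last_rows <- function(df, group_col) {
  g <- df[[group_col]]
  # TRUE for the first occurrence of each label, or for its last occurrence
  keep <- !duplicated(g) | !duplicated(g, fromLast = TRUE)
  df[keep, , drop = FALSE]
}

# Hypothetical data: group "B" has only one row.
df4 <- data.frame(
  g = c("A", "A", "A", "B", "C", "C"),
  v = c(1, 2, 3, 4, 5, 6)
)

res <- first_last_rows(df4, "g")
res
```

Calling `first_last_rows(df1, "x1")` on the first example should return the same eight rows as the dplyr version, in their original order.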