How to randomly sample rows from an R data frame using sample_n?


To randomly sample rows from an R data frame, we can pass the sample size to the sample_n function of the dplyr package. For example, if we have a data frame called df, a random sample of 5 rows from df can be drawn with the command −

df%>%sample_n(5)
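
The command above only works once dplyr is loaded, since both %>% and sample_n are made available by that package. A minimal self-contained sketch follows; the 8-row data frame and the seed value are assumptions made for illustration and are not part of the examples in this article.

```r
library(dplyr)

# Hypothetical data frame, used only for illustration
df <- data.frame(id = 1:8, value = (1:8) * 10)

# Fix the random seed so the same rows are drawn on every run
set.seed(123)

# Draw 5 rows at random, without replacement (the default)
sampled <- df %>% sample_n(5)
sampled
```

Without the set.seed call, each run would typically return a different set of 5 rows.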

Example 1

Consider the data frame below −

x1<-rnorm(20,21,3.24)
x2<-rnorm(20,5,2.1)
df1<-data.frame(x1,x2)
df1

Output

     x1         x2
1  21.17214  4.256648
2  24.41776  4.835844
3  22.57051  5.943711
4  18.80514  3.365967
5  21.49105  1.583063
6  20.04658  8.334448
7  16.03291  5.094798
8  21.71183  3.497276
9  18.27658  6.657804
10 15.75965  3.092517
11 24.15690  3.618473
12 25.91623  4.283104
13 24.65660  3.759834
14 18.80581  6.551064
15 25.15142  2.812932
16 20.48691  6.844323
17 25.56193  5.479819
18 22.22655  4.492731
19 20.56847  7.983841
20 23.24853  5.266284

Load the dplyr package and randomly sample 10 rows from df1 −

library(dplyr)
df1%>%sample_n(10)

Output

     x1         x2
1  18.80581  6.551064
2  25.91623  4.283104
3  16.03291  5.094798
4  15.75965  3.092517
5  18.27658  6.657804
6  18.80514  3.365967
7  22.57051  5.943711
8  20.04658  8.334448
9  24.41776  4.835844
10 24.15690  3.618473
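
Because both the data and the sample are generated randomly, rerunning the commands above will usually give different numbers and a different selection of rows. The sketch below shows one way to make a draw repeatable by calling set.seed before sampling. The seed values are arbitrary assumptions, and df1 is rebuilt here only so that the block runs on its own.

```r
library(dplyr)

# Rebuild df1 with a fixed seed so the data itself is reproducible
set.seed(42)
x1 <- rnorm(20, 21, 3.24)
x2 <- rnorm(20, 5, 2.1)
df1 <- data.frame(x1, x2)

# Resetting the same seed before each call repeats the same draw
set.seed(1)
first_draw <- df1 %>% sample_n(10)
set.seed(1)
second_draw <- df1 %>% sample_n(10)

# Compare the two samples
all.equal(first_draw, second_draw)
```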

Example 2

y1<-sample(0:1,20,replace=TRUE)
y2<-sample(LETTERS[1:5],20,replace=TRUE)
df2<-data.frame(y1,y2)
df2

Output

   y1 y2
1  0  E
2  0  B
3  1  E
4  0  E
5  0  D
6  1  A
7  0  D
8  1  A
9  0  E
10 0  C
11 1  E
12 1  C
13 0  A
14 0  A
15 1  D
16 0  C
17 1  D
18 0  E
19 0  D
20 0  E

Randomly sampling 10 rows from df2 −

df2%>%sample_n(10)

Output

   y1 y2
1  0  E
2  0  C
3  1  E
4  1  A
5  0  A
6  1  E
7  0  D
8  0  D
9  0  D
10 0  B
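
By default sample_n draws without replacement, so the requested sample size cannot exceed the number of rows in the data frame. The sketch below shows two variations: passing replace = TRUE to allow repeated rows, and using slice_sample, which dplyr (version 1.0.0 and later) documents as the successor to sample_n. The seed value and the sample size of 30 are assumptions chosen for illustration, and df2 is rebuilt so the block runs on its own.

```r
library(dplyr)

# Rebuild df2 with a fixed seed so the block is reproducible
set.seed(7)
y1 <- sample(0:1, 20, replace = TRUE)
y2 <- sample(LETTERS[1:5], 20, replace = TRUE)
df2 <- data.frame(y1, y2)

# With replacement, the sample may be larger than the 20 rows of df2
with_replacement <- df2 %>% sample_n(30, replace = TRUE)
nrow(with_replacement)

# slice_sample() takes the sample size through the named argument n
newer_api <- df2 %>% slice_sample(n = 10)
nrow(newer_api)
```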

Updated on: 06-Mar-2021