How to create a random sample of some percentage of rows for a particular value of a column from an R data frame?


Random sampling is an important part of data analysis. We usually sample rows rather than columns, because rows represent the cases. To draw a random sample of some percentage of the rows that have a particular value in a column of an R data frame, we can combine the sample function with the which function: which finds the positions of the matching rows, and sample picks a random subset of those positions.

Consider the following data frame:

Example


set.seed(887)
grp<-sample(LETTERS[1:4],20,replace=TRUE)
Score<-sample(101:150,20)
df1<-data.frame(grp,Score)
df1

Output

grp Score
1 D 135
2 D 114
3 C 121
4 C 150
5 B 129
6 A 110
7 D 126
8 D 132
9 C 118
10 D 102
11 B 103
12 D 145
13 A 128
14 C 147
15 B 106
16 B 125
17 D 130
18 B 131
19 A 142
20 C 143

Randomly sampling fifty percent of the rows in which column grp is A:

Example

df1[sample(which(df1$grp=='A'),round(0.5*length(which(df1$grp=='A')))),]

Output

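   grp Score
2    A   138
20   A   125

This output does not match the df1 printed above, where grp is A only in rows 6, 13 and 19, so it appears to come from a different run. On the data shown, the call returns two of those three rows, because round(0.5*3) is 2.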
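The one-line call can be easier to follow when split into steps. The sketch below is only an illustration: it rebuilds df1 from the values printed in the first Output block, so it does not depend on the seed, and the names idx, n, picked and result are introduced here and are not part of the original example.

```r
# df1 rebuilt from the values printed in the article, so no seed is needed.
grp <- c("D","D","C","C","B","A","D","D","C","D",
         "B","D","A","C","B","B","D","B","A","C")
Score <- c(135,114,121,150,129,110,126,132,118,102,
           103,145,128,147,106,125,130,131,142,143)
df1 <- data.frame(grp, Score)

# Step 1: positions of the rows where grp is "A".
idx <- which(df1$grp == "A")      # 6 13 19

# Step 2: how many rows make up fifty percent of them.
n <- round(0.5 * length(idx))     # round(1.5) is 2 in R

# Step 3: draw n of those positions at random and subset the data frame.
picked <- sample(idx, n)
result <- df1[picked, ]
result
```

Which two of the three A rows are returned changes from run to run unless set.seed is called first.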

Let’s have a look at another example:

Example


# No seed is set here, so the values differ from run to run
y1<-sample(c("YT1","YT2","YT3"),20,replace=TRUE)
y2<-rnorm(20,10,1)
df2<-data.frame(y1,y2)
df2

Output

y1 y2
1 YT2 10.886273
2 YT1 9.534332
3 YT1 8.353436
4 YT1 10.878407
5 YT2 9.881384
6 YT2 9.825197
7 YT3 8.805524
8 YT3 10.189767
9 YT1 11.615293
10 YT1 10.194561
11 YT3 10.317023
12 YT1 11.570260
13 YT1 9.488106
14 YT2 10.340876
15 YT2 7.425779
16 YT2 10.085891
17 YT1 11.023932
18 YT2 10.301987
19 YT3 10.234140
20 YT1 9.048794

Randomly sampling thirty percent of the rows in which column y1 is YT1:

Example

df2[sample(which(df2$y1=='YT1'),round(0.3*length(which(df2$y1=='YT1')))),]

Output

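    y1     y2
2  YT1 10.400617
13 YT1 8.977768

These values do not match the df2 printed above: no seed was set, so the output comes from a different run. In the df2 shown, y1 is YT1 in nine rows, so the same call would return round(0.3*9) = 3 rows.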
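If the same pattern is needed for several columns or values, it can be wrapped in a small function. The following is a minimal sketch, not part of the original example: the function name sample_rows_by_value, its arguments, and the toy data frame are all hypothetical. It indexes through sample.int instead of calling sample on the positions directly, because sample(x, n) treats a single number x as the range 1:x, which would pick the wrong rows when only one row matches.

```r
# Hypothetical helper (not from the article): sample a fraction `frac` of the
# rows in which column `col` equals `value`.
sample_rows_by_value <- function(df, col, value, frac) {
  idx <- which(df[[col]] == value)      # positions of the matching rows
  n <- round(frac * length(idx))        # number of rows to draw
  # Draw positions within idx, so a single match is never read as 1:idx.
  df[idx[sample.int(length(idx), n)], , drop = FALSE]
}

# Toy data made up for illustration: "YT3" occurs only once, in row 5.
toy <- data.frame(y1 = c("YT1", "YT2", "YT1", "YT1", "YT3", "YT2"),
                  y2 = c(9.5, 10.1, 8.4, 10.9, 10.2, 9.8))

# All of the single YT3 row: this is always row 5.
one_match <- sample_rows_by_value(toy, "y1", "YT3", 1)

# Thirty percent of the three YT1 rows: round(0.9) gives one row.
some_yt1 <- sample_rows_by_value(toy, "y1", "YT1", 0.3)
```

With the original one-liner, the single-match case would call sample(5, 1) and could return any of rows 1 to 5, so the guard matters whenever a value may be rare.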

Updated on: 14-Oct-2020
