How to create a random sample of some percentage of rows for a particular value of a column from an R data frame?

Random sampling is an important part of data analysis. We usually sample rows rather than columns, because rows represent the cases. To draw a random sample of some percentage of the rows that have a particular value in a column of an R data frame, we can combine the which function, which finds the matching row numbers, with the sample function, which picks a random subset of them.
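The steps can be sketched one at a time on a small hand-made data frame. The data frame toy and the variable names idx, n and picked are made up for this illustration and are not part of the examples that follow.

```r
# Toy data frame (hypothetical, for illustration only)
toy <- data.frame(grp = c("A", "B", "A", "A", "B", "A"),
                  val = 1:6)

# Step 1: row numbers where grp equals "A"
idx <- which(toy$grp == "A")      # rows 1, 3, 4 and 6

# Step 2: how many rows make up 50% of that group
n <- round(0.5 * length(idx))     # 0.5 * 4 = 2

# Step 3: draw n of those row numbers at random, without replacement
set.seed(1)                       # only to make the draw repeatable
picked <- sample(idx, n)

# Step 4: use the drawn row numbers to subset the data frame
toy[picked, ]
```

Every row returned has grp equal to "A", and the number of rows is the rounded percentage of the group size.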

Consider the below data frame −

Example

set.seed(887)
grp<-sample(LETTERS[1:4],20,replace=TRUE)
Score<-sample(101:150,20)
df1<-data.frame(grp,Score)
df1

Output

Randomly sampling fifty percent of the rows where column grp equals A −

Example

df1[sample(which(df1$grp=='A'),round(0.5*length(which(df1$grp=='A')))),]

Output

   grp Score
2    A   138
20   A   125
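The same pattern can be wrapped in a small function so that the which call is not repeated. The sketch below is one possible way to do it; sample_pct_rows and its argument names are hypothetical, not an existing R function. It samples positions within the matching row numbers rather than the row numbers themselves, because sample(x, n) treats a single number x as the range 1:x, which would give wrong rows when only one row matches.

```r
# Hypothetical helper: sample a fraction (pct) of the rows of df
# whose `column` equals `value`
sample_pct_rows <- function(df, column, value, pct) {
  idx <- which(df[[column]] == value)   # matching row numbers
  n <- round(pct * length(idx))         # number of rows to draw
  if (n == 0) {
    return(df[0, , drop = FALSE])       # nothing to draw: empty data frame
  }
  # Draw n positions within idx, then map them back to row numbers
  df[idx[sample.int(length(idx), n)], , drop = FALSE]
}

# Usage with the data frame from the example above
set.seed(887)
grp <- sample(LETTERS[1:4], 20, replace = TRUE)
Score <- sample(101:150, 20)
df1 <- data.frame(grp, Score)

half_of_A <- sample_pct_rows(df1, "grp", "A", 0.5)
half_of_A
```

The rows drawn by this helper need not match the ones shown in the output above, since the random number stream is consumed differently.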

Let’s have a look at another example −

Example

y1<-sample(c("YT1","YT2","YT3"),20,replace=TRUE)
y2<-rnorm(20,10,1)
df2<-data.frame(y1,y2)
df2

Output

y1 y2
1 YT2 10.886273
2 YT1 9.534332
3 YT1 8.353436
4 YT1 10.878407
5 YT2 9.881384
6 YT2 9.825197
7 YT3 8.805524
8 YT3 10.189767
9 YT1 11.615293
10 YT1 10.194561
11 YT3 10.317023
12 YT1 11.570260
13 YT1 9.488106
14 YT2 10.340876
15 YT2 7.425779
16 YT2 10.085891
17 YT1 11.023932
18 YT2 10.301987
19 YT3 10.234140
20 YT1 9.048794

Randomly sampling thirty percent of the rows where column y1 equals YT1 (no seed is set in this example, so the data and the sampled rows differ from run to run) −

Example

df2[sample(which(df2$y1=='YT1'),round(0.3*length(which(df2$y1=='YT1')))),]

Output

    y1        y2
2  YT1 10.400617
13 YT1  8.977768
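The number of rows drawn depends on how the percentage is rounded. R's round sends an exact .5 to the nearest even digit, so half of a five-row group gives 2 rows rather than 3. The short sketch below compares round with ceiling and floor; the variable names are made up for illustration, and which rule is appropriate is a choice for the analyst.

```r
# Number of rows drawn for a group of 5 rows at 50%
group_size <- 5
pct <- 0.5

n_round   <- round(pct * group_size)    # 2.5 goes to the even digit
n_ceiling <- ceiling(pct * group_size)  # always rounds up
n_floor   <- floor(pct * group_size)    # always rounds down

c(round = n_round, ceiling = n_ceiling, floor = n_floor)
```

For very small groups round can also return 0, in which case no rows are sampled; ceiling guarantees at least one row whenever the group is not empty and the percentage is above zero.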

Nizamuddin Siddiqui

Updated on: 2020-10-14T09:10:04+05:30
