How to create a sample from an R data frame if weights are assigned to the row values?


To draw a random sample in R, we can use the sample function. If each row has a weight, we pass the weights through the prob argument so that rows with larger weights are more likely to be selected. For example, if we have a data frame df that contains a column X with some values and another column Weight with the corresponding weights, then a random sample of 10 rows can be generated as follows:

df[sample(seq_len(nrow(df)),10,prob=df$Weight),]
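The one-liner above can be wrapped in a small function that checks the weights first. This is only a minimal sketch: weighted_sample_rows is a hypothetical helper name (not part of base R), and the data frame is made-up illustration data using the column names X and Weight from the text. The weights do not need to sum to one, because sample treats prob as relative weights.

```r
# Hypothetical helper (not part of base R): draw `size` rows from `data`,
# using the column named by `weight_col` as sampling weights.
weighted_sample_rows <- function(data, size, weight_col, replace = FALSE) {
  w <- data[[weight_col]]
  # Weights must be numeric, non-missing, non-negative and not all zero
  stopifnot(is.numeric(w), !anyNA(w), all(w >= 0), sum(w) > 0)
  # Without replacement, we cannot draw more rows than have a positive weight
  if (!replace) stopifnot(size <= sum(w > 0))
  idx <- sample(seq_len(nrow(data)), size = size, replace = replace, prob = w)
  data[idx, , drop = FALSE]
}

# Illustration data (assumed, not taken from the article)
set.seed(42)
df <- data.frame(X = rnorm(20, mean = 5, sd = 1),
                 Weight = sample(1:10, 20, replace = TRUE))

s <- weighted_sample_rows(df, 10, "Weight")
s
```

Setting replace = TRUE in the helper allows the same row to be drawn more than once, which is needed when the requested sample size exceeds the number of rows.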

Example

Consider the following data frame:

set.seed(1256)
x<-rnorm(20,5,1)
weight_x<-sample(1:10,20,replace=TRUE)
df<-data.frame(x,weight_x)
df

Output

          x weight_x
1  4.126636       10
2  5.806501        1
3  5.768463       10
4  5.980315        8
5  6.593158        2
6  4.298533       10
7  6.196574        4
8  4.136517        5
9  4.504645       10
10 4.416107        6
11 5.257177       10
12 5.836453        1
13 5.334041       10
14 4.959786        2
15 3.406828        7
16 4.149746        2
17 4.657464        4
18 4.820102       10
19 5.401021        9
20 6.718216        6

Drawing samples of different sizes using the weight column:

Example

df[sample(seq_len(nrow(df)),5,prob=df$weight_x),]

Output

          x weight_x
11 5.257177       10
19 5.401021        9
13 5.334041       10
10 4.416107        6
5  6.593158        2
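Each call to sample advances the random number generator, so rerunning a single example on its own will usually return different rows. A minimal sketch of making one draw repeatable by resetting the seed immediately before it (the seed value 99 is an arbitrary choice for illustration):

```r
# Rebuild the data frame from the example above
set.seed(1256)
x <- rnorm(20, 5, 1)
weight_x <- sample(1:10, 20, replace = TRUE)
df <- data.frame(x, weight_x)

# Resetting the seed before each draw reproduces the same weighted sample
set.seed(99)  # arbitrary seed, chosen only for illustration
first <- df[sample(seq_len(nrow(df)), 5, prob = df$weight_x), ]

set.seed(99)
second <- df[sample(seq_len(nrow(df)), 5, prob = df$weight_x), ]

identical(first, second)  # TRUE
```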

Example

df[sample(seq_len(nrow(df)),3,prob=df$weight_x),]

Output

          x weight_x
13 5.334041       10
3  5.768463       10
18 4.820102       10

Example

df[sample(seq_len(nrow(df)),7,prob=df$weight_x),]

Output

          x weight_x
9  4.504645       10
19 5.401021        9
12 5.836453        1
5  6.593158        2
15 3.406828        7
11 5.257177       10
6  4.298533       10

Example

df[sample(seq_len(nrow(df)),10,prob=df$weight_x),]

Output

          x weight_x
4  5.980315        8
9  4.504645       10
19 5.401021        9
1  4.126636       10
13 5.334041       10
12 5.836453        1
11 5.257177       10
18 4.820102       10
10 4.416107        6
3  5.768463       10

Example

df[sample(seq_len(nrow(df)),9,prob=df$weight_x),]

Output

          x weight_x
8  4.136517        5
11 5.257177       10
7  6.196574        4
4  5.980315        8
9  4.504645       10
6  4.298533       10
19 5.401021        9
18 4.820102       10
16 4.149746        2

Example

df[sample(seq_len(nrow(df)),4,prob=df$weight_x),]

Output

          x weight_x
1  4.126636       10
6  4.298533       10
11 5.257177       10
7  6.196574        4

Example

df[sample(seq_len(nrow(df)),15,prob=df$weight_x),]

Output

          x weight_x
3  5.768463       10
15 3.406828        7
19 5.401021        9
16 4.149746        2
9  4.504645       10
8  4.136517        5
11 5.257177       10
10 4.416107        6
18 4.820102       10
6  4.298533       10
4  5.980315        8
17 4.657464        4
1  4.126636       10
20 6.718216        6
13 5.334041       10

Example

df[sample(seq_len(nrow(df)),2,prob=df$weight_x),]

Output

          x weight_x
11 5.257177       10
13 5.334041       10

Example

df[sample(seq_len(nrow(df)),12,prob=df$weight_x),]

Output

          x weight_x
1  4.126636       10
3  5.768463       10
8  4.136517        5
11 5.257177       10
10 4.416107        6
6  4.298533       10
13 5.334041       10
4  5.980315        8
20 6.718216        6
12 5.836453        1
18 4.820102       10
19 5.401021        9

Example

df[sample(seq_len(nrow(df)),18,prob=df$weight_x),]

Output

          x weight_x
5  6.593158        2
4  5.980315        8
6  4.298533       10
20 6.718216        6
15 3.406828        7
3  5.768463       10
9  4.504645       10
10 4.416107        6
13 5.334041       10
19 5.401021        9
8  4.136517        5
11 5.257177       10
18 4.820102       10
1  4.126636       10
7  6.196574        4
12 5.836453        1
17 4.657464        4
16 4.149746        2
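One way to check that prob acts as a set of relative weights is to draw many values with replacement and compare the observed frequencies with weight / sum(weight). The sketch below uses made-up weights and an arbitrary seed, so the names a to d and the numbers are assumptions for illustration only. Note that when sampling without replacement, as in the examples above, the weights are applied sequentially to the remaining rows, so the chance of a row appearing in the sample is not exactly proportional to its weight.

```r
set.seed(2020)  # arbitrary seed for illustration
weights <- c(a = 1, b = 2, c = 3, d = 4)  # made-up weights
n_draws <- 100000

# Draw with replacement so every draw uses the same weights
draws <- sample(names(weights), n_draws, replace = TRUE, prob = weights)

# Observed share of each item, in the same order as `weights`
observed <- as.numeric(table(factor(draws, levels = names(weights)))) / n_draws
expected <- as.numeric(weights / sum(weights))

# The two columns should be close to each other
round(cbind(expected, observed), 3)
```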

Updated on: 07-Nov-2020
