How to create a sample from an R data frame if weights are assigned to the row values?
To draw a random sample in R we can use the sample function. If weights are provided for the rows, we pass them through the prob argument so that rows with larger weights are more likely to be selected. For example, if a data frame df contains a column X with some values and a column Weight with the corresponding weights, a weighted random sample of 10 rows can be generated as follows:
df[sample(seq_len(nrow(df)),10,prob=df$Weight),]
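The weights passed to prob need not sum to one; sample rescales them, so on a single draw each row is chosen with probability weight / sum(weights). The following is a minimal sketch of that relationship, assuming a small illustrative data frame toy (its values are not from the article) and drawing with replacement so the observed shares can be compared with the expected ones.

```r
# Illustrative data frame; the column names X and Weight mirror the description above.
set.seed(42)
toy <- data.frame(X = c("a", "b", "c", "d"), Weight = c(1, 2, 3, 4))

# Expected probability of each row on a single draw: weight / total weight.
p <- toy$Weight / sum(toy$Weight)

# Draw one row index many times (with replacement) and compute the observed shares.
draws <- sample(seq_len(nrow(toy)), 100000, replace = TRUE, prob = toy$Weight)
observed <- tabulate(draws, nbins = nrow(toy)) / length(draws)

# Expected and observed shares side by side.
print(data.frame(X = toy$X, expected = p, observed = round(observed, 3)))
```

The observed shares should sit close to the expected ones. When sampling without replacement, as in the rest of this article, the weights are applied to the rows that remain after each draw, so a row's overall chance of being included is not exactly proportional to its weight.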
Example
Consider the following data frame:
set.seed(1256)
x<-rnorm(20,5,1)
weight_x<-sample(1:10,20,replace=TRUE)
df<-data.frame(x,weight_x)
df
Output
          x weight_x
1  4.126636       10
2  5.806501        1
3  5.768463       10
4  5.980315        8
5  6.593158        2
6  4.298533       10
7  6.196574        4
8  4.136517        5
9  4.504645       10
10 4.416107        6
11 5.257177       10
12 5.836453        1
13 5.334041       10
14 4.959786        2
15 3.406828        7
16 4.149746        2
17 4.657464        4
18 4.820102       10
19 5.401021        9
20 6.718216        6
Drawing samples of different sizes using the weight column:
Example
df[sample(seq_len(nrow(df)),5,prob=df$weight_x),]
Output
          x weight_x
11 5.257177       10
19 5.401021        9
13 5.334041       10
10 4.416107        6
5  6.593158        2
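The calls in this article rely on sample's default replace = FALSE, so each row appears at most once and the requested size cannot exceed nrow(df). The following is a minimal sketch of the alternative, assuming rows are allowed to repeat (for instance in a weighted bootstrap); the data frame small is hypothetical and used only for illustration.

```r
set.seed(1)
small <- data.frame(x = c(10, 20, 30), weight_x = c(1, 1, 8))

# Without replacement (the default) each row can be drawn at most once,
# so asking for more rows than exist raises an error, caught here.
too_many <- tryCatch(
  small[sample(seq_len(nrow(small)), 5, prob = small$weight_x), ],
  error = function(e) "error: sample larger than the population"
)
print(too_many)

# With replace = TRUE the same row can be drawn several times.
boot <- small[sample(seq_len(nrow(small)), 5, replace = TRUE, prob = small$weight_x), ]
print(boot)
```

Repeated rows in boot get suffixed row names (such as 3.1), because data frame row names must be unique.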
Example
df[sample(seq_len(nrow(df)),3,prob=df$weight_x),]
Output
          x weight_x
13 5.334041       10
3  5.768463       10
18 4.820102       10
Example
df[sample(seq_len(nrow(df)),7,prob=df$weight_x),]
Output
          x weight_x
9  4.504645       10
19 5.401021        9
12 5.836453        1
5  6.593158        2
15 3.406828        7
11 5.257177       10
6  4.298533       10
Example
df[sample(seq_len(nrow(df)),10,prob=df$weight_x),]
Output
          x weight_x
4  5.980315        8
9  4.504645       10
19 5.401021        9
1  4.126636       10
13 5.334041       10
12 5.836453        1
11 5.257177       10
18 4.820102       10
10 4.416107        6
3  5.768463       10
Example
df[sample(seq_len(nrow(df)),9,prob=df$weight_x),]
Output
          x weight_x
8  4.136517        5
11 5.257177       10
7  6.196574        4
4  5.980315        8
9  4.504645       10
6  4.298533       10
19 5.401021        9
18 4.820102       10
16 4.149746        2
Example
df[sample(seq_len(nrow(df)),4,prob=df$weight_x),]
Output
          x weight_x
1  4.126636       10
6  4.298533       10
11 5.257177       10
7  6.196574        4
Example
df[sample(seq_len(nrow(df)),15,prob=df$weight_x),]
Output
          x weight_x
3  5.768463       10
15 3.406828        7
19 5.401021        9
16 4.149746        2
9  4.504645       10
8  4.136517        5
11 5.257177       10
10 4.416107        6
18 4.820102       10
6  4.298533       10
4  5.980315        8
17 4.657464        4
1  4.126636       10
20 6.718216        6
13 5.334041       10
Example
df[sample(seq_len(nrow(df)),2,prob=df$weight_x),]
Output
          x weight_x
11 5.257177       10
13 5.334041       10
Example
df[sample(seq_len(nrow(df)),12,prob=df$weight_x),]
Output
          x weight_x
1  4.126636       10
3  5.768463       10
8  4.136517        5
11 5.257177       10
10 4.416107        6
6  4.298533       10
13 5.334041       10
4  5.980315        8
20 6.718216        6
12 5.836453        1
18 4.820102       10
19 5.401021        9
Example
df[sample(seq_len(nrow(df)),18,prob=df$weight_x),]
Output
          x weight_x
5  6.593158        2
4  5.980315        8
6  4.298533       10
20 6.718216        6
15 3.406828        7
3  5.768463       10
9  4.504645       10
10 4.416107        6
13 5.334041       10
19 5.401021        9
8  4.136517        5
11 5.257177       10
18 4.820102       10
1  4.126636       10
7  6.196574        4
12 5.836453        1
17 4.657464        4
16 4.149746        2
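Since every example above repeats the same indexing pattern, it can be wrapped in a small function. The following is a sketch only: weighted_sample_rows and its argument names are hypothetical (they are not part of base R or any package), and the input checks reflect one reasonable set of assumptions about what counts as a valid weight column.

```r
# Hypothetical helper (not part of base R): returns `size` rows of `data`,
# drawn with the weights stored in the column named by `weight_col`.
weighted_sample_rows <- function(data, size, weight_col, replace = FALSE) {
  w <- data[[weight_col]]
  # Weights must be numeric, non-missing, non-negative and not all zero.
  stopifnot(is.numeric(w), !anyNA(w), all(w >= 0), sum(w) > 0)
  # Without replacement, only rows with a positive weight can be drawn.
  if (!replace && size > sum(w > 0)) {
    stop("size exceeds the number of rows with a positive weight")
  }
  idx <- sample(seq_len(nrow(data)), size, replace = replace, prob = w)
  data[idx, , drop = FALSE]
}

# Rebuild the article's data frame and draw five rows.
set.seed(1256)
df <- data.frame(x = rnorm(20, 5, 1),
                 weight_x = sample(1:10, 20, replace = TRUE))
print(weighted_sample_rows(df, 5, "weight_x"))
```

The rows returned depend on the random number generator state, so they will not necessarily match the outputs shown earlier.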