How to create a random sample of some percentage of rows for a particular value of a column from an R data frame?
Random sampling is an important part of data analysis. We usually sample rows rather than columns, because rows represent the cases. To draw a random sample of some percentage of the rows that have a particular value in a column of an R data frame, we can combine the sample function with the which function: which finds the indices of the matching rows, and sample picks the required share of them.
Consider the below data frame −
Example
set.seed(887)
grp<-sample(LETTERS[1:4],20,replace=TRUE)
Score<-sample(101:150,20)
df1<-data.frame(grp,Score)
df1
Output
   grp Score
1    D   135
2    D   114
3    C   121
4    C   150
5    B   129
6    A   110
7    D   126
8    D   132
9    C   118
10   D   102
11   B   103
12   D   145
13   A   128
14   C   147
15   B   106
16   B   125
17   D   130
18   B   131
19   A   142
20   C   143
Randomly sampling fifty percent of the rows in which column grp is A −
Example
df1[sample(which(df1$grp=='A'),round(0.5*length(which(df1$grp=='A')))),]
Output
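   grp Score
2    A   138
20   A   125
Note that this output does not match df1 as printed above, where grp is A only in rows 6, 13 and 19, so it appears to come from a different run. On the df1 shown, the command returns two of those three rows, because round(0.5*3) gives 2.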
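The one-line expression above can be easier to follow when it is broken into steps. The following is a minimal sketch that rebuilds df1 so it runs on its own; the names idx_a, n_pick and picked are hypothetical and introduced only for illustration. The rows drawn depend on the R version and seed, so no output is shown.

```r
# Rebuild the example data so this block is self-contained
set.seed(887)
grp <- sample(LETTERS[1:4], 20, replace = TRUE)
Score <- sample(101:150, 20)
df1 <- data.frame(grp, Score)

# Step 1: indices of the rows where grp is "A"
idx_a <- which(df1$grp == "A")

# Step 2: number of rows that make up fifty percent of them
# (round() returns a whole number; exact halves go to the even neighbour)
n_pick <- round(0.5 * length(idx_a))

# Step 3: draw that many indices without replacement, then subset the rows
picked <- sample(idx_a, n_pick)
df1[picked, ]
```

Computing which(df1$grp == "A") once and reusing it also avoids evaluating the same condition twice, as the one-liner does.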
Let’s have a look at another example −
Example
y1<-sample(c("YT1","YT2","YT3"),20,replace=TRUE)
y2<-rnorm(20,10,1)
df2<-data.frame(y1,y2)
df2
Output
    y1        y2
1  YT2 10.886273
2  YT1  9.534332
3  YT1  8.353436
4  YT1 10.878407
5  YT2  9.881384
6  YT2  9.825197
7  YT3  8.805524
8  YT3 10.189767
9  YT1 11.615293
10 YT1 10.194561
11 YT3 10.317023
12 YT1 11.570260
13 YT1  9.488106
14 YT2 10.340876
15 YT2  7.425779
16 YT2 10.085891
17 YT1 11.023932
18 YT2 10.301987
19 YT3 10.234140
20 YT1  9.048794
Randomly sampling thirty percent of the rows in which column y1 is YT1 −
Example
df2[sample(which(df2$y1=='YT1'),round(0.3*length(which(df2$y1=='YT1')))),]
Output
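    y1        y2
2  YT1 10.400617
13 YT1  8.977768
Note that this output does not match df2 as printed above, where rows 2 and 13 hold 9.534332 and 9.488106, so it appears to come from a different run. On the df2 shown, YT1 occurs in 9 rows, so round(0.3*9) gives 3 sampled rows rather than 2.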
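Since both examples follow the same pattern, it can be wrapped in a small function. The sketch below is one possible way to do this, not a base R facility: sample_rows_by_value and its arguments are hypothetical names, and the demo data frame is made up for illustration. It also guards against a known quirk of sample: when given a single number x, sample(x, n) draws from 1:x, which could return the wrong row if only one row matches.

```r
# Hypothetical helper (not part of base R): sample a fraction of the rows
# of `data` whose `column` equals `value`.
sample_rows_by_value <- function(data, column, value, fraction) {
  # Indices of the rows that match the requested value
  idx <- which(data[[column]] == value)
  # Number of rows to draw, rounded to a whole number
  n_pick <- round(fraction * length(idx))
  # Sample positions within idx rather than idx itself, so that a single
  # matching row is not misread by sample() as the range 1:x
  picked <- idx[sample.int(length(idx), n_pick)]
  data[picked, , drop = FALSE]
}

# Usage on a small made-up data frame: 10 rows of "YT1", 5 rows of "YT2"
set.seed(1)
demo <- data.frame(y1 = rep(c("YT1", "YT2"), times = c(10, 5)),
                   y2 = seq_len(15))
result <- sample_rows_by_value(demo, "y1", "YT1", 0.3)
result
```

With 10 matching rows and a fraction of 0.3, the call returns 3 rows, all with y1 equal to YT1; which rows are drawn depends on the seed.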