How to create a random sample based on group columns of a data.table in R?

Random sampling helps reduce bias in an analysis. If the data is organized in groups, we might want to draw a random sample within each group. For example, if a data.table has a group column and each group contains ten values, we might want a random sample with two values randomly selected from each group. This can be done by using the sample function to index .SD while grouping with by.

Example

Consider the below data.table −

library(data.table)
Group<-rep(c("A","B","C","D","E"),times=4)
Percentage<-sample(1:100,20)
dt1<-data.table(Group,Percentage)
dt1

Output

  Group Percentage
1:    A    97
2:    B    68
3:    C    19
4:    D    32
5:    E    98
6:    A    48
7:    B    94
8:    C    54
9:    D    7
10:   E    76
11:   A    10
12:   B    31
13:   C    59
14:   D    84
15:   E    41
16:   A    99
17:   B    1
18:   C    72
19:   D    42
20:   E    17

Creating a random sample of size 2 from each group −

Example

dt1[,.SD[sample(.N, min(2,.N))],by=Group]

Output

   Group Percentage
1:    A    48
2:    A    99
3:    B    94
4:    B    31
5:    C    54
6:    C    59
7:    D    42
8:    D    84
9:    E    98
10:   E    76
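
Because sample draws at random, the rows shown above will differ from run to run. The following is a minimal sketch of how the same call could be made repeatable and checked; the seed value 42 and the small table named small are assumptions chosen only for illustration and are not part of the original example.

```r
library(data.table)

# Assumption: any fixed seed works; it only makes the draw repeatable
set.seed(42)

Group <- rep(c("A", "B", "C", "D", "E"), times = 4)
Percentage <- sample(1:100, 20)
dt1 <- data.table(Group, Percentage)

# .N is the number of rows in the current group.
# sample(.N, k) draws k distinct row positions from 1..N.
# min(2, .N) keeps the call valid for groups with fewer than 2 rows.
sampled <- dt1[, .SD[sample(.N, min(2, .N))], by = Group]

# Count the sampled rows per group
counts <- sampled[, .N, by = Group]
print(counts)

# Hypothetical table with a one-row group "Z" to show the min() guard
small <- data.table(Group = c("Y", "Y", "Y", "Z"), Value = 1:4)
small_sample <- small[, .SD[sample(.N, min(2, .N))], by = Group]
print(small_sample)
```

Without the min(2, .N) guard, sample would raise an error for group "Z", since it cannot draw two distinct rows from a group that has only one.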

Let’s have a look at another example −

Example

Class<-rep(c("First","Second","Third","Fourth"),times=10)
Experience<-sample(1:5,40,replace=TRUE)
dt2<-data.table(Class,Experience)
head(dt2,10)

Output

   Class Experience
1: First    4
2: Second   2
3: Third    4
4: Fourth   2
5: First    4
6: Second   5
7: Third    3
8: Fourth   5
9: First    3
10: Second  5

Example

tail(dt2,10)

Output

   Class Experience
1: Third    4
2: Fourth   2
3: First    5
4: Second   2
5: Third    1
6: Fourth   4
7: First    5
8: Second   2
9: Third    4
10: Fourth  4

Example

dt2[,.SD[sample(.N, min(5,.N))],by=Class]

Output

  Class Experience
1: First    3
2: First    3
3: First    4
4: First    5
5: First    5
6: Second   5
7: Second   2
8: Second   5
9: Second   2
10: Second  1
11: Third   3
12: Third   1
13: Third   4
14: Third   3
15: Third   4
16: Fourth  2
17: Fourth  5
18: Fourth  2
19: Fourth  4
20: Fourth  2
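
An alternative that is often suggested for larger tables is to sample row numbers with .I and then subset the table once, rather than subsetting .SD inside every group. The sketch below assumes the same dt2 layout as above; the seed and the names idx and sampled2 are illustrative choices, and any speed benefit would need to be measured on your own data.

```r
library(data.table)

# Assumption: fixed seed only for repeatability
set.seed(1)

Class <- rep(c("First", "Second", "Third", "Fourth"), times = 10)
Experience <- sample(1:5, 40, replace = TRUE)
dt2 <- data.table(Class, Experience)

# .I holds the row numbers (in the full table) of the current group.
# The unnamed result column is called V1 by default.
idx <- dt2[, .I[sample(.N, min(5, .N))], by = Class]$V1

# Subset the original table once using the sampled row numbers
sampled2 <- dt2[idx]
print(sampled2[, .N, by = Class])
```

Both forms select the same number of rows per group; the .I version simply defers the row lookup to a single subset at the end.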

Updated on: 08-Sep-2020
