How to split a data frame in R into multiple parts randomly?


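When a data frame is large, we may want to split it randomly into several parts, for example to analyze the data one portion at a time. We can do this with the split() and sample() functions. rep() creates a vector of group labels with one label per row, sample() shuffles those labels into a random order, and split() then divides the rows of the data frame according to the shuffled labels.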
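The same idea can be wrapped in a small function so the part sizes come from nrow() instead of being typed by hand. This is a minimal sketch. The name random_split is a hypothetical helper, not a base R function. It uses rep_len() to recycle the labels 1 to k, so the part sizes differ by at most one row:

```r
# random_split() is a hypothetical helper, not part of base R.
# It splits df into k random parts whose sizes differ by at most one row.
random_split <- function(df, k) {
  n <- nrow(df)
  # recycle the labels 1..k until there is exactly one label per row
  labels <- rep_len(seq_len(k), n)
  # shuffle the labels, then group the rows by their shuffled label
  split(df, sample(labels))
}

set.seed(1)                     # make the random assignment reproducible
parts <- random_split(trees, 3)
sapply(parts, nrow)             # sizes 11, 10, 10 for parts "1", "2", "3"
```

With 31 rows and k = 3, rep_len() produces eleven 1s, ten 2s and ten 3s. Part "1" therefore always gets the extra row; only which rows land in each part is random.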

Example

Consider the trees data in base R:

> str(trees)
'data.frame': 31 obs. of  3 variables:
 $ Girth : num  8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
 $ Height: num  70 65 63 72 81 83 66 75 80 75 ...
 $ Volume: num  10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...

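Splitting the trees data (31 rows) into three parts of 10, 10 and 11 rows. The label counts passed to rep() must add up to the number of rows. Because sample() shuffles the labels, the rows in each part will differ from run to run: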

> split(trees, sample(rep(1:3, times = c(10, 10, 11))))
$`1`
   Girth Height Volume
2    8.6     65   10.3
3    8.8     63   10.2
10  11.2     75   19.9
12  11.4     76   21.0
13  11.4     76   21.4
16  12.9     74   22.2
21  14.0     78   34.5
22  14.2     80   31.7
25  16.3     77   42.6
26  17.3     81   55.4

$`2`
   Girth Height Volume
5   10.7     81   18.8
6   10.8     83   19.7
8   11.0     75   18.2
11  11.3     79   24.2
14  11.7     69   21.3
17  12.9     85   33.8
20  13.8     64   24.9
28  17.9     80   58.3
29  18.0     80   51.5
30  18.0     80   51.0

$`3`
   Girth Height Volume
1    8.3     70   10.3
4   10.5     72   16.4
7   11.0     66   15.6
9   11.1     80   22.6
15  12.0     75   19.1
18  13.3     86   27.4
19  13.7     71   25.7
23  14.5     74   36.3
24  16.0     72   38.3
27  17.5     82   55.7
31  20.6     87   77.0
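The sketch below repeats the trees split with two additions. A set.seed() call makes the shuffle reproducible, and a check confirms that the three parts together contain every original row exactly once. The seed value 42 is arbitrary:

```r
# group sizes must add up to nrow(trees), which is 31
labels <- rep(1:3, times = c(10, 10, 11))
stopifnot(length(labels) == nrow(trees))

set.seed(42)                    # arbitrary seed; fixes the shuffle
parts <- split(trees, sample(labels))

sapply(parts, nrow)             # 10, 10, 11 rows in parts "1", "2", "3"

# every original row appears in exactly one part
all_rows <- unlist(lapply(parts, rownames))
stopifnot(anyDuplicated(all_rows) == 0, setequal(all_rows, rownames(trees)))
```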

Consider the women data in base R:

> str(women)
'data.frame': 15 obs. of  2 variables:
 $ height: num  58 59 60 61 62 63 64 65 66 67 ...
 $ weight: num  115 117 120 123 126 129 132 135 139 142 ...

Splitting the women data (15 rows) into two parts of 10 and 5 rows:

> split(women, sample(rep(1:2, times = c(10, 5))))
$`1`
   height weight
2      59    117
4      61    123
5      62    126
6      63    129
7      64    132
9      66    139
11     68    146
12     69    150
14     71    159
15     72    164

$`2`
   height weight
1      58    115
3      60    120
8      65    135
10     67    142
13     70    154
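When the two parts should follow a proportion rather than fixed counts, the sizes can be computed before building the labels. This is a minimal sketch. The helper split_by_prop and the labels "first" and "second" are hypothetical names chosen for illustration:

```r
# split_by_prop() is a hypothetical helper, not part of base R.
# It splits df into two random parts, the first holding about prop of the rows.
split_by_prop <- function(df, prop) {
  n <- nrow(df)
  n1 <- round(n * prop)                 # size of the first part
  labels <- rep(c("first", "second"), times = c(n1, n - n1))
  split(df, sample(labels))             # shuffle labels, then group rows
}

set.seed(7)
parts <- split_by_prop(women, 2/3)
sapply(parts, nrow)                     # first = 10, second = 5
```

Because of round(), the first part can be one row larger or smaller than the exact proportion when nrow(df) * prop is not a whole number.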

Updated on: 11-Aug-2020
