How to create bins for a continuous vector in R?

To create bins for a continuous vector, we can use the cut function and store the bins in a data frame alongside the original vector. The breaks passed to cut must cover the full range of the vector values; any value that falls outside the breaks gets NA as its bin. For example, if the vector contains 0.55 and the breaks do not start at 0 (or lower), the bin for 0.55 will be NA. By default the intervals are open on the left, so a value exactly equal to the lowest break is also NA unless include.lowest=TRUE is set. The examples below show how to do this properly.
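As a minimal sketch of the NA behaviour described above, the following compares breaks that miss part of the range with breaks that cover it, and also shows the effect of include.lowest. The vector x and the break values are made up for illustration.

```r
# Illustrative vector; 0.55 lies below the first break used in `bad`
x <- c(0.55, 1.2, 2.7)

# Breaks start at 1, so 0.55 falls outside every interval and becomes NA
bad <- cut(x, breaks = c(1, 2, 3))

# Breaks start at 0, so every value lands in a bin
good <- cut(x, breaks = c(0, 1, 2, 3))

# A value equal to the lowest break is NA unless include.lowest = TRUE
edge_open   <- cut(c(0, 0.55), breaks = c(0, 1))
edge_closed <- cut(c(0, 0.55), breaks = c(0, 1), include.lowest = TRUE)

print(bad)
print(good)
print(edge_open)
print(edge_closed)
```

Checking the result with anyNA() right after calling cut is a cheap way to catch breaks that do not span the data.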

Example 1

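# 20 random draws from a normal distribution with mean 5 and sd 2
# (values differ on every run unless set.seed() is called first)
x1 <- rnorm(20, 5, 2)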
x1

Output

[1] 3.066708 6.729915 7.706962 7.564306 3.924779 5.400262 2.529380 4.377311
[9] 7.270613 6.135201 5.068060 6.447229 8.603205 4.065874 4.132155 3.060366
[17] 0.953596 2.861802 7.250666 6.928397

Example

# Unit-width bins from 0 to 9; include.lowest = TRUE closes the first bin at 0
df1 <- data.frame(x1, bin = cut(x1, breaks = 0:9, include.lowest = TRUE))
df1

Output

     x1     bin
1 3.066708 (3,4]
2 6.729915 (6,7]
3 7.706962 (7,8]
4 7.564306 (7,8]
5 3.924779 (3,4]
6 5.400262 (5,6]
7 2.529380 (2,3]
8 4.377311 (4,5]
9 7.270613 (7,8]
10 6.135201 (6,7]
11 5.068060 (5,6]
12 6.447229 (6,7]
13 8.603205 (8,9]
14 4.065874 (4,5]
15 4.132155 (4,5]
16 3.060366 (3,4]
17 0.953596 [0,1]
18 2.861802 (2,3]
19 7.250666 (7,8]
20 6.928397 (6,7]
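
The breaks in this example were typed by hand. As a hedged sketch of one alternative, they can be derived from the data with seq(), floor() and ceiling(), so that they always span the range of the vector. The names x, breaks and df, the seed, and the bin width of 1 are assumptions for illustration, not part of the original example.

```r
set.seed(42)                       # assumed seed, only for reproducibility
x <- rnorm(20, 5, 2)

# Whole-number breaks from at or below the minimum to at or above the maximum
breaks <- seq(floor(min(x)), ceiling(max(x)), by = 1)

df <- data.frame(x, bin = cut(x, breaks, include.lowest = TRUE))

# Every value lies inside the breaks, so no bin should be NA
print(anyNA(df$bin))
print(head(df))
```

This sketch assumes the data span more than one unit; for a very narrow range a smaller step in seq() would be needed.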

Example 2

# 20 random draws from a uniform distribution on [2, 5]
x2 <- runif(20, 2, 5)
x2

Output

[1] 2.656399 2.436808 3.704048 3.572767 2.321280 2.982751 4.911949 2.483126
[9] 2.177203 2.797627 4.621546 3.645550 2.888457 2.919597 4.354709 4.251886
[17] 4.862071 3.367629 2.610280 3.063467

Example

# Bin x2 (not x1), so each value sits next to its own bin
df2 <- data.frame(x2, bin = cut(x2, breaks = c(2, 3, 4, 5), include.lowest = TRUE))
df2

Output

     x2     bin
1 2.656399 [2,3]
2 2.436808 [2,3]
3 3.704048 (3,4]
4 3.572767 (3,4]
5 2.321280 [2,3]
6 2.982751 [2,3]
7 4.911949 (4,5]
8 2.483126 [2,3]
9 2.177203 [2,3]
10 2.797627 [2,3]
11 4.621546 (4,5]
12 3.645550 (3,4]
13 2.888457 [2,3]
14 2.919597 [2,3]
15 4.354709 (4,5]
16 4.251886 (4,5]
17 4.862071 (4,5]
18 3.367629 (3,4]
19 2.610280 [2,3]
20 3.063467 (3,4]
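
Once the bins exist, a common next step is to count how many values fall into each one. The following is a minimal sketch using table() on the x2 values printed in Example 2, copied here so the block runs on its own. The names bin and counts are illustrative.

```r
# x2 values as printed in Example 2
x2 <- c(2.656399, 2.436808, 3.704048, 3.572767, 2.321280,
        2.982751, 4.911949, 2.483126, 2.177203, 2.797627,
        4.621546, 3.645550, 2.888457, 2.919597, 4.354709,
        4.251886, 4.862071, 3.367629, 2.610280, 3.063467)

# Same breaks as in Example 2
bin <- cut(x2, breaks = c(2, 3, 4, 5), include.lowest = TRUE)

# Frequency of each bin
counts <- table(bin)
print(counts)
```

For these values the counts are 10, 5 and 5 for the bins [2,3], (3,4] and (4,5].
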
raja
Published on 05-Feb-2021 10:09:38