How to find the proportion of categories based on another categorical column in R's data.table object?


To find the proportion of each category in one column within the groups defined by another categorical column of a data.table object in R, follow these steps −

  • Create a data.table object.
  • Find the proportion of each category within the groups of the other categorical column.

Create a data.table object

Load the data.table package and create a data.table object with two categorical columns −

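library(data.table)
# Draw 30 random values for each categorical column
Category1<-sample(LETTERS[1:3],30,replace=TRUE)
Category2<-sample(letters[1:4],30,replace=TRUE)
# Combine the two vectors into a data.table and print it
DT<-data.table(Category1,Category2)
DT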

Running the above script produces output like the following (your values will differ because sample() draws at random) −

    Category1 Category2
 1:         C         d
 2:         B         a
 3:         C         d
 4:         C         a
 5:         B         c
 6:         C         b
 7:         B         a
 8:         B         b
 9:         B         c
10:         B         d
11:         B         a
12:         A         d
13:         C         d
14:         B         a
15:         A         a
16:         C         a
17:         B         d
18:         C         c
19:         C         c
20:         A         b
21:         A         d
22:         C         c
23:         B         d
24:         C         c
25:         A         d
26:         A         b
27:         B         a
28:         A         c
29:         B         d
30:         A         d
    Category1 Category2
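
Because the sample is random, the table above changes on every run. A minimal sketch of one way to make the draws repeatable, assuming a fixed seed is acceptable for your use; the helper name make_dt and the seed value 123 are illustrative choices, not part of the original script −

```r
library(data.table)

# make_dt() is a hypothetical helper; 123 is an arbitrary seed for illustration
make_dt <- function(seed = 123) {
  set.seed(seed)  # fix the random number generator state before sampling
  data.table(
    Category1 = sample(LETTERS[1:3], 30, replace = TRUE),
    Category2 = sample(letters[1:4], 30, replace = TRUE)
  )
}

DT_first  <- make_dt()
DT_second <- make_dt()

# TRUE when both calls produced the same rows
isTRUE(all.equal(DT_first, DT_second))
```

With the seed fixed, both calls return the same 30 rows, so any proportions computed from them are repeatable as well. The values will still not match the table printed above, which was generated without a known seed.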

Find the proportion based on the other categorical column

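Use the tabulate function together with the factor function to find the proportion of each Category2 value within each Category1 group. Passing the levels to factor and nbins to tabulate keeps all four categories in every group, so a category that is absent from a group is reported as 0 instead of producing a result of the wrong length. The query runs on the DT created above; calling sample() again would draw new values −

# Reuse the DT created above rather than sampling again
DT[order(Category2),
   .(Category2=letters[1:4],
     Proportion=tabulate(factor(Category2,levels=letters[1:4]),nbins=4)/.N),
   by=Category1]

For the table shown earlier, this gives the following output −

    Category1 Category2 Proportion
 1:         B         a 0.41666667
 2:         B         b 0.08333333
 3:         B         c 0.16666667
 4:         B         d 0.33333333
 5:         C         a 0.20000000
 6:         C         b 0.10000000
 7:         C         c 0.40000000
 8:         C         d 0.30000000
 9:         A         a 0.12500000
10:         A         b 0.25000000
11:         A         c 0.12500000
12:         A         d 0.50000000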
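
The same proportions can also be reached by counting each pair of categories and dividing by the group total. The following is a sketch of that alternative, not the method used above; DT_small and Counts are hypothetical names, and the six rows are hand-made data chosen so the result can be checked by eye −

```r
library(data.table)

# Small hand-made table (hypothetical data, not the random sample above)
DT_small <- data.table(
  Category1 = c("A", "A", "A", "A", "B", "B"),
  Category2 = c("a", "a", "b", "d", "c", "c")
)

# Count the rows for each (Category1, Category2) pair
Counts <- DT_small[, .N, by = .(Category1, Category2)]

# Divide each count by the total number of rows in its Category1 group
Counts[, Proportion := N / sum(N), by = Category1]

# Print the result sorted by both categories
Counts[order(Category1, Category2)]
```

Group A has four rows, so its proportions are 0.5 for a and 0.25 each for b and d; group B has two rows, both c, so that proportion is 1. Unlike the tabulate approach, this version lists only the combinations that actually occur, so categories with a proportion of 0 are omitted.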

Updated on: 13-Aug-2021
