How to count the number of rows for a combination of categorical variables in R?


When a data frame contains two categorical variables, each combination of their levels can occur in a different number of rows. Counting the rows per combination shows how the observations are distributed across the two variables. We can get these counts with the count function of the dplyr package, passing the data frame followed by the variables to combine.

Example

Consider the CO2 data in base R −

> head(CO2,20)
      Plant    Type    Treatment    conc     uptake
1     Qn1     Quebec   nonchilled     95      16.0
2     Qn1     Quebec   nonchilled    175      30.4
3     Qn1     Quebec   nonchilled    250      34.8
4     Qn1     Quebec   nonchilled    350      37.2
5     Qn1     Quebec   nonchilled    500      35.3
6     Qn1     Quebec   nonchilled    675      39.2
7     Qn1     Quebec   nonchilled   1000      39.7
8     Qn2     Quebec   nonchilled     95      13.6
9     Qn2     Quebec   nonchilled    175      27.3
10    Qn2     Quebec   nonchilled    250      37.1
11    Qn2     Quebec   nonchilled    350      41.8
12    Qn2     Quebec   nonchilled    500      40.6
13    Qn2     Quebec   nonchilled    675      41.4
14    Qn2     Quebec   nonchilled   1000      44.3
15    Qn3     Quebec   nonchilled     95      16.2
16    Qn3     Quebec   nonchilled    175      32.4
17    Qn3     Quebec   nonchilled    250      40.3
18    Qn3     Quebec   nonchilled    350      42.1
19    Qn3     Quebec   nonchilled    500      42.9
20    Qn3     Quebec   nonchilled    675      43.9
> tail(CO2,20)  
      Plant    Type        Treatment    conc    uptake
65    Mc1    Mississippi   chilled      175     14.9
66    Mc1    Mississippi   chilled      250     18.1
67    Mc1    Mississippi   chilled      350     18.9
68    Mc1    Mississippi   chilled      500     19.5
69    Mc1    Mississippi   chilled      675     22.2
70    Mc1    Mississippi   chilled     1000     21.9
71    Mc2    Mississippi   chilled       95      7.7
72    Mc2    Mississippi   chilled      175     11.4
73    Mc2    Mississippi   chilled      250     12.3
74    Mc2    Mississippi   chilled      350     13.0
75    Mc2    Mississippi   chilled      500     12.5
76    Mc2    Mississippi   chilled      675     13.7
77    Mc2    Mississippi   chilled     1000     14.4
78    Mc3    Mississippi   chilled       95     10.6
79    Mc3    Mississippi   chilled      175     18.0
80    Mc3    Mississippi   chilled      250     17.9
81    Mc3    Mississippi   chilled      350     17.9
82    Mc3    Mississippi   chilled      500     17.9
83    Mc3    Mississippi   chilled      675     18.9
84    Mc3    Mississippi   chilled     1000     19.9
> library(dplyr)

Finding the number of rows for each combination of Type and Treatment −

> count(CO2, Type, Treatment)
# A tibble: 4 x 3
  Type        Treatment      n
  <fct>       <fct>      <int>
1 Quebec      nonchilled    21
2 Quebec      chilled       21
3 Mississippi nonchilled    21
4 Mississippi chilled       21
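
The same counts can be cross-checked without dplyr. The following is a minimal sketch using base R's table function; the object names tab and counts_long are chosen here for illustration and are not part of the original example.

```r
# Cross-tabulate the two factors; each cell holds the number of rows
# that share that Type/Treatment combination.
tab <- table(CO2$Type, CO2$Treatment, dnn = c("Type", "Treatment"))
print(tab)

# Convert to long format (one row per combination), which is similar
# in shape to the count() result; the count column is named "n" here.
counts_long <- as.data.frame(tab, responseName = "n")
print(counts_long)
```

The wide table is convenient for reading off a single cell, while the long form is easier to filter or merge with other data.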

Finding the number of rows for each combination of Type and Plant −

> count(CO2, Type, Plant)
# A tibble: 12 x 3
   Type        Plant     n
   <fct>       <ord> <int>
 1 Quebec      Qn1       7
 2 Quebec      Qn2       7
 3 Quebec      Qn3       7
 4 Quebec      Qc1       7
 5 Quebec      Qc3       7
 6 Quebec      Qc2       7
 7 Mississippi Mn3       7
 8 Mississippi Mn2       7
 9 Mississippi Mn1       7
10 Mississippi Mc2       7
11 Mississippi Mc3       7
12 Mississippi Mc1       7
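
Only 12 rows appear above even though Type has 2 levels and Plant has 12, because count leaves out combinations that never occur in the data. A hedged sketch of how to list those empty combinations as well, assuming a dplyr version in which count accepts the .drop argument (0.8.0 or later); co2_df and all_combos are illustrative names.

```r
library(dplyr)

# Work on a plain data frame copy; CO2 carries extra groupedData classes,
# and stripping them is a precaution taken for this sketch.
co2_df <- as.data.frame(CO2)

# .drop = FALSE keeps factor-level combinations with no rows,
# reporting them with n = 0 instead of omitting them.
all_combos <- count(co2_df, Type, Plant, .drop = FALSE)

nrow(all_combos)          # number of Type/Plant combinations listed
sum(all_combos$n == 0)    # how many of them have no rows
```

For the CO2 data this should list all 24 combinations, half of them with a count of zero, since each plant belongs to exactly one Type.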

Finding the number of rows for each combination of Plant and Treatment −

> count(CO2, Plant, Treatment)
# A tibble: 12 x 3
   Plant Treatment      n
   <ord> <fct>      <int>
 1 Qn1   nonchilled     7
 2 Qn2   nonchilled     7
 3 Qn3   nonchilled     7
 4 Qc1   chilled        7
 5 Qc3   chilled        7
 6 Qc2   chilled        7
 7 Mn3   nonchilled     7
 8 Mn2   nonchilled     7
 9 Mn1   nonchilled     7
10 Mc2   chilled        7
11 Mc3   chilled        7
12 Mc1   chilled        7
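
The count call is roughly a shortcut for grouping and then summarising. The sketch below spells that out with group_by, summarise and n from dplyr; it assumes dplyr 1.0.0 or later for the .groups argument, and the names co2_df and by_hand are illustrative.

```r
library(dplyr)

# Plain data frame copy of CO2 (a precaution taken for this sketch).
co2_df <- as.data.frame(CO2)

# Group by both variables, then count the rows in each group.
# .groups = "drop" returns an ungrouped result, like count() does.
by_hand <- co2_df %>%
  group_by(Plant, Treatment) %>%
  summarise(n = n(), .groups = "drop")

print(by_hand)
```

Writing it this way is useful when other summaries, such as a mean of uptake, are needed alongside the row count in the same step.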

Updated on: 11-Aug-2020
