How to find the number of unique values of multiple categorical columns based on one categorical column in R?



To find the number of unique values in several categorical columns for each level of another categorical column, we can follow the steps below −

  • First, create a data frame.
  • Group the data by the categorical column, then apply the n_distinct function from dplyr to each remaining column (for example with summarise and across; the older summarise_each function is deprecated).
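
The two steps can be sketched on a tiny data frame that is easy to check by hand. This is only a minimal sketch, assuming dplyr 1.0.0 or later (needed for across); the name df_small and its values are made up for illustration and are not part of the example that follows.

```r
library(dplyr)

# Hypothetical toy data: one grouping column (x) and two categorical columns
df_small <- data.frame(
  x  = c("First", "First", "Second", "Second", "Second"),
  C1 = c("A", "A", "B", "C", "B"),
  C2 = c("a", "b", "a", "a", "a")
)

# Count the distinct values of every non-grouping column within each x
counts <- df_small %>%
  group_by(x) %>%
  summarise(across(everything(), n_distinct))

print(counts)
```

Here First has one distinct C1 value (A) and two distinct C2 values (a, b), while Second has two distinct C1 values (B, C) and one distinct C2 value (a).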

Create the data frame

Let's create a data frame as shown below −


x <- sample(c("First","Second","Third","Fourth","Fifth","Sixth","Seventh","Eighth",
              "Nineth","Tenth"), 25, replace=TRUE)
C1 <- sample(LETTERS[1:4], 25, replace=TRUE)
C2 <- sample(letters[1:4], 25, replace=TRUE)
df <- data.frame(x, C1, C2)
df

Running the above script produces the output below (it will vary on your system because the values are sampled at random) −

         x C1 C2
1  Seventh  B  a
2    Third  C  c
3   Nineth  A  a
4    Third  D  c
5  Seventh  D  d
6   Fourth  A  c
7  Seventh  B  a
8    Third  D  a
9  Seventh  D  c
10   First  A  a
11  Eighth  D  d
12   Tenth  C  b
13   Fifth  A  c
14  Second  A  c
15  Fourth  B  d
16  Nineth  C  b
17   Fifth  D  a
18   First  A  a
19   Tenth  B  a
20  Nineth  A  b
21   Third  B  b
22   Tenth  A  a
23   Fifth  A  a
24   Sixth  D  b
25   First  A  c
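
To see what a per-group count of unique values measures, a single group can be checked by hand. The sketch below copies the four Seventh rows (rows 1, 5, 7 and 9) from the printed data frame above into a small data frame; the names seventh and distinct_counts are introduced here only for illustration.

```r
# The four rows with x == "Seventh" in the printed data frame above
seventh <- data.frame(
  C1 = c("B", "D", "B", "D"),
  C2 = c("a", "d", "a", "c")
)

# Number of distinct values in each column of this one group
distinct_counts <- sapply(seventh, function(v) length(unique(v)))
print(distinct_counts)
```

Within this group C1 takes two distinct values (B, D) and C2 takes three (a, d, c), which is the pair of counts a grouped summary should report for Seventh.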

Find number of unique values based on categorical column

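Use the n_distinct function with summarise and across from the dplyr package to count the unique values in C1 and C2 for each value of x. The older summarise_each(funs(n_distinct(.))) form is deprecated in current dplyr versions, so across (dplyr 1.0.0 or later) is used here −

library(dplyr)
# Create the data frame as in the previous step
x <- sample(c("First","Second","Third","Fourth","Fifth","Sixth","Seventh","Eighth",
              "Nineth","Tenth"), 25, replace=TRUE)
C1 <- sample(LETTERS[1:4], 25, replace=TRUE)
C2 <- sample(letters[1:4], 25, replace=TRUE)
df <- data.frame(x, C1, C2)
# Group by x, then count the distinct values in every other column
df %>% group_by(x) %>% summarise(across(everything(), n_distinct))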

Output

# A tibble: 10 x 3
   x          C1    C2
   <chr>   <int> <int>
 1 Eighth      1     1
 2 Fifth       2     2
 3 First       1     2
 4 Fourth      2     2
 5 Nineth      2     2
 6 Second      1     1
 7 Seventh     2     3
 8 Sixth       1     1
 9 Tenth       3     2
10 Third       3     3
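
If dplyr is not available, the same kind of grouped count can be approximated in base R with aggregate. This is a sketch of an alternative technique, not the method used above; the data frame df_small and its values are hypothetical and chosen so the result can be verified by eye.

```r
# Hypothetical toy data, made up for illustration
df_small <- data.frame(
  x  = c("First", "First", "Second", "Second", "Second"),
  C1 = c("A", "A", "B", "C", "B"),
  C2 = c("a", "b", "a", "a", "a")
)

# Base R: apply a distinct-count function to C1 and C2 within each level of x
base_counts <- aggregate(
  df_small[c("C1", "C2")],
  by  = list(x = df_small$x),
  FUN = function(v) length(unique(v))
)

print(base_counts)
```

The result is an ordinary data frame with one row per value of x, so it can be compared directly with the dplyr summary for the same data.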
