How to find the ID-wise frequency in an R data frame?

To find the ID-wise frequency in an R data frame, we can group the data with the group_by function of the dplyr package and then count the rows in each group with summarise(Frequency=n()). Both the ID column and the column whose values we want to count are placed inside group_by, so the result has one row for each combination of ID and value.
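
As a minimal sketch of this pattern, the snippet below uses a small hand-made data frame; the name df and its values are illustrative and are not taken from the examples that follow. The .groups = "drop" argument (available in dplyr 1.0.0 and later) is an optional addition that returns an ungrouped result.

```r
library(dplyr)

# Illustrative data (an assumption, not from the article):
# three rows for ID 1 and two rows for ID 2
df <- data.frame(ID = c(1, 1, 1, 2, 2),
                 Sales = c(5, 5, 7, 5, 5))

# Group by ID and Sales, then count the rows in each group
freq <- df %>%
  group_by(ID, Sales) %>%
  summarise(Frequency = n(), .groups = "drop")

print(freq)
```

Here ID 1 has two rows with Sales 5 and one row with Sales 7, and ID 2 has two rows with Sales 5, so the Frequency column should contain 2, 1 and 2.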

The examples below show how this is done.

Example 1

The following snippet creates a sample data frame −

ID<-sample(1:4,20,replace=TRUE)
Sales<-sample(1:10,20,replace=TRUE)
df1<-data.frame(ID,Sales)
df1

The following data frame is created (the values will differ from run to run because sample() draws at random) −

   ID Sales
1  4   4
2  1   3  
3  4   9
4  4   8
5  2   6
6  1   9
7  2   8
8  1   8
9  1   6
10 1   1
11 4   6
12 2   9
13 2  10
14 2   6
15 3   2
16 3   9
17 4   6
18 1   8
19 2  10
20 2   1

To load the dplyr package and compute the ID-wise frequency in df1, add the following code to the above snippet −

library(dplyr)
df1%>%group_by(ID,Sales)%>%summarise(Frequency=n())

Output

If you execute all the above snippets as a single program, it generates the following output. The summarise() message is informational: the result stays grouped by ID −

`summarise()` regrouping output by 'ID' (override with `.groups` argument)
# A tibble: 16 x 3
# Groups: ID [4]
   ID  Sales Frequency
  <int> <int> <int>
1  1    1      1
2  1    3      1
3  1    6      1
4  1    8      2
5  1    9      1
6  2    1      1
7  2    6      2
8  2    8      1
9  2    9      1
10 2   10      2
11 3    2      1
12 3    9      1
13 4    4      1
14 4    6      2
15 4    8      1
16 4    9      1
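
The same table can likely be obtained more compactly with dplyr's count function, which groups and tallies in one step. The sketch below rebuilds df1 from the values printed in Example 1 so that it is reproducible, then compares the two approaches; the object names by_summarise and by_count are illustrative.

```r
library(dplyr)

# df1 rebuilt from the values printed in Example 1
ID <- c(4, 1, 4, 4, 2, 1, 2, 1, 1, 1, 4, 2, 2, 2, 3, 3, 4, 1, 2, 2)
Sales <- c(4, 3, 9, 8, 6, 9, 8, 8, 6, 1, 6, 9, 10, 6, 2, 9, 6, 8, 10, 1)
df1 <- data.frame(ID, Sales)

# Approach used in the article: group, then count rows per group
by_summarise <- df1 %>%
  group_by(ID, Sales) %>%
  summarise(Frequency = n(), .groups = "drop")

# Shorthand: count() groups and tallies in a single call
by_count <- df1 %>% count(ID, Sales, name = "Frequency")

# Check that both approaches give the same frequencies
print(identical(by_summarise$Frequency, by_count$Frequency))
```

The name argument of count sets the name of the frequency column, which would otherwise default to n.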

Example 2

The following snippet creates a sample data frame −

ID<-sample(0:2,20,replace=TRUE)
Group<-sample(c("I","II","III"),20,replace=TRUE)
df2<-data.frame(ID,Group)
df2

The following data frame is created (values vary between runs because sample() is random) −

  ID Group
1  2 III
2  0   I
3  0   I
4  0   I
5  2 III
6  0  II
7  2  II
8  2  II
9  1 III
10 2   I
11 1  II
12 1   I
13 1   I
14 1   I
15 2  II
16 0  II
17 1   I
18 2   I
19 2   I
20 0 III

To compute the ID-wise frequency in df2, add the following code to the above snippet −

df2%>%group_by(ID,Group)%>%summarise(Frequency=n())

Output

If you execute all the above snippets as a single program, it generates the following output −

`summarise()` regrouping output by 'ID' (override with `.groups` argument)
# A tibble: 9 x 3
# Groups: ID [3]
  ID Group Frequency
 <int> <chr> <int>
1  0    I    3
2  0   II    2
3  0  III    1
4  1    I    4
5  1   II    1
6  1  III    1
7  2    I    3
8  2   II    3
9  2  III    2
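
If dplyr is not available, a similar frequency table can be sketched in base R with table and as.data.frame. This is an alternative technique, not the method used above. The sketch rebuilds df2 from the values printed in Example 2, and the name freq2 is illustrative. Note that table also reports combinations that never occur (with a count of 0), hence the filter, and that it returns ID and Group as factors.

```r
# df2 rebuilt from the values printed in Example 2
ID <- c(2, 0, 0, 0, 2, 0, 2, 2, 1, 2, 1, 1, 1, 1, 2, 0, 1, 2, 2, 0)
Group <- c("III", "I", "I", "I", "III", "II", "II", "II", "III", "I",
           "II", "I", "I", "I", "II", "II", "I", "I", "I", "III")
df2 <- data.frame(ID, Group)

# Cross-tabulate ID against Group, then reshape to one row per combination
freq2 <- as.data.frame(table(ID = df2$ID, Group = df2$Group),
                       responseName = "Frequency")

# Keep only the combinations that actually occur
freq2 <- freq2[freq2$Frequency > 0, ]

# Sort by ID, then Group, to match the row order of the dplyr output
freq2 <- freq2[order(freq2$ID, freq2$Group), ]
print(freq2)
```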
raja
Published on 11-Nov-2021 05:32:22
