A factor column often contains repeated values: it has several levels, and each level occurs many times. If the data frame also has a frequency column, we may want the total frequency for each level of the factor column. This can be done with the aggregate function.
Consider the following data frame −
> set.seed(109)
> Class<-rep(sample(LETTERS[1:5],4),times=5)
> Frequency<-sample(1:10,20,replace=TRUE)
> df1<-data.frame(Class,Frequency)
> df1
   Class Frequency
1      E         9
2      D         5
3      B        10
4      C        10
5      E         7
6      D        10
7      B         9
8      C         5
9      E         8
10     D         7
11     B         1
12     C         3
13     E         5
14     D        10
15     B         2
16     C         3
17     E         9
18     D         3
19     B         2
20     C         9
Finding the sum of frequency for each class −
> aggregate(df1["Frequency"],by=df1["Class"],sum)
  Class Frequency
1     B        24
2     C        30
3     D        35
4     E        38
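The same per-group total can also be written in other ways. The sketch below is an illustration only: it uses a small hand-made data frame named df (hypothetical values, not the data above) and shows aggregate with its formula interface alongside tapply, which returns the totals as a named vector instead of a data frame.

```r
# Small hand-made data frame; names and values are illustrative assumptions
df <- data.frame(
  Class = c("A", "B", "A", "C", "B", "A"),
  Frequency = c(2, 5, 3, 4, 1, 6)
)

# Formula interface: sum Frequency within each level of Class
totals_formula <- aggregate(Frequency ~ Class, data = df, FUN = sum)

# tapply gives the same totals, indexed by class name
totals_tapply <- tapply(df$Frequency, df$Class, sum)

print(totals_formula)
print(totals_tapply)
```

Both calls should give 11 for A, 6 for B and 4 for C. The formula form is often easier to read when there is more than one grouping column, while tapply is handy when only the totals are needed.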
Let's have a look at another example −
> Metal<-rep(c("Iron","Nickel","Lead","Zinc","Tin","Sodium","Silver"),times=5)
> Quantity<-sample(20:50,35,replace=TRUE)
> df2<-data.frame(Metal,Quantity)
> head(df2,10)
    Metal Quantity
1    Iron       43
2  Nickel       33
3    Lead       25
4    Zinc       24
5     Tin       27
6  Sodium       34
7  Silver       31
8    Iron       37
9  Nickel       36
10   Lead       24
> tail(df2,10)
    Metal Quantity
26    Tin       49
27 Sodium       43
28 Silver       47
29   Iron       28
30 Nickel       41
31   Lead       21
32   Zinc       33
33    Tin       44
34 Sodium       34
35 Silver       33
> aggregate(df2["Quantity"],by=df2["Metal"],sum)
   Metal Quantity
1   Iron      157
2   Lead      148
3 Nickel      174
4 Silver      165
5 Sodium      161
6    Tin      192
7   Zinc      155
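A quick way to check an aggregated result is to confirm that the group totals add up to the total of the original column. The sketch below is an illustration only: it uses a small hand-made data frame (hypothetical values, not df2), and the names totals and totals_sorted are chosen for this example. It also orders the groups from largest to smallest total.

```r
# Small hand-made data frame; names and values are illustrative assumptions
df <- data.frame(
  Metal = c("Iron", "Zinc", "Iron", "Tin", "Zinc", "Tin", "Iron"),
  Quantity = c(30, 25, 40, 20, 50, 45, 22)
)

# Total quantity per metal, same call pattern as in the article
totals <- aggregate(df["Quantity"], by = df["Metal"], sum)

# Sanity check: the group totals must add up to the overall total
stopifnot(sum(totals$Quantity) == sum(df$Quantity))

# Order the groups from the largest total to the smallest
totals_sorted <- totals[order(totals$Quantity, decreasing = TRUE), ]
print(totals_sorted)
```

With these values the sorted result should list Iron (92), then Zinc (75), then Tin (65).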