To find the sum of numerical columns based on the combination of values in categorical columns in R data frame, we can follow the below steps −
Let's create a data frame as shown below −
> x1<-rpois(20,2) > x2<-rpois(20,5) > x3<-rpois(20,1) > x4<-rpois(20,10) > f1<-sample(LETTERS[1:3],20,replace=TRUE) > f2<-sample(letters[1:3],20,replace=TRUE) > df<-data.frame(x1,x2,x3,x4,f1,f2) > df
On executing, the above script generates the below output(this output will vary on your system due to randomization) −
x1 x2 x3 x4 f1 f2 1 1 6 0 11 B c 2 2 3 3 16 B a 3 4 2 0 13 B b 4 1 3 0 4 B c 5 0 4 0 10 A b 6 1 8 3 8 C c 7 2 4 0 2 A c 8 0 1 1 12 A b 9 3 4 0 15 B b 10 0 1 1 4 A a 11 2 8 0 6 C b 12 1 4 1 13 C c 13 1 4 1 13 A b 14 2 6 2 11 A b 15 3 5 0 10 A a 16 1 4 0 17 A c 17 2 4 1 4 B a 18 1 4 0 11 B b 19 3 3 1 8 B c 20 4 6 3 5 A a
Using recast function to find the sum of columns x1, x2, x3, and x4 based on f1 and f2 in df −
> x1<-rpois(20,2) > x2<-rpois(20,5) > x3<-rpois(20,1) > x4<-rpois(20,10) > f1<-sample(LETTERS[1:3],20,replace=TRUE) > f2<-sample(letters[1:3],20,replace=TRUE) > df<-data.frame(x1,x2,x3,x4,f1,f2) > library(reshape2) > recast(df,variable~f1+f2,sum)
Using f1, f2 as id variables variable A_a A_b A_c B_a B_b B_c C_b C_c 1 x1 7 3 3 4 8 5 2 2 2 x2 12 15 8 7 10 12 8 12 3 x3 4 4 0 4 0 1 0 4 4 x4 19 46 19 20 39 23 6 21