How to find group-wise summary statistics for an R data frame?

To compare groups, we need summary statistics for each group so that the differences between them are easy to see. R's summary function returns the minimum, first quartile, median, mean, third quartile, and maximum of a numeric vector, so we can compare each of these values across the groups. To find the group-wise summary statistics for an R data frame, we can apply summary to each group with the tapply function, which takes the values, the grouping variable, and the function to apply.

Example

Consider the below data frame −

> set.seed(99)
> x1<-sample(1:100,50,replace=TRUE)
> x2<-rep(c("G1","G2","G3","G4","G5"),times=10)
> df<-data.frame(x1,x2)
> head(df,20)
    x1 x2
1   48 G1
2   33 G2
3   44 G3
4   22 G4
5   99 G5
6   62 G1
7   98 G2
8   32 G3
9   13 G4
10  20 G5
11 100 G1
12  31 G2
13  68 G3
14   9 G4
15  82 G5
16  88 G1
17  30 G2
18  86 G3
19  84 G4
20  32 G5

Finding the summary statistics of x1 for each group −

> tapply(df$x1, df$x2, summary)
$G1
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   14.0    55.0    72.0    67.8    86.5   100.0

$G2
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    4.0    31.5    60.5    52.4    69.5    98.0

$G3
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   14.0    33.5    41.0    46.9    64.5    86.0

$G4
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   9.00   23.75   53.00   53.30   82.75   97.00

$G5
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   7.00   31.25   32.00   42.40   44.75   99.00
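
The list that tapply returns prints one block per group, which can be hard to scan when there are many groups. As a minimal sketch using only base R, the per-group summaries can be stacked into a single matrix with do.call and rbind. The names group_summaries and summary_table are chosen here for illustration only.

```r
# Recreate the data frame from the example above
set.seed(99)
x1 <- sample(1:100, 50, replace = TRUE)
x2 <- rep(c("G1", "G2", "G3", "G4", "G5"), times = 10)
df <- data.frame(x1, x2)

# tapply returns one summary object per group
group_summaries <- tapply(df$x1, df$x2, summary)

# Stack the summaries: one row per group, one column per statistic
summary_table <- do.call(rbind, group_summaries)
print(summary_table)
```

With the statistics side by side, a single column such as summary_table[, "Median"] can be read off for all groups at once.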

Let’s have a look at one more example −

> y1<-rep(letters[1:5],times=5)
> y2<-rep(c(14,25,13,12,41,52,44,28,17,30),times=c(2,5,3,3,1,5,1,2,2,1))
> df_y<-data.frame(y1,y2)
> head(df_y,20)
   y1 y2
1   a 14
2   b 14
3   c 25
4   d 25
5   e 25
6   a 25
7   b 25
8   c 13
9   d 13
10  e 13
11  a 12
12  b 12
13  c 12
14  d 41
15  e 52
16  a 52
17  b 52
18  c 52
19  d 52
20  e 44
> tapply(df_y$y2, df_y$y1, summary)
$a
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   12.0    14.0    25.0    26.2    28.0    52.0

$b
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   12.0    14.0    25.0    26.2    28.0    52.0

$c
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   12.0    13.0    17.0    23.8    25.0    52.0

$d
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   13.0    17.0    25.0    29.6    41.0    52.0

$e
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   13.0    25.0    30.0    32.8    44.0    52.0
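
The same group-wise summary can also be obtained with base R's aggregate function and a formula. This is an alternative to the tapply call used in this article, shown as a sketch; the name agg_result is an illustrative choice. Instead of a list, aggregate returns a data frame with one row per group.

```r
# Recreate the second data frame from the example above
y1 <- rep(letters[1:5], times = 5)
y2 <- rep(c(14, 25, 13, 12, 41, 52, 44, 28, 17, 30),
          times = c(2, 5, 3, 3, 1, 5, 1, 2, 2, 1))
df_y <- data.frame(y1, y2)

# Summarise y2 within each level of y1
agg_result <- aggregate(y2 ~ y1, data = df_y, FUN = summary)
print(agg_result)

# The statistics are stored as a matrix inside the y2 column
group_means <- agg_result$y2[, "Mean"]
print(group_means)
```

The formula y2 ~ y1 reads as "y2 grouped by y1", which extends naturally to more than one grouping column, for example y2 ~ y1 + another_column.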

Nizamuddin Siddiqui

Updated on: 2020-08-11T13:37:38+05:30
