How to find group-wise summary statistics for an R data frame?
To compare groups, we need the summary statistics for each group so that we can observe the differences between them. In R, the summary function reports the minimum, first quartile, median, mean, third quartile, and maximum of a numeric vector, so each of these values can be compared across groups. To find the group-wise summary statistics for a column of an R data frame, we can apply summary to each group with the tapply function, in the form tapply(values, groups, summary).
Example
Consider the below data frame −
> set.seed(99)
> x1<-sample(1:100,50,replace=TRUE)
> x2<-rep(c("G1","G2","G3","G4","G5"),times=10)
> df<-data.frame(x1,x2)
> head(df,20)
    x1 x2
1   48 G1
2   33 G2
3   44 G3
4   22 G4
5   99 G5
6   62 G1
7   98 G2
8   32 G3
9   13 G4
10  20 G5
11 100 G1
12  31 G2
13  68 G3
14   9 G4
15  82 G5
16  88 G1
17  30 G2
18  86 G3
19  84 G4
20  32 G5
Finding the summary statistics of x1 for each group −
> tapply(df$x1, df$x2, summary)
$G1
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   14.0    55.0    72.0    67.8    86.5   100.0

$G2
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    4.0    31.5    60.5    52.4    69.5    98.0

$G3
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   14.0    33.5    41.0    46.9    64.5    86.0

$G4
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   9.00   23.75   53.00   53.30   82.75   97.00

$G5
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   7.00   31.25   32.00   42.40   44.75   99.00
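Because tapply returns one summary per group, the result can be awkward to compare by eye. The following is a minimal sketch of one way to stack those summaries into a single table with do.call and rbind. To avoid depending on the random number generator, it rebuilds only the first 20 rows printed by head(df,20), so its numbers will differ from the 50-row output above. The names df20, group_summaries, and summary_table are hypothetical and not part of the original example.

```r
# Rebuild the first 20 rows shown by head(df, 20); this is an assumption made
# so the sketch is reproducible without set.seed()/sample().
x1 <- c(48, 33, 44, 22, 99, 62, 98, 32, 13, 20,
        100, 31, 68, 9, 82, 88, 30, 86, 84, 32)
x2 <- rep(c("G1", "G2", "G3", "G4", "G5"), times = 4)
df20 <- data.frame(x1, x2)  # hypothetical name for the 20-row subset

# One summary() result per group, held in a list-like array named by group.
group_summaries <- tapply(df20$x1, df20$x2, summary)

# Stack the per-group summaries row by row: one row per group,
# one column per statistic.
summary_table <- do.call(rbind, group_summaries)
print(summary_table)
```

With the statistics in a matrix, a single value can be looked up by group and statistic name, for example summary_table["G1", "Median"].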
Let’s have a look at one more example −
> y1<-rep(c(letters[1:5]),times=5)
> y2<-rep(c(14,25,13,12,41,52,44,28,17,30),times=c(2,5,3,3,1,5,1,2,2,1))
> df_y<-data.frame(y1,y2)
> head(df_y,20)
   y1 y2
1   a 14
2   b 14
3   c 25
4   d 25
5   e 25
6   a 25
7   b 25
8   c 13
9   d 13
10  e 13
11  a 12
12  b 12
13  c 12
14  d 41
15  e 52
16  a 52
17  b 52
18  c 52
19  d 52
20  e 44
> tapply(df_y$y2, df_y$y1, summary)
$a
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   12.0    14.0    25.0    26.2    28.0    52.0

$b
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   12.0    14.0    25.0    26.2    28.0    52.0

$c
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   12.0    13.0    17.0    23.8    25.0    52.0

$d
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   13.0    17.0    25.0    29.6    41.0    52.0

$e
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   13.0    25.0    30.0    32.8    44.0    52.0
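As an alternative to tapply, base R's aggregate function can produce the same group-wise statistics in data frame form. The sketch below reuses the df_y data from this example. It is offered as one possible approach rather than the article's method, and the names agg and group_means are hypothetical.

```r
# Same data as the df_y example above.
y1 <- rep(letters[1:5], times = 5)
y2 <- rep(c(14, 25, 13, 12, 41, 52, 44, 28, 17, 30),
          times = c(2, 5, 3, 3, 1, 5, 1, 2, 2, 1))
df_y <- data.frame(y1, y2)

# The formula y2 ~ y1 means "summarise y2 within each level of y1".
agg <- aggregate(y2 ~ y1, data = df_y, FUN = summary)
print(agg)

# The six statistics are stored as a matrix column named y2;
# extract one statistic by its column name and label it by group.
group_means <- setNames(agg$y2[, "Mean"], agg$y1)
print(group_means)
```

The formula interface keeps the grouping column and the statistics together in one object, which can be convenient when the result needs to be merged with other data.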