How to find the group-wise median in an R data.table object?


When the assumptions of a parametric analysis are not satisfied, we turn to non-parametric analysis, which often relies on the median because the data are not normally distributed. To find the group-wise median of data stored in a data.table object, apply the median function to each column with lapply over .SD and set the grouping column with the by argument, as shown in the examples below.

Example

Loading data.table package:

> library(data.table)

Consider the below data.table object:

Example

> Group<-sample(LETTERS[1:4],20,replace=TRUE)
> x1<-rnorm(20,1,0.87)
> x2<-rnorm(20,5,1.2)
> x3<-rnorm(20,500,20)
> x4<-rnorm(20,50,1.14)
> dt1<-data.table(Group,x1,x2,x3,x4)
> dt1

Output

    Group           x1       x2       x3       x4
 1:     B  0.515370827 6.174187 542.9350 50.28300
 2:     B  0.522858146 6.976872 510.5568 49.71331
 3:     A  1.055456751 3.192242 476.7693 48.88280
 4:     A -0.024912175 2.847402 506.5335 50.67151
 5:     C -0.196164614 3.328402 508.6321 48.39842
 6:     C  1.290014270 5.556677 524.5811 48.27884
 7:     D  1.486977865 5.897758 486.5484 49.51944
 8:     D -0.007248341 6.468281 532.3197 51.45941
 9:     D  2.182819501 5.394480 442.8788 49.58497
10:     B  2.211356101 6.443493 488.6105 49.02810
11:     D -0.419805499 3.586357 485.3483 49.87930
12:     B  1.865157121 6.099377 533.5723 51.51517
13:     D  2.389899358 4.531113 507.7677 49.68121
14:     C  0.411933014 4.602449 492.0163 50.05786
15:     B  1.439917480 4.031037 475.5113 49.90952
16:     A  1.749343791 5.170324 513.3880 50.25203
17:     D  1.648629013 5.439521 519.4953 50.00103
18:     A  1.825107893 2.489396 482.8070 49.83169
19:     B  0.757930091 4.975242 501.2664 49.70943
20:     D  1.989164222 3.915599 491.8682 50.91287

Finding the group-wise median for all columns in dt1:

Example

> dt1[,lapply(.SD,median),by=Group]

Output

   Group       x1       x2       x3       x4
1:     B 1.098924 6.136782 505.9116 49.81141
2:     A 1.402400 3.019822 494.6703 50.04186
3:     C 0.411933 4.602449 508.6321 48.39842
4:     D 1.648629 5.394480 491.8682 49.87930
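
Because dt1 is built from random draws, its medians differ on every run. The sketch below uses a small hand-made table (the values are hypothetical, chosen only so the medians can be checked by eye) and also shows how the .SDcols argument might be used to restrict the calculation to selected columns:

```r
library(data.table)

# Hypothetical values, small enough to verify the medians by hand
dt <- data.table(
  Group = c("A", "A", "A", "B", "B"),
  x1    = c(1, 3, 10, 2, 6),
  x2    = c(5, 7, 9, 6, 10)
)

# Median of every non-grouping column, per group
med_all <- dt[, lapply(.SD, median), by = Group]

# Median of the x1 column only, selected through .SDcols
med_x1 <- dt[, lapply(.SD, median), by = Group, .SDcols = "x1"]

print(med_all)
print(med_x1)
```

For group A the medians are 3 (x1) and 7 (x2); for group B, which has two rows, each median is the mean of the two values, giving 4 and 8.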

Let’s have a look at another example:

Example

> Class<-sample(c("First","Second","Third"),20,replace=TRUE)
> Payment<-sample(1:10,20,replace=TRUE)
> dt2<-data.table(Class,Payment)
> dt2

Output

     Class Payment
 1:  First       5
 2:  First       4
 3:  First       3
 4: Second       5
 5:  First       1
 6:  Third       8
 7:  First       3
 8: Second       7
 9: Second       6
10: Second      10
11:  First       4
12: Second       2
13: Second       2
14:  First      10
15:  First       1
16:  Third       3
17:  Third       1
18: Second       5
19:  Third       4
20: Second      10

Finding the group-wise median of Payment, the only non-grouping column in dt2:

Example

> dt2[,lapply(.SD,median),by=Class]

Output

    Class Payment
1:  First     3.5
2: Second     5.5
3:  Third     3.5
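
The examples above contain no missing values. As a minimal sketch, assuming a table with an NA in one group (the data here are hypothetical), extra arguments such as na.rm = TRUE can be passed through lapply to median, and a single column can also be summarised by naming it directly:

```r
library(data.table)

# Hypothetical payments with one missing value in the "First" class
dt <- data.table(
  Class   = c("First", "First", "First", "Second", "Second"),
  Payment = c(5, NA, 3, 7, 2)
)

# Naming the column directly; a group that contains NA returns NA
with_na <- dt[, .(Payment = median(Payment)), by = Class]

# Arguments after the function name are passed on to median() by lapply()
no_na <- dt[, lapply(.SD, median, na.rm = TRUE), by = Class]

print(with_na)
print(no_na)
```

With na.rm = TRUE, the median for First is computed from 5 and 3 and equals 4, while Second is 4.5 in both results.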

Updated on: 21-Nov-2020
