How to standardize a data.table column by group in R?


To standardize a column of a data.table object by group, we can apply the scale function to the column and pass the grouping column through the by argument.

For example, suppose we have a data.table object called DT with two columns, G and Num, where G is a grouping column and Num is a numerical column. Because scale returns a one-column matrix, we wrap it in as.vector to get a plain numeric vector. We can then standardize Num within each level of G with the following command:

DT[,"Num":=as.vector(scale(Num)),by=G]
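To see what this command computes, the sketch below compares it with the z-score written out by hand, (x - mean(x)) / sd(x), within each group. The toy table and the Manual column are made up for illustration and are not part of the examples that follow.

```r
library(data.table)

# Hypothetical toy data: two groups with three values each
DT <- data.table(G = c("a", "a", "a", "b", "b", "b"),
                 Num = c(1, 2, 3, 10, 20, 30))

# Manual z-score per group, kept in a separate column for comparison
DT[, Manual := (Num - mean(Num)) / sd(Num), by = G]

# Same computation via scale(); as.vector() drops the matrix wrapper
DT[, "Num" := as.vector(scale(Num)), by = G]

# Both columns hold -1, 0, 1 within each group
print(DT)
stopifnot(isTRUE(all.equal(DT$Num, DT$Manual)))
```

Note that := modifies DT by reference, so the original Num values are overwritten.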

Example 1

Consider the following data.table object:

library(data.table)
# No seed is set, so the random values you get will differ from those shown below
Grp<-sample(c("Male","Female"),20,replace=TRUE)
Response<-round(rnorm(20,5,1.25),2)
DT1<-data.table(Grp,Response)
DT1

The following data.table is created:

       Grp Response
 1: Female 5.31
 2: Male   5.20
 3: Female 6.38
 4: Male   4.53
 5: Female 4.90
 6: Female 4.78
 7: Male   3.73
 8: Female 6.19
 9: Male   4.33
10: Male   7.84
11: Male   6.70
12: Female 5.11
13: Male   6.80
14: Male   3.76
15: Male   3.56
16: Male   5.51
17: Female 6.58
18: Female 7.59
19: Male   4.62
20: Female 6.75

To standardize the Response column by Grp in DT1, run the following code on the data.table created above:

# DT1 is the object created above; do not re-run sample()/rnorm() here,
# because without a seed that would draw new random values
DT1[,"Response":=as.vector(scale(Response)),by=Grp]
DT1

Output

For the data shown above, the code produces the following output:

     Grp    Response
 1: Female -0.66313371
 2: Male    0.03955265
 3: Female  0.43789692
 4: Male   -0.43061348
 5: Female -1.08502396
 6: Female -1.20850403
 7: Male   -0.99200587
 8: Female  0.24238681
 9: Male   -0.57096158
10: Male    1.89214752
11: Male    1.09216337
12: Female -0.86893383
13: Male    1.16233742
14: Male   -0.97095365
15: Male   -1.11130175
16: Male    0.25709220
17: Female  0.64369704
18: Female  1.68298763
19: Male   -0.36745684
20: Female  0.81862714
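As a quick check of the result, the sketch below rebuilds DT1 from the values printed above (typed in by hand rather than regenerated randomly) and confirms that, after standardizing, each group has a mean of approximately 0 and a standard deviation of 1. The summary table name check is chosen here for illustration.

```r
library(data.table)

# Values copied from the DT1 printout above
Grp <- c("Female", "Male", "Female", "Male", "Female", "Female", "Male",
         "Female", "Male", "Male", "Male", "Female", "Male", "Male",
         "Male", "Male", "Female", "Female", "Male", "Female")
Response <- c(5.31, 5.20, 6.38, 4.53, 4.90, 4.78, 3.73, 6.19, 4.33, 7.84,
              6.70, 5.11, 6.80, 3.76, 3.56, 5.51, 6.58, 7.59, 4.62, 6.75)
DT1 <- data.table(Grp, Response)

# Standardize Response within each level of Grp
DT1[, "Response" := as.vector(scale(Response)), by = Grp]

# Per-group summary of the standardized column
check <- DT1[, .(Mean = mean(Response), SD = sd(Response)), by = Grp]
print(check)
```

The means may print as tiny nonzero numbers because of floating-point rounding.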

Example 2

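The following snippet creates a sample data.table:

# Assumes library(data.table) is already loaded, as in Example 1
# No seed is set, so the random values you get will differ from those shown below
Class<-sample(c("I","II","III"),20,replace=TRUE)
Rate<-round(rnorm(20,10,1.02),0)
DT2<-data.table(Class,Rate)
DT2

The following data.table is created: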

  Class Rate
 1: II  10
 2: III  9
 3: II  10
 4: II  10
 5: III 10
 6: III  9
 7: III  8
 8: II  10
 9: II  11
10: III  9
11: I    9
12: II  11
13: III 13
14: II  10
15: III 12
16: I    8
17: II   9
18: I   10
19: III  9
20: II  10

To standardize the Rate column by Class in DT2, run the following code on the data.table created above:

# DT2 is the object created above; do not re-run sample()/rnorm() here,
# because without a seed that would draw new random values
DT2[,"Rate":=as.vector(scale(Rate)),by=Class]
DT2

Output

For the data shown above, the code produces the following output:

   Class     Rate
 1: II  -0.18490007
 2: III -0.50669175
 3: II  -0.18490007
 4: II  -0.18490007
 5: III  0.07238454
 6: III -0.50669175
 7: III -1.08576803
 8: II  -0.18490007
 9: II   1.47920052
10: III -0.50669175
11: I    0.00000000
12: II   1.47920052
13: III  1.80961338
14: II  -0.18490007
15: III  1.23053710
16: I   -1.00000000
17: II  -1.84900065
18: I    1.00000000
19: III -0.50669175
20: II  -0.18490007
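Both examples overwrite the original column, since := modifies the data.table by reference. If the raw values are still needed, one option is to assign the standardized values to a new column instead. The sketch below does this for the Example 2 data, rebuilt by hand from the printout above; the column name Rate_z is an arbitrary choice made here.

```r
library(data.table)

# Values copied from the DT2 printout above
Class <- c("II", "III", "II", "II", "III", "III", "III", "II", "II", "III",
           "I", "II", "III", "II", "III", "I", "II", "I", "III", "II")
Rate <- c(10, 9, 10, 10, 10, 9, 8, 10, 11, 9,
          9, 11, 13, 10, 12, 8, 9, 10, 9, 10)
DT2 <- data.table(Class, Rate)

# Store the per-class z-scores in a new column; Rate itself is left untouched
DT2[, Rate_z := as.vector(scale(Rate)), by = Class]

# Class I has raw rates 9, 8, 10, which standardize to 0, -1, 1
print(DT2[Class == "I"])
```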

Updated on: 10-Nov-2021
