How to standardize a data.table column by group in R?


To standardize a data.table column by group, we can apply the scale function inside := and pass the grouping column to the by argument.

For example, if a data.table object called DT contains two columns, G and Num, where G is a grouping column and Num is a numeric column, we can standardize Num within each level of G using the command below −

DT[,"Num":=as.vector(scale(Num)),by=G]
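scale() subtracts the mean and divides by the standard deviation, so the grouped command computes an ordinary z-score within each group. It also returns a one-column matrix with extra attributes, which is why the command wraps it in as.vector(). The sketch below illustrates both points on hypothetical data; the table DT and the columns z_scale and z_manual are illustrative names, and the equivalence assumes every group has at least two rows and no missing values.

```r
library(data.table)

# Hypothetical data: two groups, each with at least two rows
DT <- data.table(G = c("a", "a", "a", "b", "b", "b"),
                 Num = c(1, 2, 3, 10, 20, 30))

# scale() returns a one-column matrix carrying "scaled:center" and
# "scaled:scale" attributes; as.vector() strips them to a plain vector
z <- scale(DT$Num[DT$G == "a"])
print(dim(z))                     # [1] 3 1
print(attributes(as.vector(z)))   # NULL

# Grouped standardization with scale(), as in the command above
DT[, z_scale := as.vector(scale(Num)), by = G]

# The same z-score written out: (x - mean(x)) / sd(x) within each group
DT[, z_manual := (Num - mean(Num)) / sd(Num), by = G]

print(DT)
```

Both new columns hold -1, 0, 1 for each group, because each group here has a standard deviation equal to its step size.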

Example 1

Consider the data.table object below −

library(data.table)
# sample() and rnorm() draw random values, so your data will differ from the output shown
Grp<-sample(c("Male","Female"),20,replace=TRUE)
Response<-round(rnorm(20,5,1.25),2)
DT1<-data.table(Grp,Response)
DT1

The following data.table is created −

       Grp Response
 1: Female 5.31
 2: Male   5.20
 3: Female 6.38
 4: Male   4.53
 5: Female 4.90
 6: Female 4.78
 7: Male   3.73
 8: Female 6.19
 9: Male   4.33
10: Male   7.84
11: Male   6.70
12: Female 5.11
13: Male   6.80
14: Male   3.76
15: Male   3.56
16: Male   5.51
17: Female 6.58
18: Female 7.59
19: Male   4.62
20: Female 6.75

To standardize the Response column by Grp in DT1, add the following code to the above snippet −

# Standardize Response within each Grp, overwriting the column in place
DT1[,"Response":=as.vector(scale(Response)),by=Grp]
DT1

Output

If you execute all the above snippets as a single program, it generates the following output −

     Grp    Response
 1: Female -0.66313371
 2: Male    0.03955265
 3: Female  0.43789692
 4: Male   -0.43061348
 5: Female -1.08502396
 6: Female -1.20850403
 7: Male   -0.99200587
 8: Female  0.24238681
 9: Male   -0.57096158
10: Male    1.89214752
11: Male    1.09216337
12: Female -0.86893383
13: Male    1.16233742
14: Male   -0.97095365
15: Male   -1.11130175
16: Male    0.25709220
17: Female  0.64369704
18: Female  1.68298763
19: Male   -0.36745684
20: Female  0.81862714
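A quick way to confirm that the standardization really happened per group is to summarize the result by Grp: every group should now have a mean of about 0 and a standard deviation of about 1. The sketch below is a hypothetical, reproducible variant of DT1; the set.seed() call, the alternating group labels, and the summary names avg and stdev are assumptions added for illustration.

```r
library(data.table)

# Hypothetical reproducible data shaped like DT1 (seed and labels are assumptions)
set.seed(1)
DT1 <- data.table(Grp = rep(c("Male", "Female"), 10),
                  Response = round(rnorm(20, 5, 1.25), 2))

# Standardize Response within each Grp, as in Example 1
DT1[, Response := as.vector(scale(Response)), by = Grp]

# Summarize each group: mean should be ~0 and standard deviation ~1
check <- DT1[, .(avg = mean(Response), stdev = sd(Response)), by = Grp]
print(check)
```

Comparing against a small tolerance rather than exact 0 and 1 allows for floating-point rounding.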

Example 2

The following snippet creates a sample data.table −

library(data.table)
# Random class labels and rates rounded to whole numbers
Class<-sample(c("I","II","III"),20,replace=TRUE)
Rate<-round(rnorm(20,10,1.02),0)
DT2<-data.table(Class,Rate)
DT2

The following data.table is created −

  Class Rate
 1: II  10
 2: III  9
 3: II  10
 4: II  10
 5: III 10
 6: III  9
 7: III  8
 8: II  10
 9: II  11
10: III  9
11: I    9
12: II  11
13: III 13
14: II  10
15: III 12
16: I    8
17: II   9
18: I   10
19: III  9
20: II  10

To standardize the Rate column by Class in DT2, add the following code to the above snippet −

# Standardize Rate within each Class, overwriting the column in place
DT2[,"Rate":=as.vector(scale(Rate)),by=Class]
DT2

Output

If you execute all the above snippets as a single program, it generates the following output −

   Class     Rate
 1: II  -0.18490007
 2: III -0.50669175
 3: II  -0.18490007
 4: II  -0.18490007
 5: III  0.07238454
 6: III -0.50669175
 7: III -1.08576803
 8: II  -0.18490007
 9: II   1.47920052
10: III -0.50669175
11: I    0.00000000
12: II   1.47920052
13: III  1.80961338
14: II  -0.18490007
15: III  1.23053710
16: I   -1.00000000
17: II  -1.84900065
18: I    1.00000000
19: III -0.50669175
20: II  -0.18490007
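With random class labels, a group can end up with a single row or with identical values. In either case scale() computes a spread of 0 for the group and then divides the centred value 0 by 0, producing NaN. The sketch below shows this on hypothetical data; the table DT and the column Rate_z are illustrative names, not part of the example above.

```r
library(data.table)

# Hypothetical data: class "I" has one row, class "II" has identical values
DT <- data.table(Class = c("I", "II", "II", "III", "III", "III"),
                 Rate  = c(9, 10, 10, 8, 10, 12))

# Standardize into a new column so the raw Rate values are kept
DT[, Rate_z := as.vector(scale(Rate)), by = Class]
print(DT)
```

Only class III (values 8, 10, 12) gets usable z-scores of -1, 0 and 1. How to treat the NaN rows (drop them, or replace them with 0) depends on your analysis.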
raja
Updated on 10-Nov-2021 07:51:51
