How to perform group-wise linear regression for a data frame in R?


Group-wise linear regression means fitting a separate regression model for each level of a grouping variable. For example, if we have a dependent variable y, an independent variable x, and a grouping variable G that divides the (x, y) observations into multiple groups, then we can fit a linear regression model for each group. In R, we can convert the data frame to a data.table object, which lets us fit these models with a single grouped expression.
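To make the idea concrete, here is a minimal base-R sketch of group-wise regression that does not rely on any package. The toy data frame and its column names (G, x, y) are made up for illustration, and y is constructed as an exact linear function of x within each group so that the fitted coefficients are easy to check by eye.

```r
# Toy data (hypothetical): two groups with known linear relationships
toy <- data.frame(
  G = rep(c("A", "B"), each = 4),
  x = c(1, 2, 3, 4, 1, 2, 3, 4)
)
# Group A follows y = 2 + 3x, group B follows y = 1 - x
toy$y <- ifelse(toy$G == "A", 2 + 3 * toy$x, 1 - toy$x)

# Split the rows by group, fit one lm() per piece, then collect the coefficients
fits  <- lapply(split(toy, toy$G), function(d) lm(y ~ x, data = d))
coefs <- t(sapply(fits, coef))  # one row per group, one column per coefficient
coefs
```

Group A should recover an intercept of 2 and a slope of 3, and group B an intercept of 1 and a slope of -1, up to floating-point error.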

Example


Consider the following data frame:

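# No seed is set, so your values will differ from the output shown
G1<-sample(LETTERS[1:4],20,replace=TRUE)  # random group labels A to D
x1<-rnorm(20,2,0.96)                      # independent variable
y1<-rnorm(20,5,1)                         # dependent variable
df1<-data.frame(G1,x1,y1)
df1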

Output

   G1         x1       y1
1   C  1.2692290 3.994126
2   C  1.6317682 4.474443
3   D  1.3686734 5.444823
4   D  2.4969567 5.818360
5   C  2.3882221 3.766412
6   A  2.7568873 5.506297
7   A  2.1352764 4.548771
8   B  2.5232049 5.378314
9   A  2.8695959 4.735447
10  C -0.2317400 5.280478
11  A  1.1473469 5.064822
12  A  2.9099241 4.090654
13  A  2.4095434 6.538454
14  C  2.5310162 7.137598
15  A  2.4097431 4.778472
16  C  0.4945313 5.511772
17  C  1.3427334 5.030479
18  A  1.5200120 6.758618
19  A  2.4414779 5.854175
20  B -0.6968409 4.594522

Loading the data.table package and converting the data frame df1 to a data.table object:

library(data.table)
df1<-data.table(df1)
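Besides data.table(), the package also provides as.data.table() and setDT() for this conversion. The sketch below uses a small made-up data frame (df, with hypothetical columns g and v) to show the difference: as.data.table() returns a new object, while setDT() converts the existing object in place.

```r
library(data.table)

# Hypothetical two-row data frame used only for this illustration
df <- data.frame(g = c("a", "b"), v = 1:2)

dt_copy <- as.data.table(df)  # returns a new data.table; df itself is unchanged
setDT(df)                     # converts df to a data.table by reference

is.data.table(dt_copy)
is.data.table(df)
```

For a 20-row data frame the choice makes no practical difference; converting by reference mainly matters for large tables where copying is costly.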

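Fitting a linear regression model of y1 on x1 for each group defined in column G1. Wrapping the coefficient vector in as.list() gives one output column per coefficient: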

df1[,as.list(coef(lm(y1 ~ x1))), by=G1]

Output

   G1 (Intercept)          x1
1:  C    4.959098  0.05109642
2:  D    4.991700  0.33106700
3:  A    6.536957 -0.53189331
4:  B    4.764140  0.24341026
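In the sample data, groups B and D each contain only two rows, so their fitted lines pass exactly through both points and the coefficients say little about any real relationship. Reporting the group size and R-squared next to the coefficients makes such cases visible. The following is a sketch under assumptions: the seed, the balanced group sizes, and the output column names (n, intercept, slope, r_squared) are chosen here for illustration and are not part of the original example.

```r
library(data.table)

set.seed(1)  # assumption: a fixed seed so the sketch is repeatable
dt <- data.table(
  G1 = rep(LETTERS[1:4], each = 5),  # assumption: five rows per group
  x1 = rnorm(20, 2, 0.96),
  y1 = rnorm(20, 5, 1)
)

# One row per group: sample size, both coefficients, and R-squared
res <- dt[, {
  fit <- lm(y1 ~ x1)
  list(n         = .N,
       intercept = coef(fit)[["(Intercept)"]],
       slope     = coef(fit)[["x1"]],
       r_squared = summary(fit)$r.squared)
}, by = G1]
res
```

The braces in the j expression let the model be fitted once per group and reused for every summary value.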

Let's look at another example:

# No seed is set, so your values will differ from the output shown
Class<-sample(c("I","II","III"),20,replace=TRUE)
Ratings<-sample(1:10,20,replace=TRUE)
Salary<-sample(20000:50000,20)
df2<-data.frame(Class,Ratings,Salary)
df2

Output

   Class Ratings Salary
1      I       4  28423
2    III       1  34728
3     II       1  26975
4      I       9  26777
5     II       6  29501
6      I       8  33061
7     II       4  43584
8      I       4  42525
9     II       9  30526
10     I       1  32872
11     I       7  21198
12     I       3  20971
13   III       9  49071
14     I       1  40314
15   III       1  36269
16     I       6  45482
17    II       1  48595
18     I       8  44054
19     I       1  25294
20   III      10  34944

Converting the data frame df2 to a data.table object:

df2<-data.table(df2)

Fitting a regression model of Salary on Ratings for each of the three Classes:

df2[,as.list(coef(lm(Salary~Ratings))),by=Class]

Output

   Class (Intercept)    Ratings
1:     I    31894.13   194.9152
2:   III    35270.10   663.4089
3:    II    40405.42 -1087.9103
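Once a model is fitted per group, the same grouped expression can return a prediction instead of the raw coefficients. The sketch below is an illustration under assumptions: the seed, the fixed group sizes, the query value Ratings = 5, and the column name predicted_salary are all chosen here and do not come from the original example.

```r
library(data.table)

set.seed(42)  # assumption: a fixed seed so the sketch is repeatable
dt2 <- data.table(
  Class   = rep(c("I", "II", "III"), times = c(8, 6, 6)),  # assumed group sizes
  Ratings = sample(1:10, 20, replace = TRUE),
  Salary  = sample(20000:50000, 20)
)

# Fit Salary ~ Ratings within each Class and predict Salary at Ratings = 5
pred <- dt2[, {
  fit <- lm(Salary ~ Ratings)
  list(predicted_salary =
         as.numeric(predict(fit, newdata = data.frame(Ratings = 5))))
}, by = Class]
pred
```

Because the data are random and unrelated, the predictions here carry no real meaning; the point is only the pattern of computing a per-group quantity from each fitted model.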

Updated on: 07-Nov-2020
