How to perform group-wise linear regression for a data frame in R?
Group-wise linear regression means fitting a separate regression model for each level of a grouping variable. For example, if we have a dependent variable y, an independent variable x, and a grouping variable G that divides the (x, y) observations into multiple groups, then we can fit one linear regression model per group. In R, we can convert the data frame to a data.table object, which lets us fit all the models in a single expression using the by argument.
Example
Consider the data frame below. The values are generated randomly without a fixed seed, so your numbers will differ:
G1 <- sample(LETTERS[1:4], 20, replace = TRUE)
x1 <- rnorm(20, 2, 0.96)
y1 <- rnorm(20, 5, 1)
df1 <- data.frame(G1, x1, y1)
df1
Output
   G1         x1       y1
1   C  1.2692290 3.994126
2   C  1.6317682 4.474443
3   D  1.3686734 5.444823
4   D  2.4969567 5.818360
5   C  2.3882221 3.766412
6   A  2.7568873 5.506297
7   A  2.1352764 4.548771
8   B  2.5232049 5.378314
9   A  2.8695959 4.735447
10  C -0.2317400 5.280478
11  A  1.1473469 5.064822
12  A  2.9099241 4.090654
13  A  2.4095434 6.538454
14  C  2.5310162 7.137598
15  A  2.4097431 4.778472
16  C  0.4945313 5.511772
17  C  1.3427334 5.030479
18  A  1.5200120 6.758618
19  A  2.4414779 5.854175
20  B -0.6968409 4.594522
Loading the data.table package and converting data frame df1 to a data.table object:
library(data.table)
df1 <- data.table(df1)
Fitting a linear regression model of y1 on x1 for each group defined in column G1:
df1[,as.list(coef(lm(y1 ~ x1))), by=G1]
Output
   G1 (Intercept)          x1
1:  C    4.959098  0.05109642
2:  D    4.991700  0.33106700
3:  A    6.536957 -0.53189331
4:  B    4.764140  0.24341026
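Groups B and D in the sample above contain only two rows each, so their fitted lines pass through both points exactly and carry no information about noise. The following is a minimal sketch, not part of the original example, of one way to report the group size next to the coefficients so that such groups are easy to spot. The toy data and the column names g, x, y and n are hypothetical; .N is data.table's built-in per-group row count.

```r
library(data.table)

# Hypothetical toy data: group "A" has four points, group "B" only two
dt <- data.table(
  g = c("A", "A", "A", "A", "B", "B"),
  x = c(1, 2, 3, 4, 1, 2),
  y = c(3.1, 4.9, 7.2, 8.8, 5, 4)
)

# j must evaluate to a list; concatenating list(n = .N) with the
# coefficient list yields the columns n, (Intercept) and x for each group
res <- dt[, c(list(n = .N), as.list(coef(lm(y ~ x)))), by = g]
print(res)
```

The as.list() call matters here as well as in the example above: coef() returns a named numeric vector, and converting it to a list is what makes data.table spread the intercept and slope into separate columns.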
Let’s have a look at another example:
Class <- sample(c("I", "II", "III"), 20, replace = TRUE)
Ratings <- sample(1:10, 20, replace = TRUE)
Salary <- sample(20000:50000, 20)
df2 <- data.frame(Class, Ratings, Salary)
df2
Output
   Class Ratings Salary
1      I       4  28423
2    III       1  34728
3     II       1  26975
4      I       9  26777
5     II       6  29501
6      I       8  33061
7     II       4  43584
8      I       4  42525
9     II       9  30526
10     I       1  32872
11     I       7  21198
12     I       3  20971
13   III       9  49071
14     I       1  40314
15   III       1  36269
16     I       6  45482
17    II       1  48595
18     I       8  44054
19     I       1  25294
20   III      10  34944
Converting data frame df2 to a data.table object:
df2 <- data.table(df2)
Fitting a regression model of Salary on Ratings for each of the three classes:
df2[,as.list(coef(lm(Salary~Ratings))),by=Class]
Output
   Class (Intercept)    Ratings
1:     I    31894.13   194.9152
2:   III    35270.10   663.4089
3:    II    40405.42 -1087.9103
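As a cross-check, the same per-group coefficients can be obtained without data.table. The sketch below uses base R's split() and lapply() instead of the by argument. It is an alternative technique, not the method used in the examples above, and the toy data frame with columns g, x and y is hypothetical, built so that each group lies exactly on a known line.

```r
# Hypothetical toy data: group "A" follows y = 1 + 2x, group "B" follows y = 5 - x
df <- data.frame(
  g = rep(c("A", "B"), each = 3),
  x = c(1, 2, 3, 1, 2, 3),
  y = c(3, 5, 7, 4, 3, 2)
)

# split() breaks the data frame into one piece per group;
# lapply() fits a model to each piece and keeps only the coefficients
fits <- lapply(split(df, df$g), function(d) coef(lm(y ~ x, data = d)))

# Stack the per-group coefficient vectors into a matrix, one row per group
coefs <- do.call(rbind, fits)
print(coefs)
```

Because the toy data are noise-free, the rows of coefs should recover the intercept and slope used to build each group, which makes this a convenient sanity check before applying either approach to real data.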