How to create a polynomial model in R?


Explanatory variables are often not linearly related to the response variable, so a straight-line model may fit the data poorly. In such situations, we can try polynomial models and check whether the higher-order terms improve the fit and the accuracy of the predictions. This is done by including powers of the independent variables in the lm function, wrapped in I() so that ^ is evaluated as arithmetic exponentiation rather than as a formula operator.
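In an lm formula, powers are written inside I(), as in the models later in this article. The following is a minimal sketch of why that wrapper matters. The toy data (x, y, and the coefficients used to simulate them) and the names fit_plain and fit_quad are hypothetical and are not part of the example below.

```r
# Hypothetical toy data, used only to illustrate the formula syntax
set.seed(1)
x <- runif(50, -2, 2)
y <- 1 + 2 * x - 1.5 * x^2 + rnorm(50, sd = 0.3)

# Without I(), ^ is the formula crossing operator, so x^2 collapses to x
fit_plain <- lm(y ~ x + x^2)

# With I(), x^2 is computed arithmetically and enters as a separate term
fit_quad <- lm(y ~ x + I(x^2))

print(names(coef(fit_plain)))  # intercept and x only
print(names(coef(fit_quad)))   # intercept, x, and I(x^2)
```

The first fit silently drops the squared term, which is why every power in the models below is wrapped in I().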

Example

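Consider the following data frame, in which y is simulated independently of x1 to x4 (so the models below illustrate the syntax rather than a real relationship) −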

> set.seed(99)
> x1<-rnorm(30,0.5)
> x2<-rpois(30,5)
> x3<-runif(30,2,5)
> x4<-rnorm(30,0.8)
> y<-rpois(30,10)
> df<-data.frame(x1,x2,x3,x4,y)

Creating a model with a second-degree (quadratic) term for x1 −

> PolynomialModel1<-lm(y~x1+I(x1^2)+x2+x3+x4)
> summary(PolynomialModel1)
Call:
lm(formula = y ~ x1 + I(x1^2) + x2 + x3 + x4)

Residuals:
    Min      1Q  Median      3Q     Max
-4.6890 -1.5544 -0.5614  1.6872  5.1347

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.72141    2.25262   5.647 8.16e-06 ***
x1           0.61879    0.51927   1.192    0.245
I(x1^2)     -0.45597    0.36046  -1.265    0.218
x2          -0.22389    0.25613  -0.874    0.391
x3          -0.05005    0.56085  -0.089    0.930
x4          -0.46588    0.67529  -0.690    0.497
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.499 on 24 degrees of freedom
Multiple R-squared: 0.1979, Adjusted R-squared: 0.03079
F-statistic: 1.184 on 5 and 24 DF, p-value: 0.3461
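The same quadratic model can also be written with poly(). The sketch below assumes the data frame is rebuilt with the same seed as above. The names fit_I and fit_poly are introduced here for illustration only.

```r
# Rebuild the example data so the block runs on its own
set.seed(99)
x1 <- rnorm(30, 0.5)
x2 <- rpois(30, 5)
x3 <- runif(30, 2, 5)
x4 <- rnorm(30, 0.8)
y  <- rpois(30, 10)
df <- data.frame(x1, x2, x3, x4, y)

# Quadratic in x1 written with I(), as in PolynomialModel1
fit_I <- lm(y ~ x1 + I(x1^2) + x2 + x3 + x4, data = df)

# The same model written with raw (non-orthogonal) polynomial terms
fit_poly <- lm(y ~ poly(x1, 2, raw = TRUE) + x2 + x3 + x4, data = df)

# Coefficients agree up to numerical tolerance; only their names differ
print(all.equal(unname(coef(fit_I)), unname(coef(fit_poly))))
```

With the default raw = FALSE, poly() uses orthogonal polynomials instead. The fitted values stay the same, but the individual coefficients are no longer directly comparable to those of the I() version.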

Creating a model with second- and third-degree terms for x1 −

> PolynomialModel2<-lm(y~x1+I(x1^2)+I(x1^3)+x2+x3+x4)
> summary(PolynomialModel2)
Call:
lm(formula = y ~ x1 + I(x1^2) + I(x1^3) + x2 + x3 + x4)

Residuals:
    Min      1Q  Median      3Q     Max
-4.7600 -1.5965 -0.6293  1.6855  5.0326

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.69112    2.30315   5.510 1.33e-05 ***
x1           0.39753    1.16291   0.342    0.736
I(x1^2)     -0.40674    0.43399  -0.937    0.358
I(x1^3)      0.07242    0.33881   0.214    0.833
x2          -0.21837    0.26265  -0.831    0.414
x3          -0.01952    0.58989  -0.033    0.974
x4          -0.54635    0.78526  -0.696    0.494
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.55 on 23 degrees of freedom
Multiple R-squared: 0.1995, Adjusted R-squared: -0.00935
F-statistic: 0.9552 on 6 and 23 DF, p-value: 0.4764
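Because the quadratic model is nested inside the cubic one, the two can be compared with a partial F-test. The following is a sketch of one way to do that with anova(), assuming the same simulated data. The names m1, m2, cmp, and p_cubic are illustrative only.

```r
# Rebuild the example data so the block runs on its own
set.seed(99)
x1 <- rnorm(30, 0.5)
x2 <- rpois(30, 5)
x3 <- runif(30, 2, 5)
x4 <- rnorm(30, 0.8)
y  <- rpois(30, 10)
df <- data.frame(x1, x2, x3, x4, y)

# m1 is nested in m2: m2 adds only the cubic term in x1
m1 <- lm(y ~ x1 + I(x1^2) + x2 + x3 + x4, data = df)
m2 <- lm(y ~ x1 + I(x1^2) + I(x1^3) + x2 + x3 + x4, data = df)

# Partial F-test: does the cubic term reduce the residual sum of squares?
cmp <- anova(m1, m2)
print(cmp)

# p-value of the F-test for the single added term
p_cubic <- cmp[["Pr(>F)"]][2]
print(p_cubic)
```

When only one term is added, this F-test gives the same p-value as the t-test for that term in the summary of the larger model. A large p-value suggests the extra term is not worth keeping.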

Creating a model with second- and third-degree terms for x1 and a second-degree term for x2 −

> PolynomialModel3<-lm(y~x1+I(x1^2)+I(x1^3)+x2+I(x2^2)+x3+x4)
> summary(PolynomialModel3)
Call:
lm(formula = y ~ x1 + I(x1^2) + I(x1^3) + x2 + I(x2^2) + x3 +
    x4)

Residuals:
    Min      1Q  Median      3Q     Max
-4.4688 -1.5123 -0.5659  1.5657  5.2208

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.26835    2.81745   4.709 0.000107 ***
x1           0.44131    1.19123   0.370 0.714577
I(x1^2)     -0.39980    0.44277  -0.903 0.376322
I(x1^3)      0.05274    0.34941   0.151 0.881391
x2          -0.67626    1.26441  -0.535 0.598124
I(x2^2)      0.05114    0.13801   0.371 0.714527
x3           0.03889    0.62160   0.063 0.950677
x4          -0.49947    0.81036  -0.616 0.543985
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.599 on 22 degrees of freedom
Multiple R-squared: 0.2044, Adjusted R-squared: -0.04868
F-statistic: 0.8077 on 7 and 22 DF, p-value: 0.5901

Creating a model with second- and third-degree terms for x1, a second-degree term for x2, and a third-degree term for x4 in place of the linear x4 term −

> PolynomialModel4<-lm(y~x1+I(x1^2)+I(x1^3)+x2+I(x2^2)+x3+I(x4^3))
> summary(PolynomialModel4)
Call:
lm(formula = y ~ x1 + I(x1^2) + I(x1^3) + x2 + I(x2^2) + x3 +
    I(x4^3))

Residuals:
    Min      1Q  Median      3Q     Max
-4.1388 -1.5998 -0.4581  1.6871  5.2185

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.156294   2.829809   4.649 0.000124 ***
x1           0.522760   1.160777   0.450 0.656862
I(x1^2)     -0.440464   0.437798  -1.006 0.325310
I(x1^3)      0.014329   0.329379   0.044 0.965692
x2          -0.658946   1.277395  -0.516 0.611104
I(x2^2)      0.048228   0.139822   0.345 0.733428
x3           0.002062   0.613597   0.003 0.997349
I(x4^3)     -0.104330   0.192868  -0.541 0.593985
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.604 on 22 degrees of freedom
Multiple R-squared: 0.2013, Adjusted R-squared: -0.05279
F-statistic: 0.7923 on 7 and 22 DF, p-value: 0.6016
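To judge whether the added polynomial terms help, the four models can be placed side by side. The sketch below collects the adjusted R-squared and AIC of each model into one table, assuming the same simulated data. The names models and comparison are illustrative only, and using AIC here is a suggestion rather than part of the original example.

```r
# Rebuild the example data so the block runs on its own
set.seed(99)
x1 <- rnorm(30, 0.5)
x2 <- rpois(30, 5)
x3 <- runif(30, 2, 5)
x4 <- rnorm(30, 0.8)
y  <- rpois(30, 10)
df <- data.frame(x1, x2, x3, x4, y)

# The four polynomial models from this article, fitted on df
models <- list(
  PolynomialModel1 = lm(y ~ x1 + I(x1^2) + x2 + x3 + x4, data = df),
  PolynomialModel2 = lm(y ~ x1 + I(x1^2) + I(x1^3) + x2 + x3 + x4, data = df),
  PolynomialModel3 = lm(y ~ x1 + I(x1^2) + I(x1^3) + x2 + I(x2^2) + x3 + x4,
                        data = df),
  PolynomialModel4 = lm(y ~ x1 + I(x1^2) + I(x1^3) + x2 + I(x2^2) + x3 + I(x4^3),
                        data = df)
)

# Higher adjusted R-squared and lower AIC indicate a better trade-off
# between fit and number of terms
comparison <- data.frame(
  adj_r_squared = sapply(models, function(m) summary(m)$adj.r.squared),
  AIC           = sapply(models, AIC)
)
print(round(comparison, 4))
```

Multiple R-squared never decreases when terms are added to a nested model, so adjusted R-squared or AIC is the more useful yardstick when comparing models with different numbers of terms.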

Updated on: 10-Aug-2020
