 
Differentiate between categorical and numerical independent variables in R.
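For a categorical variable, each level is treated as a separate independent variable. R recognizes the variable as categorical through the factor() function and, in a regression, represents every level except the first (reference) level by its own dummy (0/1) variable. A numerical independent variable, on the other hand, is either continuous or discrete and enters the model as a single variable with one slope coefficient.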
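A minimal sketch of how R encodes the two kinds of variable, using a few hypothetical values of x chosen only for illustration. model.matrix() shows the columns that lm() builds from a formula: with x as numerical there is a single x column, while with factor(x) level 0 becomes the reference level and levels 1 and 2 each get a 0/1 indicator column (this reflects R's default treatment contrasts).

```r
# Hypothetical toy values, for illustration only
x <- c(0, 1, 2, 1)

# Numerical: x enters the design matrix as one column next to the intercept
mm_num <- model.matrix(~ x)
print(mm_num)

# Categorical: factor(x) has levels 0, 1 and 2; level 0 is the reference,
# and each remaining level gets its own 0/1 indicator column
mm_cat <- model.matrix(~ factor(x))
print(mm_cat)
```

The column names factor(x)1 and factor(x)2 follow the same pattern as the coefficient names in the factor-model summary later in this example.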
The linear regression model summaries in the example below illustrate the difference between categorical and numerical independent variables.
Example
The following snippet creates a sample data frame −
x <- rpois(20, 2)        # 20 random Poisson counts with mean 2
y <- rpois(20, 5)        # 20 random Poisson counts with mean 5
df <- data.frame(x, y)
df
The following data frame is created (your values will differ, because rpois() generates random numbers) −
   x  y
1  1  1
2  4  5
3  3 10
4  3  4
5  1  6
6  3  4
7  1  2
8  1 10
9  1  6
10 2  5
11 1  2
12 3  4
13 0  5
14 1  5
15 4  5
16 4  7
17 3  5
18 2  4
19 1  3
20 2  6
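Because rpois() draws random numbers, running the snippet again produces a different data frame, and therefore different model summaries from the ones shown below. A minimal sketch, assuming you want repeatable data, is to call set.seed() before generating it. The seed 123 is an arbitrary choice and will not reproduce the exact values printed above.

```r
# Fix the random number generator state so the draws are repeatable
# (the seed 123 is an arbitrary choice, not part of the original example)
set.seed(123)
x <- rpois(20, 2)   # 20 Poisson counts with mean 2
y <- rpois(20, 5)   # 20 Poisson counts with mean 5
df <- data.frame(x, y)

# Resetting the same seed regenerates exactly the same values
set.seed(123)
stopifnot(identical(df$x, rpois(20, 2)))

# rpois() returns non-negative integers, so x is a discrete numerical variable
str(df)
```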
To fit a linear model to the data in df, with x as a numerical independent variable, and display the model summary, extend the above snippet as follows −
x <- rpois(20, 2)
y <- rpois(20, 5)
df <- data.frame(x, y)
Model_1 <- lm(y ~ x, data = df)   # x enters as a single numerical predictor
summary(Model_1)
Output
If you execute all the above snippets as a single program, it generates the following output −
Call:
lm(formula = y ~ x, data = df)
Residuals:
   Min     1Q Median     3Q    Max
-3.549 -1.313 -0.503  1.128  5.451
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    4.168      1.013    4.11  0.00065 ***
x              0.382      0.426    0.90  0.38249
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.29 on 18 degrees of freedom
Multiple R-squared: 0.0426, Adjusted R-squared: -0.0106
F-statistic: 0.801 on 1 and 18 DF, p-value: 0.382
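In this summary, x contributes a single slope: the model predicts that y changes by about 0.382 for every one-unit increase in x, whatever the starting value of x. A minimal sketch that checks this, assuming the data are typed in by hand from the data frame printed above so that the estimates match the summary: predictions at x = 0, 1, 2, 3 and 4 differ by the slope each time.

```r
# Data typed in from the data frame printed above
df <- data.frame(
  x = c(1, 4, 3, 3, 1, 3, 1, 1, 1, 2, 1, 3, 0, 1, 4, 4, 3, 2, 1, 2),
  y = c(1, 5, 10, 4, 6, 4, 2, 10, 6, 5, 2, 4, 5, 5, 5, 7, 5, 4, 3, 6)
)
Model_1 <- lm(y ~ x, data = df)

# A numerical predictor gives two coefficients: intercept and one slope
# (about 4.168 and 0.382, as in the summary above)
b <- coef(Model_1)
print(round(b, 3))

# Every one-unit step in x changes the prediction by the same amount
pred <- predict(Model_1, newdata = data.frame(x = 0:4))
print(round(diff(pred), 3))
```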
To fit a linear model to the data in df with x as a factor (categorical) variable and display the model summary, extend the above snippet as follows −
x <- rpois(20, 2)
y <- rpois(20, 5)
df <- data.frame(x, y)
Model_1 <- lm(y ~ x, data = df)
Model_2 <- lm(y ~ factor(x), data = df)   # each non-reference level of x gets a dummy variable
summary(Model_2)
Output
If you execute all the above snippets as a single program, it generates the following output −
Call:
lm(formula = y ~ factor(x), data = df)
Residuals:
   Min     1Q Median     3Q    Max
-3.375 -1.400 -0.533  1.083  5.625
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  5.00e+00   2.50e+00    2.00    0.064 .
factor(x)1  -6.25e-01   2.65e+00   -0.24    0.817
factor(x)2  -3.92e-15   2.89e+00    0.00    1.000
factor(x)3   4.00e-01   2.74e+00    0.15    0.886
factor(x)4   6.67e-01   2.89e+00    0.23    0.820
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.5 on 15 degrees of freedom
Multiple R-squared: 0.0526, Adjusted R-squared: -0.2
F-statistic: 0.208 on 4 and 15 DF, p-value: 0.93
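Comparing the two summaries shows the practical difference. The numerical model estimates one slope for x (its F-statistic uses 1 and 18 DF), while the factor model estimates a separate coefficient for each of the levels 1 to 4 (4 and 15 DF). In the factor model, the intercept is the mean of y at the reference level x = 0, and each factor(x) coefficient is the difference between the mean of y at that level and the mean at x = 0. A minimal sketch that confirms this, again assuming the data are typed in by hand from the printed data frame:

```r
# Data typed in from the data frame printed above
df <- data.frame(
  x = c(1, 4, 3, 3, 1, 3, 1, 1, 1, 2, 1, 3, 0, 1, 4, 4, 3, 2, 1, 2),
  y = c(1, 5, 10, 4, 6, 4, 2, 10, 6, 5, 2, 4, 5, 5, 5, 7, 5, 4, 3, 6)
)

Model_1 <- lm(y ~ x, data = df)          # x as a numerical variable
Model_2 <- lm(y ~ factor(x), data = df)  # x as a categorical variable

# 2 coefficients for the numerical model, 5 for the factor model
print(c(numerical = length(coef(Model_1)), factor = length(coef(Model_2))))

# Mean of y at each observed value of x (levels 0 to 4)
group_means <- tapply(df$y, df$x, mean)
print(group_means)

# Intercept plus each factor(x) coefficient recovers those group means
b2 <- coef(Model_2)
print(b2[["(Intercept)"]] + c(0, b2[-1]))
```

Because factor(x) fits each group mean freely, it does not assume a straight-line relationship between x and y, but it spends more degrees of freedom to do so; here that cost shows up in the lower adjusted R-squared (-0.2 versus -0.0106 for the numerical model).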