Differentiate between categorical and numerical independent variables in R.
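R stores a categorical variable as a factor, which is created with the factor() function. In a regression model, each level of the factor other than the baseline (reference) level enters as its own indicator variable with its own coefficient. A numerical independent variable, on the other hand, may be continuous or discrete. It enters the model as a single variable with one slope coefficient.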
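The following minimal sketch shows how R expands each kind of predictor, using base R's model.matrix(). The vector x is hypothetical and chosen only for illustration. With R's default treatment contrasts, the factor version gets one 0/1 indicator column per non-baseline level, while the numerical version gets a single column.

```r
# Hypothetical values, used only to illustrate the expansion
x <- c(0, 1, 2, 1, 3, 0)

# Numerical predictor: an intercept column plus one column for x
X_num <- model.matrix(~ x)

# Categorical predictor: factor(x) has levels "0", "1", "2", "3".
# Level "0" is the baseline; every other level gets a 0/1 indicator column.
X_fac <- model.matrix(~ factor(x))

print(colnames(X_num))        # columns: "(Intercept)" "x"
print(colnames(X_fac))        # columns: "(Intercept)" "factor(x)1" "factor(x)2" "factor(x)3"
print(X_fac[, "factor(x)1"])  # 1 where x == 1, otherwise 0
```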
The example below fits a linear regression twice on the same data, once with the predictor treated as numerical and once as categorical. Comparing the two model summaries shows the difference.
Example
The following snippet creates a sample data frame −
x <- rpois(20, 2)  # 20 random Poisson counts with mean 2
y <- rpois(20, 5)  # 20 random Poisson counts with mean 5
df <- data.frame(x, y)
df
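The following data frame is created. Because rpois() draws random numbers, your values will differ unless you fix the seed with set.seed() −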
   x  y
1  1  1
2  4  5
3  3 10
4  3  4
5  1  6
6  3  4
7  1  2
8  1 10
9  1  6
10 2  5
11 1  2
12 3  4
13 0  5
14 1  5
15 4  5
16 4  7
17 3  5
18 2  4
19 1  3
20 2  6
To fit a linear model with x as a numerical predictor and view the model summary, add the following code to the above snippet −
Model_1 <- lm(y ~ x, data = df)  # x enters as one numerical predictor
summary(Model_1)
Output
If you execute all the above snippets as a single program, it generates the following output −
Call:
lm(formula = y ~ x, data = df)

Residuals:
   Min     1Q Median     3Q    Max
-3.549 -1.313 -0.503  1.128  5.451

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    4.168      1.013    4.11  0.00065 ***
x              0.382      0.426    0.90  0.38249
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.29 on 18 degrees of freedom
Multiple R-squared:  0.0426,  Adjusted R-squared:  -0.0106
F-statistic: 0.801 on 1 and 18 DF,  p-value: 0.382
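The following sketch checks the numerical case. It assumes the x and y values are copied from the data frame printed above, so it uses no new rpois() draws. It refits Model_1 and compares the estimates from lm() with the closed-form simple-regression formulas: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). A single slope describes the effect of a one-unit increase in x across its whole range.

```r
# x and y copied from the printed data frame (assumption made so the
# result is reproducible; no random draws happen here)
x <- c(1, 4, 3, 3, 1, 3, 1, 1, 1, 2, 1, 3, 0, 1, 4, 4, 3, 2, 1, 2)
y <- c(1, 5, 10, 4, 6, 4, 2, 10, 6, 5, 2, 4, 5, 5, 5, 7, 5, 4, 3, 6)
df <- data.frame(x, y)

# Numerical predictor: lm() estimates one intercept and one slope
Model_1 <- lm(y ~ x, data = df)

# Closed-form least-squares estimates for simple linear regression
slope <- cov(df$x, df$y) / var(df$x)
intercept <- mean(df$y) - slope * mean(df$x)

print(round(coef(Model_1), 3))        # intercept 4.168, slope 0.382
print(round(c(intercept, slope), 3))  # 4.168 0.382
```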
To fit the linear model with x treated as a categorical (factor) variable and view the model summary, add the following code to the above snippet −
Model_2 <- lm(y ~ factor(x), data = df)  # factor(x) makes x a categorical predictor
summary(Model_2)
Output
If you execute all the above snippets as a single program, it generates the following output −
Call:
lm(formula = y ~ factor(x), data = df)

Residuals:
   Min     1Q Median     3Q    Max
-3.375 -1.400 -0.533  1.083  5.625

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  5.00e+00   2.50e+00    2.00    0.064 .
factor(x)1  -6.25e-01   2.65e+00   -0.24    0.817
factor(x)2  -3.92e-15   2.89e+00    0.00    1.000
factor(x)3   4.00e-01   2.74e+00    0.15    0.886
factor(x)4   6.67e-01   2.89e+00    0.23    0.820
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.5 on 15 degrees of freedom
Multiple R-squared:  0.0526,  Adjusted R-squared:  -0.2
F-statistic: 0.208 on 4 and 15 DF,  p-value: 0.93
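The following sketch covers the categorical case, again assuming the data are copied from the printed data frame. Under R's default treatment contrasts, the intercept of Model_2 is the mean of y for the baseline level x = 0. Each factor(x)k coefficient is the mean of y for level k minus that baseline mean. This explains why factor(x)2 is reported as -3.92e-15: the mean of y at x = 2 equals the baseline mean, so the estimate is zero up to floating-point rounding.

```r
# x and y copied from the printed data frame (assumption, as above)
x <- c(1, 4, 3, 3, 1, 3, 1, 1, 1, 2, 1, 3, 0, 1, 4, 4, 3, 2, 1, 2)
y <- c(1, 5, 10, 4, 6, 4, 2, 10, 6, 5, 2, 4, 5, 5, 5, 7, 5, 4, 3, 6)
df <- data.frame(x, y)

# Categorical predictor: one coefficient per non-baseline level of x
Model_2 <- lm(y ~ factor(x), data = df)

# Mean of y within each level of x (levels "0" to "4")
group_means <- tapply(df$y, factor(df$x), mean)

# Intercept = baseline mean; other coefficients = level mean - baseline mean
by_hand <- c(group_means[1], group_means[-1] - group_means[1])

print(round(coef(Model_2), 3))  # approximately 5, -0.625, 0, 0.4, 0.667
print(round(by_hand, 3))        # same values, computed from group means
```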
