Interpreting Linear Regression Results using OLS Summary
Linear regression analyzes the relationship between one or more independent variables and a dependent variable, helping you understand how changes in predictors affect the outcome. Statsmodels is a comprehensive Python library that provides extensive statistical modeling capabilities, including linear regression with detailed summary outputs.
The OLS (Ordinary Least Squares) summary from Statsmodels contains crucial information about model performance, coefficient estimates, statistical significance, and diagnostic metrics. Let's explore how to interpret each component:
Model Fit Statistics
The first section focuses on overall model performance:
R-squared (R²): Measures the proportion of variance in the dependent variable explained by the independent variables. Values range from 0 (poor fit) to 1 (perfect fit).
Adjusted R-squared: Adjusts R² for sample size and number of predictors, providing a more conservative estimate that penalizes unnecessary variables.
F-statistic: Tests the overall significance of the model by determining whether the independent variables collectively explain the dependent variable better than chance.
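The fit statistics above can be computed by hand, which makes their definitions concrete. This is an illustrative NumPy sketch on synthetic data (the variable names are our own, not Statsmodels internals):

```python
import numpy as np

# Synthetic regression problem: intercept plus k = 2 predictors
rng = np.random.default_rng(0)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.standard_normal((n, k))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(n) * 0.3

# Least-squares fit and residuals
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

ss_res = np.sum(resid**2)                        # residual sum of squares
ss_tot = np.sum((y - y.mean())**2)               # total sum of squares
r2 = 1 - ss_res / ss_tot                         # R-squared
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)    # penalizes extra predictors
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))     # overall model significance

print(r2, adj_r2, f_stat)
```

Note that adjusted R² is always at most R², and the gap widens as more predictors are added relative to the sample size.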
Coefficient Analysis
Each independent variable has several key statistics:
Coefficients: Represent the estimated slope for each independent variable, showing the strength and direction of its relationship with the dependent variable.
Standard Errors: Measure the uncertainty around each coefficient estimate. Larger standard errors indicate less precise estimates.
t-values: Calculated by dividing each coefficient by its standard error. Absolute t-values greater than 2 typically suggest statistical significance.
p-values: Indicate the probability of observing a coefficient this extreme if no true relationship exists. Values below 0.05 typically suggest statistical significance.
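The chain from coefficient to standard error to t-value to p-value can be sketched directly with NumPy and SciPy. This is a hedged illustration on synthetic data, not the Statsmodels implementation:

```python
import numpy as np
from scipy import stats

# Simple model: y = 2 + 3*x + noise
rng = np.random.default_rng(1)
n = 80
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = 2.0 + 3.0 * X[:, 1] + rng.standard_normal(n) * 0.5

beta = np.linalg.solve(X.T @ X, X.T @ y)      # OLS coefficients
resid = y - X @ beta
dof = n - X.shape[1]                          # residual degrees of freedom
sigma2 = resid @ resid / dof                  # residual variance estimate
cov = sigma2 * np.linalg.inv(X.T @ X)         # coefficient covariance matrix
se = np.sqrt(np.diag(cov))                    # standard errors
t_vals = beta / se                            # t-values
p_vals = 2 * stats.t.sf(np.abs(t_vals), dof)  # two-sided p-values
```

With a true slope of 3 and modest noise, the slope's t-value is far above 2 and its p-value is far below 0.05, matching the rules of thumb above.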
Diagnostic Tests
Additional metrics help evaluate model assumptions:
Durbin-Watson statistic: Tests for autocorrelation in the residuals. Values near 2 indicate no significant autocorrelation.
Omnibus Test: Evaluates the normality of the residuals. Lower p-values suggest deviations from normality.
Jarque-Bera Test: Another normality test for the residuals. Lower p-values indicate a non-normal distribution.
Condition Number: Assesses multicollinearity among the independent variables. Large values indicate high correlation among predictors.
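Two of these diagnostics have simple closed forms and can be sketched in a few lines of NumPy (again on synthetic data, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in residuals: independent draws, so no autocorrelation by construction
resid = rng.standard_normal(100)

# Durbin-Watson: sum of squared successive differences over the
# residual sum of squares; values near 2 mean no first-order autocorrelation
dw = np.sum(np.diff(resid)**2) / np.sum(resid**2)

# Condition number of a design matrix with independent columns stays small;
# highly correlated columns would drive it up sharply
X = rng.standard_normal((100, 3))
cond = np.linalg.cond(X)
```

For uncorrelated residuals the Durbin-Watson statistic lands near 2, and for independent random predictors the condition number stays close to 1.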
Example Implementation
import numpy as np
import pandas as pd
import statsmodels.api as sm
# Create sample data
np.random.seed(42)
n = 100
X1 = np.random.randn(n)
X2 = np.random.randn(n)
X3 = np.random.randn(n)
# Create dependent variable with known relationships
y = 2 + 3*X1 + 1.5*X2 - 0.8*X3 + np.random.randn(n)*0.5
# Create DataFrame
data = pd.DataFrame({'X1': X1, 'X2': X2, 'X3': X3, 'y': y})
# Prepare variables
X = data[['X1', 'X2', 'X3']]
y = data['y']
# Add constant for intercept
X = sm.add_constant(X)
# Fit OLS model
model = sm.OLS(y, X).fit()
# Display summary
print(model.summary())
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.956
Model: OLS Adj. R-squared: 0.955
Method: Least Squares F-statistic: 736.4
Date: Mon, 01 Jan 2024 Prob (F-statistic): 2.34e-67
Time: 12:00:00 Log-Likelihood: -72.58
No. Observations: 100 AIC: 153.2
Df Residuals: 96 BIC: 163.6
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 2.0158 0.049 41.004 0.000 1.918 2.113
X1 2.9842 0.051 58.808 0.000 2.884 3.085
X2 1.4789 0.049 30.084 0.000 1.381 1.577
X3 -0.7856 0.052 -15.221 0.000 -0.888 -0.683
==============================================================================
Omnibus: 0.234 Durbin-Watson: 2.087
Prob(Omnibus): 0.890 Jarque-Bera (JB): 0.353
Skew: -0.109 Prob(JB): 0.838
Kurtosis: 2.826 Cond. No. 1.05
==============================================================================
Interpreting Key Results
From this example summary:
R-squared = 0.956: The model explains 95.6% of the variance in y
F-statistic p-value < 0.001: The model is statistically significant
All coefficient p-values < 0.001: All predictors are significant
Durbin-Watson ≈ 2: No autocorrelation detected
Condition number = 1.05: No multicollinearity issues
Conclusion
Interpreting OLS summary results involves examining R-squared for model fit, coefficient p-values for variable significance, and diagnostic tests for assumption violations. Understanding these metrics enables you to assess model reliability and make informed decisions based on regression analysis.