How To Calculate Studentized Residuals In Python?

Studentized residuals are typically used in regression analysis to identify potential outliers in the data. An outlier is a point that is significantly different from the overall trend of the data, and it can have a significant influence on the fitted model. By identifying and analyzing outliers, you can better understand the underlying patterns in your data and improve the accuracy of your model.

What are Studentized Residuals?

A studentized residual is a residual divided by an estimate of its standard deviation. In regression analysis, a residual is the difference between an observed value of the response variable and the value the model predicts for it. Because raw residuals do not all have the same variance, scaling them this way puts them on a common footing, which makes it easier to spot probable outliers that could strongly affect the fitted model.

The following formula is typically used to calculate studentized residuals:

studentized residual = residual / (standard deviation of residuals * (1 - hii)^(1/2))

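where "residual" is the difference between the observed and predicted response values, "standard deviation of residuals" is an estimate of the residuals' standard deviation, and "hii" is the leverage of the data point (the i-th diagonal element of the hat matrix). If the standard deviation is estimated from all observations, the result is called an internally studentized residual; if it is re-estimated with the observation in question left out, the result is an externally studentized residual.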
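As a rough illustration of the formula, the sketch below computes both variants by hand with NumPy, using the rating/points data from the complete example later in this article. The variable names (`hat_diag`, `internal`, `external`) are chosen here for illustration, and the externally studentized values are obtained from the internal ones through the usual leave-one-out identity rather than by refitting the model ten times.

```python
import numpy as np

# Data from the complete example in this article
points = np.array([22, 25, 17, 19, 26, 24, 9, 19, 11, 16], dtype=float)
rating = np.array([95, 82, 92, 90, 97, 85, 80, 70, 82, 83], dtype=float)

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(points), points])
n, p = X.shape

# Least-squares fit and raw residuals
beta, *_ = np.linalg.lstsq(X, rating, rcond=None)
resid = rating - X @ beta

# Leverages h_ii: diagonal of the hat matrix X (X'X)^-1 X'
hat_diag = np.einsum('ij,jk,ik->i', X, np.linalg.inv(X.T @ X), X)

# Internally studentized: sigma estimated from all n observations
sigma = np.sqrt(resid @ resid / (n - p))
internal = resid / (sigma * np.sqrt(1 - hat_diag))

# Externally studentized: equivalent to re-estimating sigma
# with observation i left out of the fit
external = internal * np.sqrt((n - p - 1) / (n - p - internal**2))

print(np.round(internal, 3))
print(np.round(external, 3))
```

For observation 7 the internal value is about -2.10 and the external value about -2.94, so the two definitions can differ noticeably for exactly the points one cares about.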

Calculating Studentized Residuals Using statsmodels

The statsmodels package can be used to compute studentized residuals in Python. The syntax is as follows:

OLSResults.outlier_test()

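Here OLSResults is the results object returned by calling fit() on a linear model created with statsmodels' ols() function. Note that outlier_test() reports externally studentized residuals, in which the standard deviation is estimated with the observation in question left out.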

Complete Example

Here's a complete example demonstrating how to calculate studentized residuals:

# Import necessary packages and functions
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Create dataset
df = pd.DataFrame({'rating': [95, 82, 92, 90, 97, 85, 80, 70, 82, 83],
                   'points': [22, 25, 17, 19, 26, 24, 9, 19, 11, 16]})

# Fit simple linear regression model
model = ols('rating ~ points', data=df).fit()

# Calculate studentized residuals
stud_res = model.outlier_test()

# Display studentized residuals
print(stud_res)

    student_resid   unadj_p   bonf(p)
0        1.048218  0.329376  1.000000
1       -1.018535  0.342328  1.000000
2        0.994962  0.352896  1.000000
3        0.548454  0.600426  1.000000
4        1.125756  0.297380  1.000000
5       -0.465472  0.655728  1.000000
6       -0.029670  0.977158  1.000000
7       -2.940743  0.021690  0.216903
8        0.100759  0.922567  1.000000
9       -0.134123  0.897080  1.000000
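The same numbers can also be reached through the influence object that statsmodels attaches to a fitted OLS model. The sketch below, which rebuilds the example data so that it runs on its own, pulls the internally and externally studentized residuals and the leverages from get_influence() and checks that the external values agree with the student_resid column of outlier_test(). The variable names are illustrative.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Same dataset and model as in the example above
df = pd.DataFrame({'rating': [95, 82, 92, 90, 97, 85, 80, 70, 82, 83],
                   'points': [22, 25, 17, 19, 26, 24, 9, 19, 11, 16]})
model = ols('rating ~ points', data=df).fit()

# Influence diagnostics for the fitted model
influence = model.get_influence()
internal = influence.resid_studentized_internal   # sigma from all observations
external = influence.resid_studentized_external   # sigma with observation i left out
leverage = influence.hat_matrix_diag               # the h_ii values

# Compare against the student_resid column of outlier_test()
stud_res = model.outlier_test()
matches = np.allclose(stud_res['student_resid'], external)
print(matches)

# Side-by-side view of both variants and the leverages
print(pd.DataFrame({'internal': internal,
                    'external': external,
                    'leverage': leverage}).round(3))
```

Going through get_influence() is handy when the leverages or the internally studentized values are needed as well, since outlier_test() only returns the external ones.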

Plotting Studentized Residuals

We can visualize the studentized residuals by plotting them against the predictor variable values:

import matplotlib.pyplot as plt

# Define predictor variable values and studentized residuals
x = df['points']
y = stud_res['student_resid']

# Create scatterplot of predictor variable vs. studentized residuals
plt.scatter(x, y)
plt.axhline(y=0, color='black', linestyle='--')
plt.xlabel('Points')
plt.ylabel('Studentized Residuals')
plt.title('Studentized Residuals vs Points')
plt.show()

Interpreting the Results

The output contains three important columns:

student_resid: The studentized residual values
unadj_p: Unadjusted p-values for outlier test
bonf(p): Bonferroni-adjusted p-values

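Observations whose studentized residuals exceed 2 or 3 in absolute value are typically considered potential outliers. In our example, observation 7 has a studentized residual of -2.94, indicating it may be an outlier. Its Bonferroni-adjusted p-value of about 0.217 is above 0.05, however, so the formal outlier test does not flag it once the ten simultaneous comparisons are taken into account.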
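The rule of thumb and the formal test can both be applied directly to the DataFrame returned by outlier_test(). The following is a minimal sketch: the cutoff of 2 and the 0.05 significance level are assumptions taken from common practice rather than fixed rules, and the names `flagged` and `significant` are illustrative.

```python
import pandas as pd
from statsmodels.formula.api import ols

# Same dataset and model as in the example above
df = pd.DataFrame({'rating': [95, 82, 92, 90, 97, 85, 80, 70, 82, 83],
                   'points': [22, 25, 17, 19, 26, 24, 9, 19, 11, 16]})
model = ols('rating ~ points', data=df).fit()
stud_res = model.outlier_test()

# Rule of thumb: absolute studentized residual above an assumed cutoff of 2
threshold = 2
flagged = stud_res[stud_res['student_resid'].abs() > threshold]

# Formal test: Bonferroni-adjusted p-value below an assumed 0.05 level
alpha = 0.05
significant = stud_res[stud_res['bonf(p)'] < alpha]

print(flagged)
print(significant)
```

With this data, the rule of thumb picks out observation 7 while the Bonferroni-adjusted test selects nothing, which is a reminder that a flagged point is a candidate for closer inspection, not automatically an error to delete.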

Conclusion

Studentized residuals are a standard tool for identifying potential outliers in regression analysis. Use outlier_test() from statsmodels to calculate them, and plot them against the predictor to check for unusual observations and for patterns that suggest the model assumptions do not hold.

Jay Singh

Updated on: 2026-03-26T23:27:20+05:30
