How To Perform An ANCOVA In Python?

ANCOVA (Analysis of Covariance) is a statistical method that combines ANOVA with regression analysis. It compares group means while adjusting for one or more continuous variables, called covariates, that also influence the outcome. This adjustment gives more precise group comparisons and reduces distortion from covariate differences between the groups.

What is ANCOVA?

ANCOVA extends traditional ANOVA by including one or more continuous covariates in the model. This allows researchers to:

  • Control for variables that might influence the dependent variable

  • Reduce error variance and increase statistical power

  • Make more precise comparisons between groups

For example, when testing a new blood pressure medication, you might want to control for age since it naturally affects blood pressure. ANCOVA lets you compare treatment groups while adjusting for age differences.
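In the usual textbook notation (not part of the original article), a one-way ANCOVA with a single covariate models the outcome y_ij of subject j in group i as shown below. Here τ_i is the effect of group i, x_ij is the covariate (age in the blood-pressure example), β is a slope shared by all groups, and ε_ij is random error. The second expression is the covariate-adjusted group mean that ANCOVA actually compares:

```latex
y_{ij} = \mu + \tau_i + \beta\,(x_{ij} - \bar{x}) + \varepsilon_{ij},
\qquad
\bar{y}_i^{\,\mathrm{adj}} = \bar{y}_i - \hat{\beta}\,(\bar{x}_i - \bar{x})
```

A group that happens to contain older patients has its mean shifted back to the overall mean age before the groups are compared.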

Implementing ANCOVA in Python

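Python's statsmodels library fits an ANCOVA as a linear model estimated with OLS (Ordinary Least Squares): the grouping variable enters the model formula as a categorical factor and the covariate as a continuous predictor.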

Basic Syntax

from statsmodels.formula.api import ols

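# Basic ANCOVA formula: C() marks the group column as categorical
# (string labels are detected automatically; numeric group codes need C())
model = ols('dependent_variable ~ C(group) + covariate', data=df).fit()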
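Why C() matters: when groups are stored as numbers, the formula treats them as one continuous predictor with a single slope, which is not an ANCOVA. The sketch below uses small hypothetical data (all column names and values are made up for illustration) and compares the two parameterizations by counting model degrees of freedom.

```python
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical data: three groups stored as integer codes 1, 2, 3
df = pd.DataFrame({
    "outcome":   [5.1, 4.8, 5.6, 6.9, 7.2, 6.5, 8.8, 9.1, 8.4],
    "group":     [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "covariate": [20, 25, 30, 22, 27, 32, 24, 29, 34],
})

# Without C(): 'group' becomes a single numeric slope (2 predictors in total)
numeric_fit = ols("outcome ~ group + covariate", data=df).fit()

# With C(): one indicator column per non-reference group plus the covariate
factor_fit = ols("outcome ~ C(group) + covariate", data=df).fit()

print("numeric coding terms:", numeric_fit.params.index.tolist())
print("factor coding terms: ", factor_fit.params.index.tolist())
print("model df:", numeric_fit.df_model, "vs", factor_fit.df_model)
```

The factor version spends two degrees of freedom on the group effect (three groups minus the reference), which is what an ANCOVA group comparison requires.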

Complete Example

Here's a complete example that compares three treatments (A, B and C) while controlling for age as a covariate:

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Create sample data
data = {
    'score': [8, 7, 9, 11, 10, 12, 14, 13, 15, 16],
    'treatment': ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"],
    'age': [20, 30, 40, 30, 40, 50, 40, 50, 60, 70]
}

df = pd.DataFrame(data)
print("Sample Data:")
print(df.head())

# Perform ANCOVA
model = ols('score ~ treatment + age', data=df).fit()

# Display results
print("\nANCOVA Results:")
print(model.summary())

Output

Sample Data:
   score treatment  age
0      8         A   20
1      7         A   30
2      9         A   40
3     11         B   30
4     10         B   40

ANCOVA Results:
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  score   R-squared:                       0.939
Model:                            OLS   Adj. R-squared:                  0.909
Method:                 Least Squares   F-statistic:                     31.00
Date:                Mon, 01 Jan 2024   Prob (F-statistic):           0.000476
Time:                        12:00:00   Log-Likelihood:                -10.724
No. Observations:                  10   AIC:                             29.45
Df Residuals:                       6   BIC:                             30.66
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      6.0000      1.054      5.692      0.001       3.421       8.579
treatment[T.B] 2.3333      0.805      2.898      0.027       0.363       4.303
treatment[T.C] 4.8333      1.032      4.684      0.003       2.308       7.358
age            0.0667      0.030      2.191      0.071      -0.008       0.141
==============================================================================

Interpreting ANCOVA Results

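The model summary is a regression table. Its key components are:

Component      Interpretation
R-squared      Proportion of the variance in the outcome explained by the groups and covariate together
F-statistic    Joint test of all predictors at once, not a test of the group effect alone
Coefficients   Group terms (treatment[T.B], treatment[T.C]) are covariate-adjusted differences from the reference group A; the covariate term (age) is the slope shared by all groups
P-values       t-tests of whether each coefficient differs from zero

In this example, treatments B and C score higher than A after adjusting for age (p = 0.027 and p = 0.003), while the age effect itself does not reach significance at the 0.05 level (p = 0.071).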
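ANCOVA compares adjusted group means, not raw ones. A minimal sketch for obtaining them, assuming the common convention of predicting every group's score at the overall mean of the covariate (the `grid` DataFrame is a helper introduced here for illustration):

```python
import pandas as pd
from statsmodels.formula.api import ols

# Same sample data as the complete example
df = pd.DataFrame({
    'score': [8, 7, 9, 11, 10, 12, 14, 13, 15, 16],
    'treatment': ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"],
    'age': [20, 30, 40, 30, 40, 50, 40, 50, 60, 70],
})
model = ols('score ~ treatment + age', data=df).fit()

# One row per treatment, all at the overall mean age (43 for this data)
grid = pd.DataFrame({
    'treatment': ['A', 'B', 'C'],
    'age': [df['age'].mean()] * 3,
})

# Predictions at a common age are the age-adjusted group means
adjusted_means = pd.Series(model.predict(grid).values, index=grid['treatment'])
raw_means = df.groupby('treatment')['score'].mean()

print(pd.DataFrame({'raw': raw_means, 'adjusted': adjusted_means}).round(4))
```

The adjusted means come out at about 8.87, 11.2 and 13.7, compared with raw means of 8, 11 and 14.5. Group C's raw lead over A shrinks from 6.5 to about 4.83 once its older participants are accounted for, and that gap equals the treatment[T.C] coefficient.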

Checking ANCOVA Assumptions

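ANCOVA results are valid only when its assumptions hold: independent observations, a linear relationship between the covariate and the outcome, the same covariate slope in every group (homogeneity of regression slopes), normally distributed residuals, and constant residual variance (homoscedasticity). The code below checks the last two visually: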

import matplotlib.pyplot as plt
import numpy as np

# Check residuals for normality and homoscedasticity
residuals = model.resid
fitted_values = model.fittedvalues

# Plot residuals vs fitted values
plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.scatter(fitted_values, residuals)
plt.axhline(y=0, color='red', linestyle='--')
plt.xlabel('Fitted Values')
plt.ylabel('Residuals')
plt.title('Residuals vs Fitted')

plt.subplot(1, 2, 2)
plt.hist(residuals, bins=5, alpha=0.7)
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.title('Residual Distribution')

plt.tight_layout()
plt.show()

print("Residuals mean:", np.mean(residuals))
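print("Residuals std:", np.std(residuals))

Output

Residuals mean: approximately 0 (tiny floating-point noise)
Residuals std: approximately 0.7071

A near-zero residual mean is guaranteed for any OLS fit that includes an intercept, so the plots are the informative part of this check.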

Conclusion

ANCOVA combines ANOVA and regression to compare group means while controlling for covariates. With Python's statsmodels you can fit it as an OLS model in a few lines, read the covariate-adjusted group differences from the coefficients, and check the assumptions before drawing conclusions about group differences.

Updated on: 2026-03-26T23:28:12+05:30
