How to Perform an ANCOVA in Python
ANCOVA (Analysis of Covariance) is a statistical method that combines ANOVA with regression analysis. It compares group means while controlling for the effects of continuous variables called covariates, providing more accurate group comparisons by adjusting for confounding variables.
What is ANCOVA?
ANCOVA extends traditional ANOVA by including one or more continuous covariates in the model. This allows researchers to:
- Control for variables that might influence the dependent variable
- Reduce error variance and increase statistical power
- Make more precise comparisons between groups
For example, when testing a new blood pressure medication, you might want to control for age since it naturally affects blood pressure. ANCOVA lets you compare treatment groups while adjusting for age differences.
Implementing ANCOVA in Python
Python's statsmodels library can fit an ANCOVA as an ordinary least squares (OLS) regression in which the group is a categorical predictor and the covariate is a continuous one.
Basic Syntax
from statsmodels.formula.api import ols
# Basic ANCOVA formula
model = ols('dependent_variable ~ group + covariate', data=df).fit()
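One caveat about the formula is worth a small sketch. statsmodels treats string columns as categorical automatically, but a group column stored as numbers is read as a continuous predictor unless it is wrapped in `C()`. The DataFrame `toy` and its columns `y`, `group`, and `x` are made up for illustration and are not part of the original example.

```python
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical data: groups coded as integers 1, 2, 3 rather than strings
toy = pd.DataFrame({
    "y": [5.1, 4.8, 6.0, 7.2, 6.9, 7.5, 9.1, 8.7, 9.4],
    "group": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "x": [10, 12, 15, 11, 14, 16, 12, 13, 17],
})

# Without C(), `group` enters the model as a single numeric slope
as_numeric = ols("y ~ group + x", data=toy).fit()

# With C(), each non-reference group gets its own indicator term
as_factor = ols("y ~ C(group) + x", data=toy).fit()

print(list(as_numeric.params.index))
print(list(as_factor.params.index))
```

The second model is the one that corresponds to an ANCOVA, because it estimates a separate adjusted effect for each group instead of assuming the group codes lie on a numeric scale.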
Complete Example
Here's a complete example analyzing the effect of different treatments while controlling for a covariate:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
# Create sample data
data = {
    'score': [8, 7, 9, 11, 10, 12, 14, 13, 15, 16],
    'treatment': ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"],
    'age': [20, 30, 40, 30, 40, 50, 40, 50, 60, 70]
}
df = pd.DataFrame(data)
print("Sample Data:")
print(df.head())
# Perform ANCOVA
model = ols('score ~ treatment + age', data=df).fit()
# Display results
print("\nANCOVA Results:")
print(model.summary())
Output:
Sample Data:
score treatment age
0 8 A 20
1 7 A 30
2 9 A 40
3 11 B 30
4 10 B 40
ANCOVA Results:
OLS Regression Results
==============================================================================
Dep. Variable: score R-squared: 0.939
Model: OLS Adj. R-squared: 0.909
Method: Least Squares F-statistic: 31.00
Date: Mon, 01 Jan 2024 Prob (F-statistic): 0.000476
Time: 12:00:00 Log-Likelihood: -10.724
No. Observations: 10 AIC: 29.45
Df Residuals: 6 BIC: 30.66
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 6.0000 1.054 5.692 0.001 3.421 8.579
treatment[T.B] 2.3333 0.805 2.898 0.027 0.363 4.303
treatment[T.C] 4.8333 1.032 4.684 0.003 2.308 7.358
age 0.0667 0.030 2.191 0.071 -0.008 0.141
==============================================================================
Interpreting ANCOVA Results
The key components of ANCOVA output include:
| Component | Interpretation |
|---|---|
| R-squared | Proportion of variance explained by the model |
| F-statistic | Overall model significance |
| Coefficients | Adjusted difference of each group from the reference group (A here), and the slope for the covariate |
| P-values | Statistical significance of each effect |
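The treatment coefficients are differences from group A at a fixed age, which can be hard to read on their own. One way to express them as adjusted group means is to predict each group's score at a common covariate value. The sketch below assumes the overall mean age is a sensible reference point; the helper DataFrame `grid` is introduced only for illustration.

```python
import pandas as pd
from statsmodels.formula.api import ols

# Same sample data as in the complete example
df = pd.DataFrame({
    'score': [8, 7, 9, 11, 10, 12, 14, 13, 15, 16],
    'treatment': ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"],
    'age': [20, 30, 40, 30, 40, 50, 40, 50, 60, 70],
})

model = ols('score ~ treatment + age', data=df).fit()

# One row per group, all evaluated at the overall mean age
grid = pd.DataFrame({
    'treatment': ['A', 'B', 'C'],
    'age': df['age'].mean(),
})
grid['adjusted_mean'] = model.predict(grid)

# Unadjusted group means for comparison
raw_means = df.groupby('treatment')['score'].mean()

print(grid)
print(raw_means)
```

In this sample group C is older on average, so its adjusted mean comes out lower than its raw mean, while the adjusted means of the younger groups A and B move up slightly.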
Checking ANCOVA Assumptions
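ANCOVA requires several assumptions to be met for valid results. The code below plots the residuals to check two of them, normality and constant variance (homoscedasticity):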
import matplotlib.pyplot as plt
import numpy as np
# Check residuals for normality and homoscedasticity
residuals = model.resid
fitted_values = model.fittedvalues
# Plot residuals vs fitted values
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.scatter(fitted_values, residuals)
plt.axhline(y=0, color='red', linestyle='--')
plt.xlabel('Fitted Values')
plt.ylabel('Residuals')
plt.title('Residuals vs Fitted')
plt.subplot(1, 2, 2)
plt.hist(residuals, bins=5, alpha=0.7)
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.title('Residual Distribution')
plt.tight_layout()
plt.show()
print("Residuals mean:", np.mean(residuals))
print("Residuals std:", np.std(residuals))
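Output:
Residuals mean: -1.4210854715202004e-14
Residuals std: 0.7071067811865476
The mean is zero up to floating-point error, and the standard deviation is about 0.707: the residual sum of squares is 5 across 10 observations, so np.std returns the square root of 0.5. The trailing digits can vary by platform.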
Conclusion
ANCOVA is a statistical method that combines ANOVA and regression to compare group means while controlling for covariates. With Python's statsmodels, you can fit an ANCOVA model using ols and interpret the output to draw more accurate conclusions about group differences.
