How to Perform an ANCOVA in Python?


ANCOVA (analysis of covariance) is a useful statistical method because it lets you include covariates in the analysis, which helps adjust for auxiliary variables and increases the precision of group comparisons. By adjusting the group means for the covariates, ANCOVA makes it less likely that an observed difference between groups reflects an unrelated factor rather than the treatment or intervention under study. It cannot account for factors that were never measured, but it makes the comparisons between the groups more accurate and the conclusions more solid. In this post, we take a close look at ANCOVA and implement it in Python.

What is an ANCOVA?

The analysis of covariance (ANCOVA) compares the means of two or more groups while adjusting for the effects of one or more continuous variables (called covariates). ANCOVA is similar to ANOVA (analysis of variance), but it permits continuous covariates to be included in the model alongside the grouping factor. As a result, it is a valuable tool for assessing the effects of these covariates on group means and for generating more accurate comparisons between groups.

Consider the following scenario: you are running a study to assess the efficacy of a new blood pressure medicine. You gather blood pressure data from a group of people who take the medicine and a group of people who do not, and you also record the age of each participant. Using ANCOVA, you can compare the means of the two groups on the dependent variable (blood pressure) while adjusting for the effect of the covariate (age). This lets you assess whether the medicine lowers blood pressure while taking into account any age differences between the groups.
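The scenario above can be sketched in code. This is a minimal illustration, not a real study: the data are simulated, and the column names (treatment, age, blood_pressure), the group size, and the effect sizes are all assumptions chosen for the example. It fits a plain ANOVA model and an ANCOVA model so the two formulas can be compared side by side.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical, simulated data: NOT real measurements
rng = np.random.default_rng(0)
n = 20  # participants per group (arbitrary choice)

bp = pd.DataFrame({
    "treatment": ["control"] * n + ["medicine"] * n,
    "age": rng.uniform(30, 70, size=2 * n),
})

# Simulated outcome: rises with age, 8 units lower in the medicine group
bp["blood_pressure"] = (
    110
    + 0.5 * bp["age"]
    - 8 * (bp["treatment"] == "medicine")
    + rng.normal(0, 5, size=2 * n)
)

# ANOVA: group factor only
anova = ols("blood_pressure ~ treatment", data=bp).fit()

# ANCOVA: group factor plus the continuous covariate
ancova = ols("blood_pressure ~ treatment + age", data=bp).fit()

print(anova.params)
print(ancova.params)
print("Residual SS without age:", anova.ssr)
print("Residual SS with age:   ", ancova.ssr)
```

Because the ANCOVA model only adds a term to the ANOVA model, its residual sum of squares cannot be larger. How much it shrinks indicates how much of the variation in the outcome the covariate absorbs.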

Implementing ANCOVA in Python

Consider the following ANCOVA in Python, performed using the statsmodels module:

Syntax

df = pd.DataFrame({'dependent_variable' : [8, 7, 9, 11, 10, 12, 14, 13, 15, 16],
   'group' : ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"],
   'covariate' : [20, 30, 40, 30, 40, 50, 40, 50, 60, 70]})

model = ols('dependent_variable ~ group + covariate', data=df).fit()

Python's statsmodels module fits an ANCOVA as an ordinary least squares (OLS) model. In the formula, the term to the left of ~ is the dependent variable, group is the categorical factor, and covariate is the continuous covariate. Because the group column holds strings, the formula interface treats it as categorical and dummy-codes it automatically.

Algorithm

Importing pandas, statsmodels.api, and ols from statsmodels.formula.api
Defining the data for the ANCOVA
Fitting the ANCOVA model
Printing the summary of the model

Example

Fitting an ANCOVA with the ols function from statsmodels is demonstrated here:


import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Define the data for the ANCOVA
df = pd.DataFrame({'dependent_variable' : [8, 7, 9, 11, 10, 12, 14, 13, 15, 16],
                   'group' : ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"],
                   'covariate' : [20, 30, 40, 30, 40, 50, 40, 50, 60, 70]})

# Perform the ANCOVA
model = ols('dependent_variable ~ group + covariate', data=df).fit()

# Print the summary of the model
print(model.summary())

Output

                           OLS Regression Results                            
==============================================================================
Dep. Variable:     dependent_variable   R-squared:                       0.939
Model:                            OLS   Adj. R-squared:                  0.909
Method:                 Least Squares   F-statistic:                     31.00
Date:                Fri, 09 Dec 2022   Prob (F-statistic):           0.000476
Time:                        09:52:28   Log-Likelihood:                -10.724
No. Observations:                  10   AIC:                             29.45
Df Residuals:                       6   BIC:                             30.66
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      6.0000      1.054      5.692      0.001       3.421       8.579
group[T.B]     2.3333      0.805      2.898      0.027       0.363       4.303
group[T.C]     4.8333      1.032      4.684      0.003       2.308       7.358
covariate      0.0667      0.030      2.191      0.071      -0.008       0.141
==============================================================================
Omnibus:                        2.800   Durbin-Watson:                   2.783
Prob(Omnibus):                  0.247   Jarque-Bera (JB):                1.590
Skew:                          -0.754   Prob(JB):                        0.452
Kurtosis:                       1.759   Cond. No.                         201.

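The output of this code includes the estimated coefficients for the group and covariate terms, together with their standard errors, p-values, and 95% confidence intervals. The group[T.B] and group[T.C] coefficients are the covariate-adjusted differences of groups B and C from the reference group A: about 2.33 and 4.83 units, both significant at the 5% level (p = 0.027 and p = 0.003). The covariate slope of 0.0667 is not significant at that level (p = 0.071). These values let you compare the group means while accounting for the effect of the covariate and assess the significance of each term in the model.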

Overall, the statsmodels module gives Python users a flexible tool for ANCOVA. A single formula string specifies the model, and the fitted result exposes the coefficients, tests, and predictions needed to interpret it.
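Comparing group means "while adjusting for the covariate" can be made concrete by computing adjusted group means: the model's prediction for each group at one common covariate value. The sketch below evaluates every group at the overall mean of the covariate, which is a common convention but an assumption here. The grid DataFrame is a helper introduced only for this illustration.

```python
import pandas as pd
from statsmodels.formula.api import ols

# Same data as in the example above
df = pd.DataFrame({
    'dependent_variable': [8, 7, 9, 11, 10, 12, 14, 13, 15, 16],
    'group': ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"],
    'covariate': [20, 30, 40, 30, 40, 50, 40, 50, 60, 70],
})

model = ols('dependent_variable ~ group + covariate', data=df).fit()

# Unadjusted group means, for comparison
raw_means = df.groupby('group')['dependent_variable'].mean()

# One row per group, all evaluated at the overall covariate mean (43.0)
grid = pd.DataFrame({
    'group': ["A", "B", "C"],
    'covariate': df['covariate'].mean(),
})
grid['adjusted_mean'] = model.predict(grid)

print(raw_means)  # A: 8.0, B: 11.0, C: 14.5
print(grid)       # adjusted means: A ≈ 8.87, B = 11.20, C = 13.70
```

The adjusted means are closer together than the raw means because group C has the highest covariate values and group A the lowest. Part of the raw difference between the groups is attributed to the covariate rather than to group membership.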

Conclusion

In summary, ANCOVA (analysis of covariance) is a statistical approach for comparing the means of two or more groups while adjusting for the effects of one or more continuous variables (called covariates). It extends ANOVA (analysis of variance) by permitting covariates in the model, which makes comparisons between groups more accurate. It is widely used in fields including psychology, biology, and economics to assess the impact of covariates on group averages and to draw more precise conclusions about the relationships between variables.

Jay Singh

Updated on: 2022-12-28T10:21:50+05:30
