How to Perform an F-Test in Python
Statisticians use the F-test to check whether two populations have the same variance, based on samples drawn from each. The test is named after Sir Ronald Fisher. To use the F-test, we state two hypotheses, a null hypothesis and an alternate hypothesis, and then choose between them based on the test result.
Variance measures how far the values in a dataset spread around their mean. A larger variance indicates more dispersion than a smaller one.
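As a quick illustration of the definition above, the following sketch computes a sample variance by hand and checks it against NumPy. The data values are hypothetical and chosen only for easy arithmetic.

```python
import numpy as np

# Hypothetical sample, used only for illustration
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = data.mean()

# Sample variance: sum of squared deviations from the mean, divided by n - 1
sample_var = ((data - mean) ** 2).sum() / (len(data) - 1)

# np.var defaults to the population variance (ddof=0);
# ddof=1 gives the sample variance used in the F-test
numpy_var = np.var(data, ddof=1)

print(f"mean = {mean:.3f}")
print(f"sample variance (by hand) = {sample_var:.3f}")
print(f"sample variance (np.var)  = {numpy_var:.3f}")
```

Note the ddof=1 argument: without it, np.var divides by n rather than n - 1.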
In this article, you will learn how to perform an F-test in the Python programming language.
F-Test Process
The process to perform the F-Test is as follows:
1. Define the null and alternate hypotheses.
   Null Hypothesis or H₀: σ₁² = σ₂² (the variances of the populations are equal)
   Alternate Hypothesis or H₁: σ₁² ≠ σ₂² (the variances of the populations are unequal)
2. Choose the test statistic: F = s₁² / s₂², the ratio of the two sample variances.
3. Calculate the degrees of freedom. If m and n are the sample sizes, the degrees of freedom are df1 = m − 1 and df2 = n − 1 respectively.
4. Find the critical F value in the F-table for the chosen significance level alpha.
5. For a two-tailed test, divide alpha by 2 before looking up the critical value.
Thus, the critical F value is determined by the degrees of freedom of the two samples: read df1 along the first row of the table and df2 down the first column.
Separate F-tables exist for different significance levels. We compare the F statistic from step 2 with the critical value from step 4. If the F statistic exceeds the critical value, we reject the null hypothesis. Otherwise, we fail to reject the null hypothesis at the chosen significance level.
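The critical-value procedure can be sketched in code. This is a minimal illustration, not a definitive implementation: the two samples are hypothetical, and the F-table lookup is replaced by scipy.stats.f.ppf, which returns the quantile of the F distribution.

```python
import numpy as np
from scipy import stats

# Hypothetical samples, used only to illustrate the steps
sample_a = np.array([12.1, 14.3, 9.8, 15.2, 11.7, 13.9])
sample_b = np.array([12.5, 12.9, 12.2, 13.1, 12.7, 12.4])

alpha = 0.05

# Test statistic: ratio of sample variances (sample_a has the larger variance)
f_stat = np.var(sample_a, ddof=1) / np.var(sample_b, ddof=1)

# Degrees of freedom for numerator and denominator
df1 = len(sample_a) - 1
df2 = len(sample_b) - 1

# Two-tailed test: look up the upper critical value at alpha / 2
f_crit = stats.f.ppf(1 - alpha / 2, df1, df2)

reject_h0 = f_stat > f_crit
print(f"F statistic: {f_stat:.3f}")
print(f"Critical value: {f_crit:.3f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Using ppf avoids interpolating in a printed table when the exact degrees of freedom are not listed.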
Assumptions
We make some assumptions about the dataset before performing the F-Test:
The data populations follow the normal distribution (i.e., they fit the bell curve).
Samples are independent of each other (i.e., no correlation between samples).
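One way to check the normality assumption before running the F-test is the Shapiro-Wilk test, available as scipy.stats.shapiro. The article does not prescribe a particular check, so treat this as an illustrative sketch; the simulated sample and the 0.05 threshold are assumptions.

```python
import numpy as np
from scipy import stats

# Simulated sample (an assumption for illustration): 40 draws from a normal distribution
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=10.0, scale=2.0, size=40)

# Shapiro-Wilk test: the null hypothesis is that the data are normally distributed
statistic, p_value = stats.shapiro(sample)

# A small p-value is evidence against normality
looks_normal = p_value > 0.05
print(f"Shapiro-Wilk statistic: {statistic:.4f}")
print(f"P-value: {p_value:.4f}")
print("No evidence against normality" if looks_normal else "Normality is doubtful")
```

The F-test is sensitive to departures from normality, so a check of this kind is worth running on each sample.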
Apart from these assumptions, we should also consider the following key points while performing the F-Test:
Place the larger variance in the numerator so that F is at least 1 and the test is right-tailed.
For a two-tailed test, divide alpha by 2 before determining the critical value.
Check whether you are given variances or standard deviations; standard deviations must be squared before forming the ratio.
If your exact degrees of freedom are not listed in the F-table, use the larger of the available critical values, which is the more conservative choice.
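The key points above can be folded into a small helper. The function name two_tailed_f_test is hypothetical, and this is a sketch under the stated assumptions rather than a library routine: it places the larger sample variance in the numerator and reports a two-tailed p-value as twice the smaller tail probability.

```python
import numpy as np
from scipy import stats

def two_tailed_f_test(a, b):
    """Hypothetical helper: F-test with the larger sample variance in the numerator."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    var_a = np.var(a, ddof=1)
    var_b = np.var(b, ddof=1)

    # Put the larger variance on top so that F >= 1
    if var_a >= var_b:
        f_stat, dfn, dfd = var_a / var_b, len(a) - 1, len(b) - 1
    else:
        f_stat, dfn, dfd = var_b / var_a, len(b) - 1, len(a) - 1

    # Two-tailed p-value: twice the smaller tail probability, capped at 1
    tail = min(stats.f.cdf(f_stat, dfn, dfd), stats.f.sf(f_stat, dfn, dfd))
    p_value = min(1.0, 2 * tail)
    return f_stat, p_value

narrow = [0.2, 0.23, 0.26, 0.21, 0.23]
wide = [0.28, 0.2, 0.26, 0.28, 0.5]

# The argument order does not matter because the helper reorders the variances
f_1, p_1 = two_tailed_f_test(narrow, wide)
f_2, p_2 = two_tailed_f_test(wide, narrow)
print(f"F = {f_1:.5f}, two-tailed p = {p_1:.6f}")
```

If only standard deviations are available, square them first and form the ratio of the resulting variances directly.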
F-Test in Python
Syntax
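scipy.stats.f(dfn, dfd, loc=0, scale=1)
Parameters
scipy.stats.f is the F continuous distribution object. Its methods (such as pdf, cdf, ppf, rvs and stats) accept the following parameters:
x: quantiles
q: lower or upper tail probability
dfn, dfd: shape parameters (numerator and denominator degrees of freedom)
loc: location parameter (default=0)
scale: scale parameter (default=1)
size: shape of the random variates returned by rvs
moments: letters from 'mvsk' specifying which moments to compute (mean, variance, skew, kurtosis)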
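The following sketch shows how these parameters map onto the methods of scipy.stats.f. The degrees of freedom (4 and 10) and the evaluation points are arbitrary values picked for illustration.

```python
from scipy import stats

dfn, dfd = 4, 10  # arbitrary degrees of freedom for illustration

# x: quantile at which the density or a tail probability is evaluated
density = stats.f.pdf(1.0, dfn, dfd)
lower_tail = stats.f.cdf(2.5, dfn, dfd)   # P(F <= 2.5)
upper_tail = stats.f.sf(2.5, dfn, dfd)    # P(F > 2.5)

# q: probability passed to the quantile function (inverse of cdf)
critical = stats.f.ppf(0.95, dfn, dfd)

# size: shape of the random sample returned by rvs
draws = stats.f.rvs(dfn, dfd, size=5, random_state=0)

# moments: 'm' for mean, 'v' for variance
mean, var = stats.f.stats(dfn, dfd, moments="mv")

print(f"pdf(1.0) = {density:.4f}")
print(f"cdf(2.5) = {lower_tail:.4f}, sf(2.5) = {upper_tail:.4f}")
print(f"95% critical value = {critical:.4f}")
print(f"mean = {float(mean):.4f}, variance = {float(var):.4f}")
```

The sf method gives the upper tail directly and is numerically safer than 1 - cdf when the p-value is very small.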
Example
Let's perform an F-test to compare the variances of two groups:
import numpy as np
import scipy.stats

# Create sample data as NumPy arrays
group1 = np.array([0.28, 0.2, 0.26, 0.28, 0.5])
group2 = np.array([0.2, 0.23, 0.26, 0.21, 0.23])

# Calculate the sample variance of each group (ddof=1)
var1 = np.var(group1, ddof=1)
var2 = np.var(group2, ddof=1)
print(f"Variance of group1: {var1:.5f}")
print(f"Variance of group2: {var2:.5f}")

def f_test(group1, group2):
    # F-statistic: ratio of sample variances (group1 has the larger variance here)
    f_stat = np.var(group1, ddof=1) / np.var(group2, ddof=1)
    # Degrees of freedom
    df1 = len(group1) - 1
    df2 = len(group2) - 1
    # Right-tailed p-value: P(F >= f_stat) under the null hypothesis
    p_value = 1 - scipy.stats.f.cdf(f_stat, df1, df2)
    return f_stat, p_value

# Perform F-test
f_statistic, p_val = f_test(group1, group2)
print(f"F-statistic: {f_statistic:.5f}")
print(f"P-value: {p_val:.6f}")

# Decision based on significance level (alpha = 0.05)
alpha = 0.05
if p_val < alpha:
    print("Reject null hypothesis: Variances are significantly different")
else:
    print("Fail to reject null hypothesis: Variances are not significantly different")
Output
Variance of group1: 0.01308
Variance of group2: 0.00053
F-statistic: 24.67925
P-value: 0.004431
Reject null hypothesis: Variances are significantly different
Using scipy.stats.f_oneway for Multiple Groups
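The F distribution is also used in one-way ANOVA, which scipy.stats provides as f_oneway. Note that f_oneway tests whether two or more groups have the same mean, not the same variance: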
from scipy.stats import f_oneway

# Create three sample groups
group1 = [23, 25, 28, 30, 32]
group2 = [20, 22, 24, 26, 28]
group3 = [18, 20, 22, 24, 26]

# Perform one-way F-test (ANOVA)
f_stat, p_value = f_oneway(group1, group2, group3)
print(f"F-statistic: {f_stat:.4f}")
print(f"P-value: {p_value:.6f}")

if p_value < 0.05:
    print("Groups have significantly different means")
else:
    print("Groups do not have significantly different means")
Output
F-statistic: 3.6276
P-value: 0.058586
Groups do not have significantly different means
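To see what f_oneway computes, the sketch below rebuilds the one-way ANOVA F statistic by hand as the ratio of the between-group mean square to the within-group mean square, using the same three groups. The variable names are illustrative.

```python
import numpy as np
from scipy.stats import f, f_oneway

groups = [
    np.array([23, 25, 28, 30, 32], dtype=float),
    np.array([20, 22, 24, 26, 28], dtype=float),
    np.array([18, 20, 22, 24, 26], dtype=float),
]

k = len(groups)                        # number of groups
n_total = sum(len(g) for g in groups)  # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: spread of the group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)

f_manual = ms_between / ms_within
p_manual = f.sf(f_manual, k - 1, n_total - k)

# Compare with SciPy's implementation
f_scipy, p_scipy = f_oneway(*groups)
print(f"manual: F = {f_manual:.4f}, p = {p_manual:.6f}")
print(f"scipy:  F = {f_scipy:.4f}, p = {p_scipy:.6f}")
```

Both routes use the F distribution, but the ratio here compares variation between group means with variation inside groups, which is why the conclusion concerns means.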
Conclusion
The F-test is a powerful statistical tool for comparing variances between populations. Use scipy.stats.f.cdf() to calculate p-values and make decisions about variance equality based on your chosen significance level.
