How to Conduct a Two Sample T-Test in Python?
The two-sample t-test is a statistical method used to compare the means of two independent groups to determine if they differ significantly. This test is commonly used in scientific research to analyze whether two groups differ on a continuous variable. In this article, we'll explore how to perform a two-sample t-test in Python using the scipy.stats module.
Understanding Two-Sample T-Test
Before implementing the test, let's understand the theory. The two-sample t-test assumes that both sample populations are normally distributed with similar variances. The null hypothesis states that the means of the two groups are equal, while the alternative hypothesis states they are not equal. The test statistic is calculated by dividing the difference between group means by the standard error of the difference.
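If the absolute value of the calculated t-value exceeds the critical value for the chosen significance level (equivalently, if the p-value falls below that level), we reject the null hypothesis and conclude that the group means are significantly different.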
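To make the formula concrete, here is a minimal sketch that computes the pooled-variance t-statistic, the two-sided critical value, and the p-value by hand, then compares them with ttest_ind(). The data and the variable names (group_a, group_b, pooled_var, and so on) are illustrative assumptions, not part of the examples later in this article.

```python
import numpy as np
from scipy import stats

# Illustrative data (assumed for this sketch only)
rng = np.random.default_rng(0)
group_a = rng.normal(loc=10, scale=2, size=30)
group_b = rng.normal(loc=11, scale=2, size=35)

n_a, n_b = len(group_a), len(group_b)
df = n_a + n_b - 2  # degrees of freedom for the equal-variance test

# Pooled variance: weighted average of the two sample variances (ddof=1)
pooled_var = ((n_a - 1) * group_a.var(ddof=1)
              + (n_b - 1) * group_b.var(ddof=1)) / df

# Standard error of the difference between the two means
std_err = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Test statistic: difference in means divided by its standard error
t_manual = (group_a.mean() - group_b.mean()) / std_err

# Two-sided critical value and p-value from the t distribution
alpha = 0.05
t_critical = stats.t.ppf(1 - alpha / 2, df)
p_manual = 2 * stats.t.sf(abs(t_manual), df)

# Reference result from SciPy (equal_var=True is the default)
t_scipy, p_scipy = stats.ttest_ind(group_a, group_b)

print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.4f}")
print(f"critical value (two-sided, alpha = {alpha}): {t_critical:.4f}")
print(f"reject null hypothesis: {abs(t_manual) > t_critical}")
```

Comparing |t| against the critical value and comparing the p-value against alpha are two views of the same decision, so both always agree.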
Step 1: Import Required Libraries
First, we need to import the necessary libraries. We'll use NumPy for mathematical operations and SciPy for statistical functions:
import numpy as np
from scipy.stats import ttest_ind
Step 2: Generate Sample Data
For demonstration, let's draw two random samples from normal distributions that share the same mean and standard deviation:
import numpy as np
from scipy.stats import ttest_ind
# Set random seed for reproducibility
np.random.seed(42)
# Generate two samples with same parameters
sample1 = np.random.normal(loc=10, scale=2, size=100)
sample2 = np.random.normal(loc=10, scale=2, size=100)
print(f"Sample 1 mean: {sample1.mean():.3f}")
print(f"Sample 2 mean: {sample2.mean():.3f}")
Running this prints output similar to the following:
Sample 1 mean: 9.973
Sample 2 mean: 10.013
Step 3: Conduct the T-Test
Now let's perform the two-sample t-test using the ttest_ind() function:
import numpy as np
from scipy.stats import ttest_ind
np.random.seed(42)
sample1 = np.random.normal(loc=10, scale=2, size=100)
sample2 = np.random.normal(loc=10, scale=2, size=100)
# Perform the t-test
t_stat, p_value = ttest_ind(sample1, sample2)
print(f"t-statistic: {t_stat:.3f}")
print(f"p-value: {p_value:.3f}")
# Interpret results
alpha = 0.05
if p_value < alpha:
    print("Reject null hypothesis: means are significantly different")
else:
    print("Fail to reject null hypothesis: no significant difference")
Running this prints output similar to the following:
t-statistic: -0.086
p-value: 0.931
Fail to reject null hypothesis: no significant difference
Welch's T-Test for Unequal Variances
When the assumption of equal variances is violated, we can use Welch's t-test by setting equal_var=False:
import numpy as np
from scipy.stats import ttest_ind
# Generate samples with different variances
np.random.seed(42)
sample1 = np.random.normal(loc=10, scale=2, size=100)
sample2 = np.random.normal(loc=12, scale=3, size=100)
# Welch's t-test (unequal variances)
t_stat, p_value = ttest_ind(sample1, sample2, equal_var=False)
print(f"Welch's t-statistic: {t_stat:.3f}")
print(f"p-value: {p_value:.3f}")
# Check for significance
alpha = 0.05
if p_value < alpha:
    print("Significant difference between groups")
else:
    print("No significant difference between groups")
Running this prints output similar to the following:
Welch's t-statistic: -5.421
p-value: 0.000
Significant difference between groups
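To see what changes when equal_var=False, the following sketch reproduces Welch's statistic by hand: each group keeps its own variance instead of a pooled one, and the degrees of freedom come from the Welch-Satterthwaite approximation. The data and variable names here are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

# Illustrative data with unequal variances and sizes (assumed for this sketch)
rng = np.random.default_rng(1)
group_a = rng.normal(loc=10, scale=2, size=40)
group_b = rng.normal(loc=12, scale=3, size=25)

n_a, n_b = len(group_a), len(group_b)

# Squared standard error of each group mean (sample variance / n)
se2_a = group_a.var(ddof=1) / n_a
se2_b = group_b.var(ddof=1) / n_b

# Welch's t-statistic: no pooling of the variances
t_welch = (group_a.mean() - group_b.mean()) / np.sqrt(se2_a + se2_b)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch = (se2_a + se2_b) ** 2 / (
    se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
)

# Two-sided p-value from the t distribution with fractional df
p_welch = 2 * stats.t.sf(abs(t_welch), df_welch)

# Reference result from SciPy
t_scipy, p_scipy = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"manual: t = {t_welch:.4f}, df = {df_welch:.2f}, p = {p_welch:.4g}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.4g}")
```

The approximate degrees of freedom are usually not an integer and never exceed n_a + n_b - 2, which is why Welch's test is slightly more conservative than the pooled test.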
Complete Example with Real-World Interpretation
Here's a complete example comparing test scores between two teaching methods:
import numpy as np
from scipy.stats import ttest_ind
# Simulate test scores for two teaching methods
np.random.seed(123)
method_a_scores = np.random.normal(loc=75, scale=10, size=50)
method_b_scores = np.random.normal(loc=80, scale=12, size=45)
print("Teaching Method Comparison:")
print(f"Method A - Mean: {method_a_scores.mean():.2f}, Std: {method_a_scores.std():.2f}")
print(f"Method B - Mean: {method_b_scores.mean():.2f}, Std: {method_b_scores.std():.2f}")
# Perform two-sample t-test
t_stat, p_value = ttest_ind(method_a_scores, method_b_scores)
print(f"\nT-test Results:")
print(f"t-statistic: {t_stat:.3f}")
print(f"p-value: {p_value:.3f}")
# Decision
alpha = 0.05
if p_value < alpha:
    print(f"\nConclusion: Significant difference (p < {alpha})")
    if t_stat < 0:
        print("Method B shows significantly higher scores")
    else:
        print("Method A shows significantly higher scores")
else:
    print(f"\nConclusion: No significant difference (p ≥ {alpha})")
Running this prints output similar to the following:
Teaching Method Comparison:
Method A - Mean: 75.34, Std: 9.49
Method B - Mean: 79.44, Std: 12.96

T-test Results:
t-statistic: -1.743
p-value: 0.085

Conclusion: No significant difference (p ≥ 0.05)
Key Assumptions and Considerations
Before conducting a two-sample t-test, ensure these assumptions are met:
- Independence: Observations in each group must be independent
- Normality: Data should be approximately normally distributed
- Equal variances: The standard t-test assumes both groups have similar variances; if this assumption is violated, use Welch's t-test
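The normality and equal-variance assumptions can be checked informally before running the test. The sketch below uses scipy.stats.shapiro (Shapiro-Wilk test for normality) and scipy.stats.levene (Levene's test for equal variances); the data, the 0.05 threshold, and the rule for picking equal_var are assumptions made for illustration, not a fixed recipe.

```python
import numpy as np
from scipy import stats

# Illustrative data (assumed for this sketch only)
rng = np.random.default_rng(2)
group_a = rng.normal(loc=75, scale=10, size=50)
group_b = rng.normal(loc=80, scale=12, size=45)

alpha = 0.05

# Shapiro-Wilk: the null hypothesis is that the sample is normally distributed
for name, data in (("A", group_a), ("B", group_b)):
    w_stat, p_norm = stats.shapiro(data)
    verdict = "no evidence against normality" if p_norm >= alpha else "normality questionable"
    print(f"Group {name}: W = {w_stat:.3f}, p = {p_norm:.3f} ({verdict})")

# Levene: the null hypothesis is that the groups have equal variances
lev_stat, p_var = stats.levene(group_a, group_b)
print(f"Levene: statistic = {lev_stat:.3f}, p = {p_var:.3f}")

# Use the pooled test only if Levene's test gives no evidence of unequal variances
equal_var = bool(p_var >= alpha)
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=equal_var)

test_name = "standard t-test" if equal_var else "Welch's t-test"
print(f"{test_name}: t = {t_stat:.3f}, p = {p_value:.3f}")
```

Independence cannot be tested this way; it depends on how the data were collected. Some practitioners skip the variance pre-test and use Welch's t-test by default, since it performs nearly as well as the standard test when variances happen to be equal.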
Conclusion
The two-sample t-test compares the means of two independent groups, and scipy.stats.ttest_ind() makes it straightforward to run in Python. Use equal_var=False for Welch's t-test when variances are unequal. Always check the assumptions and interpret p-values in the context of your significance level.
