How to Conduct a Two Sample T-Test in Python?


Introduction

A two-sample t-test compares the means of two independent groups to determine whether they differ significantly from one another. The test is frequently used in scientific studies to check whether two groups differ on a continuous variable. In this article, we'll look at how to use Python's scipy.stats module to perform a two-sample t-test.

Conducting a Two Sample T-Test

Let's first understand the theory underlying the two-sample t-test before moving on to the implementation. The test assumes that the two samples are independent and drawn from normally distributed populations with similar variances. The null hypothesis is that the two group means are equal, and the alternative hypothesis is that they are not. The test statistic is obtained by dividing the difference between the two sample means by the standard error of that difference. We reject the null hypothesis and conclude that the means of the two groups are significantly different if the absolute value of the computed t-value exceeds the critical value for the chosen significance level or, equivalently, if the p-value falls below that level.
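To make the formula concrete, here is a minimal sketch that computes the equal-variance (pooled) t-statistic by hand and compares it with scipy.stats.ttest_ind. The two arrays are hypothetical values chosen only for illustration, and the 5% significance level is an assumption.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for two independent groups (illustrative values only)
group_a = np.array([12.1, 9.8, 11.4, 10.3, 10.9, 11.7])
group_b = np.array([9.2, 10.1, 8.7, 9.9, 10.4, 9.0])

n_a, n_b = len(group_a), len(group_b)
df = n_a + n_b - 2  # degrees of freedom for the pooled test

# Pooled variance: weighted average of the two sample variances (ddof=1)
pooled_var = ((n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1)) / df

# Standard error of the difference between the two sample means
se_diff = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# t-statistic: difference in means divided by its standard error
t_manual = (group_a.mean() - group_b.mean()) / se_diff

# Two-sided critical value and p-value (alpha = 0.05 is an assumed level)
alpha = 0.05
t_critical = stats.t.ppf(1 - alpha / 2, df)
p_manual = 2 * stats.t.sf(abs(t_manual), df)

# Cross-check against SciPy's implementation
t_scipy, p_scipy = stats.ttest_ind(group_a, group_b)
print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
print("reject H0:", abs(t_manual) > t_critical)
```

Comparing the absolute t-value with the critical value and comparing the p-value with alpha lead to the same decision, so either form of the rule can be used.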

Now let's see how to conduct the two-sample t-test in Python. We need the scipy.stats module, which provides a function called ttest_ind. It takes two arrays representing the two samples and returns the t-value and the p-value.

Step 1: Import Required Libraries

The first step is to import the required libraries. To perform the two-sample t-test in Python, we need NumPy and SciPy. NumPy is used here to generate the sample data, while SciPy carries out the statistical test.

import numpy as np
from scipy.stats import ttest_ind

Step 2: Generating Variables

Next, let's create two random samples drawn from distributions with the same mean and standard deviation −

np.random.seed(42)
sample1 = np.random.normal(loc=10, scale=2, size=100)
sample2 = np.random.normal(loc=10, scale=2, size=100)

Here, we used the np.random.normal function to generate two samples of size 100 each, with a mean of 10 and a standard deviation of 2. We set the random seed to 42 to ensure that the results are reproducible.

Now, let's conduct the t-test −

t_stat, p_value = ttest_ind(sample1, sample2)

Step 3: Interpret the Results

The ttest_ind function returns two values: the t-value and the p-value. The t-value measures the difference between the two sample means relative to the standard error of that difference, while the p-value is the probability of observing a difference at least this extreme if the null hypothesis were true.

Finally, let's print the results −

print("t-value: ", t_stat)
print("p-value: ", p_value)

This prints the t-value and the p-value (rounded here to three decimal places) −

t-value: 0.086
p-value: 0.931

Because the t-value is so close to zero, the two sample means are very similar. The p-value is far above the usual 0.05 threshold, so we fail to reject the null hypothesis: the difference between the two means is not statistically significant.
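The interpretation above can be turned into an explicit decision rule. The sketch below wraps ttest_ind in a small helper; the function name compare_means, the 0.05 threshold, and the generated data are assumptions made for illustration and are not part of SciPy.

```python
import numpy as np
from scipy.stats import ttest_ind

def compare_means(a, b, alpha=0.05):
    """Hypothetical helper: run a two-sample t-test and report the decision at level alpha."""
    t_stat, p_value = ttest_ind(a, b)
    reject = bool(p_value < alpha)  # True means the means differ significantly
    return t_stat, p_value, reject

rng = np.random.default_rng(0)

# Two samples from the same distribution: a rejection here would be a false positive
same = compare_means(rng.normal(10, 2, 100), rng.normal(10, 2, 100))

# Two samples whose true means differ by a full standard deviation
shifted = compare_means(rng.normal(10, 2, 100), rng.normal(12, 2, 100))

for label, (t_stat, p_value, reject) in [("same", same), ("shifted", shifted)]:
    decision = "reject H0" if reject else "fail to reject H0"
    print(f"{label}: t = {t_stat:.3f}, p = {p_value:.3f} -> {decision}")
```

Failing to reject the null hypothesis does not prove that the means are equal. It only means that the data do not provide enough evidence of a difference at the chosen level.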

It's important to remember that the standard t-test assumes that the variances of the two groups are equal. If this assumption is violated, Welch's t-test, a variant that does not assume equal variances, can be used instead. In scipy.stats, passing equal_var=False to ttest_ind runs Welch's t-test on raw data. The module also provides ttest_ind_from_stats, which accepts the same equal_var argument but works from summary statistics: its inputs are the means, standard deviations, and sample sizes of the two groups.
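One common, if imperfect, way to check these assumptions before choosing a test is to run the Shapiro-Wilk test for normality and Levene's test for equal variances, both available in scipy.stats. The sketch below uses generated data with deliberately different spreads; the 0.05 cut-off is an assumption, and plots such as histograms or Q-Q plots are a useful complement to these tests.

```python
import numpy as np
from scipy.stats import shapiro, levene

rng = np.random.default_rng(1)

# Illustrative data: same mean, but the second group is far more spread out
group_a = rng.normal(loc=10, scale=2, size=100)
group_b = rng.normal(loc=10, scale=6, size=100)

# Shapiro-Wilk: a small p-value suggests the sample is not normally distributed
_, p_norm_a = shapiro(group_a)
_, p_norm_b = shapiro(group_b)

# Levene: a small p-value suggests the two variances are not equal
_, p_var = levene(group_a, group_b)

# Assumed cut-off of 0.05 for treating the variances as plausibly equal
equal_var_plausible = bool(p_var >= 0.05)

print("normality p-values:", p_norm_a, p_norm_b)
print("Levene p-value:", p_var)
print("equal variances plausible:", equal_var_plausible)
```

When Levene's test points to unequal variances, as it should for this data, Welch's t-test is the safer choice.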

Let's see how to conduct Welch's t-test in Python from summary statistics −

from scipy.stats import ttest_ind_from_stats

# Mean, standard deviation, and sample size of each group
mean1, std1, size1 = 10, 2, 100
mean2, std2, size2 = 10, 3, 100
t_stat, p_value = ttest_ind_from_stats(mean1, std1, size1, mean2, std2, size2, equal_var=False)
print("t-value: ", t_stat)
print("p-value: ", p_value)

This will output the t-value and the p-value −

t-value: 0.0
p-value: 1.0

Because both groups have a mean of exactly 10, the difference in means is zero, so the t-value is 0 and the p-value is 1. There is no evidence at all of a difference between the two means. The unequal standard deviations (2 versus 3) affect only the standard error and the degrees of freedom, which is why they do not change the result here.
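When the raw observations are available, Welch's t-test can also be run directly with ttest_ind by passing equal_var=False. The sketch below uses generated data as an illustration and checks that the result matches ttest_ind_from_stats when the summary statistics are computed from the same samples. Note that the standard deviations must be the sample values (ddof=1).

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_ind_from_stats

rng = np.random.default_rng(42)

# Illustrative samples with equal means but different standard deviations
sample_a = rng.normal(loc=10, scale=2, size=100)
sample_b = rng.normal(loc=10, scale=3, size=100)

# Welch's t-test on the raw data
t_raw, p_raw = ttest_ind(sample_a, sample_b, equal_var=False)

# The same test from summary statistics (sample standard deviation, ddof=1)
t_sum, p_sum = ttest_ind_from_stats(
    sample_a.mean(), sample_a.std(ddof=1), len(sample_a),
    sample_b.mean(), sample_b.std(ddof=1), len(sample_b),
    equal_var=False,
)

print("raw data:     ", t_raw, p_raw)
print("summary stats:", t_sum, p_sum)
```

The two calls agree up to floating-point precision, so the choice between them depends only on whether you have the raw data or just the summary statistics.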

Conclusion

In conclusion, the two-sample t-test is an effective statistical tool that lets us compare the means of two groups and decide whether they are significantly different from one another. Python has a number of libraries and functions for performing the t-test, including the scipy.stats module used in this article. The t-test makes several assumptions, including normality and equal variances, which should be checked before the test is run; when the variances differ, Welch's t-test is the appropriate alternative. The specific research question and the study's limitations should also be considered when interpreting the results.

Updated on: 13-Jul-2023
