How to Conduct a Paired Samples T-Test


Introduction

In machine learning and data science, statistical tests are used to compare variables or features and to detect differences between them. Most of these are hypothesis tests: a null and an alternative hypothesis are stated, and the test decides whether the data give enough evidence to reject the null. The t-test is one such test; it compares the means of groups, where the groups are typically defined by the categories of a categorical variable.

In this article, we will discuss the paired t-test, a variant of the t-test, and the procedure for conducting it. The article should help you understand the intuition behind the paired t-test and carry it out whenever necessary.

So let us start with the basics: the t-test.

What is a T-Test?

The t-test is a statistical test used to compare the means of groups, for example the values of a numeric variable split by the categories of a categorical variable. It checks whether the difference between the group means is statistically significant. It is sometimes also used for feature selection: a hypothesis is defined for each feature, and features are kept or dropped for model training depending on whether the hypothesis is rejected based on the t and p values.

In the t-test, we take the mean of each group and compare these means to check for a difference.

Here the t value is calculated and compared with the critical t value. If the absolute value of the calculated t value is greater than the critical t value, the null hypothesis is rejected, and we conclude that there is a difference between the means of the groups being compared.

In the normal (independent samples) t-test, the mean of each group is calculated separately, and these means are used to calculate the t value. The t value drives the hypothesis test: based on it, the null hypothesis is either rejected or not rejected.

But in some cases we want to measure the change in a variable, or we have paired observations where both values come from the same subject; in such cases, the paired samples t-test is used.

Let us discuss this in detail in the next section.

What is a Paired Samples T-Test?

The paired samples t-test is also a type of t-test used to compare the means of two groups, but here the test works on the difference within each pair of observations instead of on the individual group means.

In simple words, it is the test used for paired samples, when we want to study the change in the mean between two measurements of the same variable. It calculates the difference for each pair, takes the mean of those differences, and then calculates the t value from it.

In short, the paired t-test is used when we have two paired or related groups, usually measurements of the same subjects before and after some action, event, or intervention.

In contrast, the normal t-test is used when we have two independent groups that are not related to each other in any way.
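To make the contrast concrete, here is a minimal sketch that runs both tests on the same small set of paired measurements. The `before` and `after` scores are made-up values for eight hypothetical subjects, not data from this article; `stats.ttest_ind` and `stats.ttest_rel` are the SciPy functions for the independent and the paired test.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for the same 8 subjects, before and after an intervention
before = np.array([72, 68, 75, 80, 66, 71, 77, 69])
after = np.array([75, 70, 78, 82, 70, 72, 80, 73])

# Independent t-test: ignores the pairing and compares the two group means
t_ind, p_ind = stats.ttest_ind(after, before)

# Paired t-test: works on the per-subject differences (after - before)
t_rel, p_rel = stats.ttest_rel(after, before)

print("Independent: t =", t_ind, "p =", p_ind)
print("Paired:      t =", t_rel, "p =", p_rel)
```

In these made-up numbers every subject improves by a small amount, while the subjects differ a lot from one another. The paired test removes that between-subject spread, so it reports a much smaller p value than the independent test on the same data.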

Now let us discuss the workflow for conducting the paired t-test.

Workflow for Conducting the Paired T-Test

Let us go through the steps involved in conducting the paired t-test.

Define Hypothesis

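The first step in any hypothesis test is to define the hypotheses. Here the null and alternative hypotheses are stated; for a paired t-test, the null hypothesis usually says that the mean difference between the pairs is zero, and the alternative says that it is not. The null hypothesis is rejected or not rejected based on the t value obtained at the end of the test.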

Gather the Paired Data

Since we are conducting a paired t-test, the data must be paired: each observation in one group is matched with exactly one observation in the other. The pairs may come from the same object or the same subject measured at different points in time.

Compute the Differences

Now, for each pair of observations, we calculate the difference between the two values. Each index has one value in each group; the difference between these values is calculated for every observation.
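As a small illustration of this step, the sketch below subtracts two NumPy arrays element by element. The `before` and `after` values are hypothetical measurements for eight subjects, chosen only for the example.

```python
import numpy as np

# Hypothetical paired measurements for the same 8 subjects
before = np.array([72, 68, 75, 80, 66, 71, 77, 69])
after = np.array([75, 70, 78, 82, 70, 72, 80, 73])

# One difference per pair, matched by index
differences = after - before
print(differences)  # [3 2 3 2 4 1 3 4]
```

The direction of the subtraction only affects the sign of the t value, as long as it is the same for every pair.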

Find the Mean of the Differences

Now that we have the difference for each pair, we take the mean of these differences. The standard deviation of the differences is also calculated in this step.
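A minimal sketch of this step with NumPy follows; the `differences` array holds hypothetical per-pair differences. One detail worth noting: `std` divides by n by default, so `ddof=1` is passed here to get the sample standard deviation that the t-test uses.

```python
import numpy as np

# Hypothetical per-pair differences (for example, after - before)
differences = np.array([3, 2, 3, 2, 4, 1, 3, 4])

n = len(differences)
mean_diff = differences.mean()
# ddof=1 gives the sample standard deviation (divides by n - 1)
sd_diff = differences.std(ddof=1)

print("n:", n)
print("mean difference:", mean_diff)
print("standard deviation:", sd_diff)
```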

Find the T Value

In this step, the t value is found with the help of the following formula:

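t = (Mean Difference - Hypothesized Difference) / sqrt(S^2 / n)

Here S^2 is the sample variance of the differences and n is the number of pairs. The hypothesized difference is usually 0.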
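The formula can be checked numerically. The sketch below computes the t value by hand for hypothetical `before` and `after` data and compares it with the value returned by `scipy.stats.ttest_rel`; the hypothesized difference is assumed to be 0.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements for the same 8 subjects
before = np.array([72, 68, 75, 80, 66, 71, 77, 69])
after = np.array([75, 70, 78, 82, 70, 72, 80, 73])

differences = after - before
n = len(differences)
hypothesized_diff = 0.0  # assumed: no change under the null hypothesis

# Sample variance of the differences (divides by n - 1)
variance = differences.var(ddof=1)
t_manual = (differences.mean() - hypothesized_diff) / np.sqrt(variance / n)

# SciPy's paired t-test should give the same statistic
t_scipy, _ = stats.ttest_rel(after, before)

print("manual t:", t_manual)
print("scipy t: ", t_scipy)
```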

Find the Critical T Value

The next step is to find the critical t value. It is determined by the degrees of freedom (n - 1 for a paired test with n pairs) and the chosen significance level.
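Instead of reading the critical value from a t table, it can be looked up with SciPy. This sketch assumes a two-sided test, a significance level of 0.05, and 8 pairs; all three are illustrative choices, not values from this article.

```python
from scipy import stats

n = 8          # hypothetical number of pairs
alpha = 0.05   # assumed significance level
df = n - 1     # degrees of freedom for a paired t-test

# Two-sided test: split alpha across both tails
t_critical = stats.t.ppf(1 - alpha / 2, df)
print("critical t:", t_critical)
```

For a one-sided test, the whole of `alpha` goes into one tail, so the call would use `1 - alpha` instead.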

Interpret the Result

Now compare the calculated t value with the critical t value. If the absolute value of the calculated t value is greater than the critical t value, the null hypothesis is rejected; otherwise, we fail to reject it.
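The decision rule can be sketched as follows, again on hypothetical `before` and `after` data with an assumed significance level of 0.05. The sketch also shows the equivalent p value rule; for a two-sided test, both rules lead to the same decision.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements for the same 8 subjects
before = np.array([72, 68, 75, 80, 66, 71, 77, 69])
after = np.array([75, 70, 78, 82, 70, 72, 80, 73])
alpha = 0.05  # assumed significance level

t_value, p_value = stats.ttest_rel(after, before)
df = len(before) - 1
t_critical = stats.t.ppf(1 - alpha / 2, df)

# Rule 1: compare |t| with the critical t value
reject_by_t = abs(t_value) > t_critical
# Rule 2: compare the p value with the significance level
reject_by_p = p_value < alpha

print("reject null (t rule):", reject_by_t)
print("reject null (p rule):", reject_by_p)
```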

Example for Conducting Paired T-Test

Now let us take a code example to understand the paired t-test more clearly. Here we will use a dummy dataset with 500 observations per group, and we will conduct both the normal and the paired t-test on it.

import numpy as np
from scipy import stats
np.random.seed(42)

group_a = np.random.normal(loc=10, scale=2, size=500)
group_b = np.random.normal(loc=12, scale=2, size=500)

# Normal t-test
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Paired t-test (ttest_rel works on the per-pair differences group_b - group_a)
t_stat_paired, p_value_paired = stats.ttest_rel(group_b, group_a)

# The results
print("Normal t-test:")
print("t-statistic:", t_stat)
print("p-value:", p_value)

print("\nPaired t-test:")
print("t-statistic:", t_stat_paired)
print("p-value:", p_value_paired)

Output

Normal t-test:
t-statistic: -16.54353366592559
p-value: 1.638349016942478e-54

Paired t-test:
t-statistic: 15.951028260754956
p-value: 1.3798771823104818e-46

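The above code conducts the normal and the paired t-test on the sample data and prints the t value and the p value for each. The two t values have opposite signs because the normal test is called with (group_a, group_b) while the paired test is called with (group_b, group_a). Both p values are far below a common significance level such as 0.05, so the null hypothesis of equal means is rejected in both cases. Note that the two arrays are generated independently here, so pairing them by index is only for illustration.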

Conclusion

In this article, we discussed the t-test and the paired t-test: what these tests mean, when they are used, and how they are applied, along with the workflow and a code example. This article should help you understand the paired t-test more clearly and conduct it to compare paired groups of a variable.

Updated on: 17-Aug-2023
