StatsCalculators.com

Two Sample Paired t-Test

This Paired T-Test Calculator helps you compare two related groups or repeated measurements to determine whether there is a statistically significant difference between them. For example, you could compare before-and-after measurements in a weight loss study, or test scores before and after an intervention. The calculator computes descriptive statistics, performs the hypothesis test, and automatically checks the normality assumption. It also generates publication-ready reports in APA format.

Paired T-Test

Definition

The paired t-test is a statistical test used to compare the means of two related (dependent) samples and determine whether they differ significantly. It is particularly useful when the same subjects are measured before and after a treatment, or when subjects are matched in pairs. Because it works on the within-pair differences, it is equivalent to a one-sample t-test of whether the mean difference is zero.

Formula

Test Statistic:

$$t = \frac{\bar{d}}{s_d/\sqrt{n}}$$

Degrees of freedom:

$$df = n - 1$$

Confidence Intervals:

Two-sided confidence interval:

$$CI = \bar{d} \pm t_{\alpha/2, n-1} \cdot \frac{s_d}{\sqrt{n}}$$

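One-sided confidence bounds (these are half-open intervals, not $\pm$ intervals):

Upper bound (left-tailed test, $H_1: \mu_d < 0$):

$$CI = \left(-\infty,\ \bar{d} + t_{\alpha, n-1} \cdot \frac{s_d}{\sqrt{n}}\right]$$

Lower bound (right-tailed test, $H_1: \mu_d > 0$):

$$CI = \left[\bar{d} - t_{\alpha, n-1} \cdot \frac{s_d}{\sqrt{n}},\ \infty\right)$$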

Where:

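  • $\bar{d}$ = mean difference between paired observations, $\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i$, where $d_i$ is the difference for pair $i$
  • $s_d$ = standard deviation of the differences, $s_d = \sqrt{\frac{\sum_{i=1}^{n}(d_i - \bar{d})^2}{n-1}}$
  • $n$ = number of pairs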
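The formulas above translate directly into a few lines of NumPy and SciPy. The sketch below is a minimal illustration on hypothetical simulated data (assuming NumPy and SciPy are installed): it computes $t$, $df$, the two-sided interval and both one-sided bounds by hand, then checks $t$ against `scipy.stats.ttest_rel` and against a one-sample t-test of the differences against 0 (`scipy.stats.ttest_1samp`).

```python
import numpy as np
from scipy import stats

# Hypothetical paired data (illustration only)
rng = np.random.default_rng(0)
before = rng.normal(50, 10, size=12)
after = before + rng.normal(-2, 3, size=12)

d = after - before                      # differences d_i (after - before)
n = d.size
d_bar = d.mean()                        # mean difference
s_d = d.std(ddof=1)                     # sample SD of the differences
se = s_d / np.sqrt(n)                   # standard error of d_bar

t_manual = d_bar / se                   # t = d_bar / (s_d / sqrt(n))
df = n - 1

alpha = 0.05
t_two = stats.t.ppf(1 - alpha / 2, df)  # t_{alpha/2, n-1}
ci_two_sided = (d_bar - t_two * se, d_bar + t_two * se)

t_one = stats.t.ppf(1 - alpha, df)      # t_{alpha, n-1}
upper_bound = d_bar + t_one * se        # for H1: mu_d < 0, CI = (-inf, upper_bound]
lower_bound = d_bar - t_one * se        # for H1: mu_d > 0, CI = [lower_bound, inf)

# The paired test is a one-sample t-test on the differences
t_rel = stats.ttest_rel(after, before).statistic
t_1samp = stats.ttest_1samp(d, 0.0).statistic

print(f"manual t = {t_manual:.4f}, ttest_rel t = {t_rel:.4f}, ttest_1samp t = {t_1samp:.4f}")
print(f"df = {df}, 95% CI = ({ci_two_sided[0]:.3f}, {ci_two_sided[1]:.3f})")
```

All three t-statistics agree, which is a quick way to confirm that the hand formulas match the library routine.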

Key Assumptions

Paired Observations: Each data point in one sample has a matched pair in the other sample
Normality: The differences between pairs should be approximately normally distributed
Independence: The pairs should be independent of each other
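Because the normality assumption concerns the differences rather than each sample separately, one common check is a Shapiro-Wilk test on `after - before`. The sketch below uses `scipy.stats.shapiro` on hypothetical data; it is not necessarily the exact check this calculator runs internally, and with small samples a histogram or Q-Q plot of the differences is a useful complement.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements (illustration only)
rng = np.random.default_rng(1)
before = rng.normal(80, 8, size=20)
after = before + rng.normal(-3, 2, size=20)

# The normality assumption applies to the differences, not to each sample
differences = after - before
w_stat, p_value = stats.shapiro(differences)

alpha = 0.05
if p_value < alpha:
    print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_value:.3f}: normality questionable; "
          "consider the Wilcoxon signed-rank test")
else:
    print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_value:.3f}: no evidence against normality")
```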

Practical Example

Testing the effectiveness of a weight loss program by measuring participants' weights before and after the program:

Given Data:

  • Before weights (kg): 70, 75, 80, 85, 90
  • After weights (kg): 68, 72, 77, 82, 87
  • Differences (After - Before): -2, -3, -3, -3, -3
  • $\alpha = 0.05$ (two-tailed test)

Hypotheses:

Null Hypothesis ($H_0$): $\mu_d = 0$ (no difference between before and after)

Alternative Hypothesis ($H_1$): $\mu_d \neq 0$ (there is a difference)

Step-by-Step Calculation:

  1. Calculate mean difference: $\bar{d} = \frac{(-2) + (-3) + (-3) + (-3) + (-3)}{5} = -2.8$
  2. Calculate standard deviation of differences: $s_d = \sqrt{\frac{0.8^2 + 4 \cdot (-0.2)^2}{5 - 1}} = \sqrt{0.2} = 0.447$
  3. Degrees of freedom: $df = 5 - 1 = 4$
  4. Calculate t-statistic: $t = \frac{-2.8}{0.447/\sqrt{5}} = \frac{-2.8}{0.2} = -14.0$
  5. Critical value: $\pm t_{0.025,4} = \pm 2.776$
  6. Confidence interval: $\bar{d} \pm t_{\alpha/2,df} \cdot \frac{s_d}{\sqrt{n}} = -2.8 \pm 2.776 \cdot \frac{0.447}{\sqrt{5}} = -2.8 \pm 0.555 = [-3.36, -2.24]$

Conclusion:

Since $|-14.0| > 2.776$, we reject the null hypothesis. There is sufficient evidence to conclude that the weight loss program resulted in a significant change in participants' weights ($p < 0.001$). We are 95% confident that the true mean difference lies between -3.36 and -2.24 kg.
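The hand calculation can be cross-checked with SciPy. This is a minimal sketch, assuming NumPy and SciPy are available; `scipy.stats.ttest_rel` performs the paired test (two-sided by default), and the interval is built from the same formula used in step 6.

```python
import numpy as np
from scipy import stats

before = np.array([70, 75, 80, 85, 90], dtype=float)
after = np.array([68, 72, 77, 82, 87], dtype=float)
d = after - before                          # -2, -3, -3, -3, -3

n = d.size
d_bar = d.mean()                            # -2.8
s_d = d.std(ddof=1)                         # about 0.447
se = s_d / np.sqrt(n)                       # about 0.2

result = stats.ttest_rel(after, before)     # two-sided by default
t_crit = stats.t.ppf(0.975, df=n - 1)       # about 2.776
ci = (d_bar - t_crit * se, d_bar + t_crit * se)

print(f"t({n - 1}) = {result.statistic:.2f}, p = {result.pvalue:.5f}")
print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")
```

Running it reproduces $t = -14.0$ and the interval [-3.36, -2.24]; the exact two-sided p-value is about 0.00015.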

Effect Size

Cohen's d for paired samples:

$$d = \frac{|\bar{d}|}{s_d}$$

Interpretation guidelines:

  • Small effect: $|d| \approx 0.2$
  • Medium effect: $|d| \approx 0.5$
  • Large effect: $|d| \approx 0.8$
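Applied to the weight-loss example, the formula gives a very large effect, because the differences are extremely consistent ($s_d$ is small). The helper names `cohens_d_paired` and `label_effect` below are hypothetical, and the cut-offs are Cohen's rules of thumb rather than strict boundaries; labelling $|d| < 0.2$ as "negligible" is an added assumption.

```python
import numpy as np

def cohens_d_paired(before, after):
    """Cohen's d for paired data: |mean difference| / SD of the differences."""
    d = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    return abs(d.mean()) / d.std(ddof=1)

def label_effect(d):
    """Map |d| to a verbal label using the conventional 0.2 / 0.5 / 0.8 cut-offs.
    Treating values below 0.2 as 'negligible' is an assumed convention."""
    d = abs(d)
    if d < 0.2:
        return "negligible"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    return "large"

# Worked example from above (weights in kg)
before = [70, 75, 80, 85, 90]
after = [68, 72, 77, 82, 87]
d = cohens_d_paired(before, after)
print(f"d = {d:.2f} ({label_effect(d)})")
```

For paired data this $d$ also equals $|t|/\sqrt{n}$, so here $14.0/\sqrt{5} \approx 6.26$, which is a handy consistency check.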

Power Analysis

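Approximate number of pairs ($n$) required for a desired power $(1-\beta)$ in a two-sided test. Because this formula uses normal rather than t quantiles, it slightly underestimates the sample size a paired t-test needs; round $n$ up and treat it as a lower bound: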

$$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\sigma_d^2}{\Delta^2}$$

Where:

  • $\alpha$ = significance level
  • $\beta$ = probability of Type II error
  • $z_q$ = the $q$-quantile of the standard normal distribution (e.g. $z_{0.975} \approx 1.96$)
  • $\sigma_d$ = standard deviation of differences
  • $\Delta$ = minimum detectable difference
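The formula is easy to evaluate with `scipy.stats.norm.ppf`. Because it relies on normal quantiles, the sketch below also searches for the smallest $n$ whose exact t-test power, computed from the noncentral t distribution (`scipy.stats.nct`), reaches the target; comparing the two shows how much the approximation undershoots. The planning values $\sigma_d = 5$ and $\Delta = 3$ and all function names are hypothetical.

```python
import math
from scipy import stats

def n_pairs_normal_approx(sigma_d, delta, alpha=0.05, power=0.80):
    """Sample-size formula above: ((z_{1-alpha/2} + z_{1-beta}) * sigma_d / delta)^2, rounded up."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return math.ceil(((z_a + z_b) * sigma_d / delta) ** 2)

def power_paired_t(n, sigma_d, delta, alpha=0.05):
    """Exact two-sided power of the paired t-test via the noncentral t distribution."""
    df = n - 1
    nc = (delta / sigma_d) * math.sqrt(n)   # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)

def n_pairs_exact(sigma_d, delta, alpha=0.05, power=0.80):
    """Smallest n whose exact t-test power reaches the target."""
    n = 2
    while power_paired_t(n, sigma_d, delta, alpha) < power:
        n += 1
    return n

# Hypothetical planning values: SD of differences 5, smallest difference of interest 3
approx_n = n_pairs_normal_approx(sigma_d=5, delta=3)
exact_n = n_pairs_exact(sigma_d=5, delta=3)
print(f"normal approximation: {approx_n} pairs; t-based: {exact_n} pairs")
```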

Decision Rules

Reject $H_0$ if:

  • Two-sided test: $|t| > t_{\alpha/2,n-1}$
  • Left-tailed test: $t < -t_{\alpha,n-1}$
  • Right-tailed test: $t > t_{\alpha,n-1}$
  • Or if $p\text{-value} < \alpha$
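The three rejection rules, and their p-value equivalents, can be written as one small function. This is a sketch with a hypothetical name, `paired_t_decision`; the `"less"` / `"greater"` labels for the left- and right-tailed alternatives follow SciPy's naming convention. It is applied here to the worked example ($t = -14.0$, $df = 4$).

```python
from scipy import stats

def paired_t_decision(t, df, alpha=0.05, alternative="two-sided"):
    """Apply the rejection rules; returns (reject, critical value, p-value)."""
    if alternative == "two-sided":
        crit = stats.t.ppf(1 - alpha / 2, df)        # t_{alpha/2, n-1}
        p = 2 * stats.t.sf(abs(t), df)
        reject = abs(t) > crit
    elif alternative == "less":                       # left-tailed
        crit = -stats.t.ppf(1 - alpha, df)           # -t_{alpha, n-1}
        p = stats.t.cdf(t, df)
        reject = t < crit
    elif alternative == "greater":                    # right-tailed
        crit = stats.t.ppf(1 - alpha, df)            # t_{alpha, n-1}
        p = stats.t.sf(t, df)
        reject = t > crit
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    # The critical-value rule and the p-value rule lead to the same decision
    return reject, crit, p

for alt in ("two-sided", "less", "greater"):
    reject, crit, p = paired_t_decision(-14.0, df=4, alternative=alt)
    print(f"{alt:>9}: critical = {crit:+.3f}, p = {p:.5f}, reject H0: {reject}")
```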

Reporting Results

Standard format for scientific reporting:

"A paired-samples t-test was conducted to compare [variable] before and after [treatment]. Results indicated that [treatment] produced a [significant/non-significant] difference in scores from [before] (M = [mean1], SD = [sd1]) to [after] (M = [mean2], SD = [sd2]), t([df]) = [t-value], p = [p-value], d = [Cohen's d]. The mean difference was [diff] (95% CI: [lower] to [upper])."
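Filling in this template can be automated. The sketch below uses hypothetical function names (`apa_p`, `paired_t_report`) and computes every placeholder from the raw pairs, using "before" and "after" as the condition labels; rounding and p-value formatting follow common APA practice but may need adjusting to your style guide.

```python
import numpy as np
from scipy import stats

def apa_p(p):
    """Format a p-value APA-style: no leading zero, '< .001' for very small values."""
    return "p < .001" if p < 0.001 else f"p = {p:.3f}".replace("0.", ".", 1)

def paired_t_report(before, after, variable, treatment, alpha=0.05):
    """Fill the reporting template from raw paired data (after - before)."""
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    d = after - before
    n = d.size
    res = stats.ttest_rel(after, before)
    se = d.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(1 - alpha / 2, n - 1)
    lo, hi = d.mean() - t_crit * se, d.mean() + t_crit * se
    cohen_d = abs(d.mean()) / d.std(ddof=1)
    verdict = "significant" if res.pvalue < alpha else "non-significant"
    return (
        f"A paired-samples t-test was conducted to compare {variable} before and after {treatment}. "
        f"Results indicated that {treatment} produced a {verdict} difference in scores from before "
        f"(M = {before.mean():.2f}, SD = {before.std(ddof=1):.2f}) to after "
        f"(M = {after.mean():.2f}, SD = {after.std(ddof=1):.2f}), "
        f"t({n - 1}) = {res.statistic:.2f}, {apa_p(res.pvalue)}, d = {cohen_d:.2f}. "
        f"The mean difference was {d.mean():.2f} (95% CI: {lo:.2f} to {hi:.2f})."
    )

# Worked example: weights (kg) before and after the program
report = paired_t_report([70, 75, 80, 85, 90], [68, 72, 77, 82, 87],
                         variable="body weight (kg)", treatment="the program")
print(report)
```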

Code Examples

R
library(tidyverse)  # tibble, dplyr and ggplot2 (car and effsize were loaded but never used)

set.seed(42)
n <- 30
baseline <- rnorm(n, mean = 100, sd = 15)
followup <- baseline + rnorm(n, mean = -5, sd = 5)  # Average decrease of 5 units

# Create data frame
data <- tibble(
  subject = 1:n,
  baseline = baseline,
  followup = followup,
  difference = followup - baseline
)

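# Basic summary
summary_stats <- data %>%
  summarise(
    mean_diff = mean(difference),
    sd_diff = sd(difference),
    n = n()
  )
print(summary_stats)

# Paired t-test (follow-up minus baseline)
t_test_result <- t.test(data$followup, data$baseline, paired = TRUE)
print(t_test_result)

# Effect size: Cohen's d for paired data (mean difference / SD of differences)
cohens_d <- mean(data$difference) / sd(data$difference)
print(cohens_d)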

# Visualization
ggplot(data) +
  geom_point(aes(x = baseline, y = followup)) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_minimal() +
  labs(title = "Baseline vs Follow-up Measurements",
       subtitle = paste("Mean difference:", round(mean(data$difference), 2)))

Python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.stats.power import TTestPower

# Generate example data
np.random.seed(42)
n = 30
baseline = np.random.normal(100, 15, n)
followup = baseline + np.random.normal(-5, 5, n)
differences = followup - baseline

# Basic statistics
mean_diff = np.mean(differences)
sd_diff = np.std(differences, ddof=1)
se_diff = sd_diff / np.sqrt(n)

# Paired t-test
t_stat, p_value = stats.ttest_rel(followup, baseline)

# Effect size
cohens_d = mean_diff / sd_diff

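# Post-hoc power at the observed effect size (illustration only: observed
# power is a direct function of the p-value, so plan power before collecting data)
analysis = TTestPower()
power = analysis.power(effect_size=cohens_d,
                       nobs=n,
                       alpha=0.05)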

# Visualization
plt.figure(figsize=(12, 5))

# Scatterplot
plt.subplot(1, 2, 1)
plt.scatter(baseline, followup)
min_val = min(baseline.min(), followup.min())
max_val = max(baseline.max(), followup.max())
plt.plot([min_val, max_val], [min_val, max_val], '--', color='red')
plt.xlabel('Baseline')
plt.ylabel('Follow-up')
plt.title('Baseline vs Follow-up')

# Differences histogram
plt.subplot(1, 2, 2)
sns.histplot(differences, kde=True)
plt.axvline(mean_diff, color='red', linestyle='--')
plt.xlabel('Differences (Follow-up - Baseline)')
plt.title('Distribution of Differences')

plt.tight_layout()
plt.show()

print(f"Mean difference: {mean_diff:.2f}")
print(f"Standard deviation of differences: {sd_diff:.2f}")
print(f"t-statistic: {t_stat:.2f}")
print(f"p-value: {p_value:.4f}")
print(f"Cohen's d: {cohens_d:.2f}")
print(f"Statistical Power: {power:.4f}")

Alternative Tests

Consider these alternatives when assumptions are violated:

  • Wilcoxon Signed-Rank Test: When the normality of the differences is violated or the data are ordinal
  • Independent t-test: When samples are independent rather than paired
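For the Wilcoxon signed-rank test, `scipy.stats.wilcoxon` accepts the two paired samples directly and ranks their differences. The sketch below runs it next to the paired t-test on hypothetical, deliberately skewed differences; the data-generating choices are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

# Hypothetical paired data with skewed decreases (illustration only)
rng = np.random.default_rng(2)
before = rng.normal(60, 10, size=25)
after = before - rng.exponential(scale=3, size=25)

d = after - before
_, shapiro_p = stats.shapiro(d)
print(f"Shapiro-Wilk p on differences: {shapiro_p:.4f}")

# Wilcoxon signed-rank test on the paired samples (equivalent to testing d)
w_res = stats.wilcoxon(after, before)
print(f"Wilcoxon W = {w_res.statistic:.1f}, p = {w_res.pvalue:.4g}")

# Paired t-test for comparison
t_res = stats.ttest_rel(after, before)
print(f"paired t = {t_res.statistic:.2f}, p = {t_res.pvalue:.4g}")
```

Because every simulated difference here is negative, the two-sided Wilcoxon statistic (the smaller of the positive- and negative-rank sums) is 0.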
