Worksheet 17: Paired Sample Inference
Learning Objectives 🎯
Understand the rationale for paired designs and when they are advantageous over independent samples
Transform paired data problems into one-sample problems by analyzing within-pair differences
Verify the variance relationship for paired data: \(\text{Var}(D) = \text{Var}(X_1) + \text{Var}(X_2) - 2\text{Cov}(X_1, X_2)\)
Check assumptions for paired t-procedures and assess normality of differences
Conduct complete hypothesis tests using paired t-procedures and construct confidence bounds
Calculate sample size for paired designs using power analysis
Introduction
In the previous worksheet, you explored inference methods for two independent samples. Now, we turn our attention to scenarios in which observations in the two groups are systematically linked, or in which a single set of subjects or units is measured under two conditions. These are known as paired-sample designs, and they often yield greater precision and statistical power by controlling for variability specific to each unit.
Key Concept 💡
The fundamental insight of paired analysis: Instead of comparing two separate groups, we analyze the differences within each pair, effectively transforming a two-sample problem into a simpler one-sample problem.
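This transformation can be seen directly in R (a sketch using simulated data; the seed and values are arbitrary, not from the worksheet): a paired t-test and a one-sample t-test on the differences produce identical results.

```r
set.seed(42)  # arbitrary seed, for reproducibility
before <- rnorm(12, mean = 10, sd = 2)
after  <- before + rnorm(12, mean = 0.8, sd = 1)  # paired with 'before'

paired_test <- t.test(before, after, paired = TRUE)
one_sample  <- t.test(before - after, mu = 0)

paired_test$statistic                               # same t statistic...
one_sample$statistic
all.equal(paired_test$p.value, one_sample$p.value)  # ...and same p-value: TRUE
```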
When Do We Use Paired Designs?
A paired design arises in any situation where we can sensibly treat observations in one sample as corresponding directly to observations in another sample. Common examples include:
Before-and-After Measurement: The same subject or unit is measured at two time points, such as pre-test vs. post-test.
Matched Subjects: Separate subjects or units in each group are matched on key characteristics (age, gender, baseline condition, etc.), making them as similar as possible.
Cross-Over or Repeated Measures: Each subject or unit experiences both treatments, one at a time, with the order randomized or balanced.
Why Paired Data?
By pairing observations, we effectively remove or reduce the background variability caused by differences across subjects (or units). Instead of comparing raw values from two separate groups, we analyze differences within each pair. This approach typically yields:
More Precision: The subject-to-subject (or unit-to-unit) variation is factored out, allowing differences of interest to emerge more clearly.
Smaller Sample Size Requirements: Because the analysis is more sensitive, fewer subjects may be needed to achieve similar power compared to an independent-sample design.
Note
Efficiency Gain from Pairing
If measurements within pairs are positively correlated (which is typical when the same subject is measured twice or when subjects are well-matched), the paired design will be more powerful than an independent design with the same total number of observations.
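The note above can be illustrated with a short simulation (a sketch with made-up numbers, not part of the worksheet data): when both measurements share a common subject effect, the variance of the differences is far smaller than the sum of the individual variances.

```r
set.seed(1)   # hypothetical seed, for reproducibility
n <- 10000
subject_effect <- rnorm(n, mean = 0, sd = 2)  # subject-to-subject variation
x1 <- subject_effect + rnorm(n, sd = 1)       # condition A measurement
x2 <- subject_effect + rnorm(n, sd = 1)       # condition B measurement

cor(x1, x2)         # strongly positive: both share the subject effect
var(x1 - x2)        # near 2: only within-subject noise remains
var(x1) + var(x2)   # near 10: the variability an independent design faces
```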
Potential Drawbacks of Paired Designs
Although paired designs can reduce within-subject variability and increase statistical power, they do have limitations:
Logistical Challenges: Measuring each unit under both conditions or finding near-identical matches can be time-consuming or impractical.
Forced Matching Issues: Matching on suboptimal criteria may introduce bias.
Missing Data: Missing data for one member of a pair often means discarding both observations.
Low Correlation: If the paired measurements have low (or negative) correlation, the advantages of pairing diminish.
Generalizability: Carefully matching or controlling for certain factors can reduce the broader applicability of the findings.
Part 1: Theory and Procedure for Paired Samples
Mathematical Framework
Let \((X_{1i}, X_{2i})\) represent the \(i\)-th paired observation where \(i = 1, 2, \ldots, n\), and:
\(X_{1i}\) is the observation from condition or treatment A
\(X_{2i}\) is the observation from condition or treatment B
The paired-sample procedure transforms a two-sample problem into a one-sample problem by analyzing the differences within each pair:

\[D_i = X_{1i} - X_{2i}, \quad i = 1, 2, \ldots, n\]

Thus, we now treat \(D_1, D_2, \ldots, D_n\) as a single sample of size \(n\), and proceed just like a one-sample problem.

Once these differences are computed, we perform inference on the mean of these differences, denoted:

\[\mu_D = E(D_i) = \mu_1 - \mu_2\]

Hence, inference about \(\mu_D\) is equivalent to inference about \(\mu_1 - \mu_2\).
Warning
Direction Matters!
When defining \(D_i = X_{1i} - X_{2i}\), be consistent throughout your analysis. The sign of your test statistic and confidence interval depends on this choice. Always clearly state which group you’re subtracting from which.
Assumptions Underlying the Paired t-Procedure
Paired Observations: Each \(D_i\) is formed by linking the observations \(X_{1i}\) and \(X_{2i}\). We assume the pairing is correct and that no pair is “forced” inappropriately.
Independence Across Pairs: The \(n\) difference values \(D_1, D_2, \ldots, D_n\) must be mutually independent. That is, knowledge of \(D_i\) does not provide information about \(D_j\) for \(i \neq j\).
Normality (or Large n) of Differences: The \(D_i\) come from an approximately normally distributed population, or the sample size \(n\) is large enough for the Central Limit Theorem to make \(\bar{D}\) approximately normally distributed.
Note
Important: No assumption is needed about the distributions of \(X_{1i}\) or \(X_{2i}\) individually. We only require that the differences \(D_i\) are (approximately) normal or that \(n\) is sufficient to justify normality of the mean difference.
In other words, we assume that the measurements \(D_1, D_2, \ldots, D_n\) form an SRS from \(N(\mu_D, \sigma_D^2)\).
Paired t-Test Formulas Summary 📐
Differences: \(D_i = X_{1i} - X_{2i}\) for \(i = 1, 2, \ldots, n\)
Parameter of Interest: \(\mu_D = \mu_1 - \mu_2\)
Degrees of Freedom: \(df = n - 1\)
Test Statistic:

\[t = \frac{\bar{D} - \Delta_0}{S_D / \sqrt{n}}\]
where \(\bar{D} = \frac{1}{n}\sum_{i=1}^n D_i\) and \(S_D = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (D_i - \bar{D})^2}\)
(Note: \(\Delta_0\) is the hypothesized value under \(H_0\), typically 0)
100(1 - α)%-level Confidence Intervals/Bounds:
Upper Confidence Bound:

\[\mu_D < \bar{D} + t_{\alpha, n-1} \cdot \frac{S_D}{\sqrt{n}}\]

Two-sided Confidence Interval:

\[\bar{D} \pm t_{\alpha/2, n-1} \cdot \frac{S_D}{\sqrt{n}}\]

Lower Confidence Bound:

\[\mu_D > \bar{D} - t_{\alpha, n-1} \cdot \frac{S_D}{\sqrt{n}}\]
R Code:

```r
# Paired t-test
t.test(group1, group2, paired = TRUE)

# Or equivalently:
differences <- group1 - group2
t.test(differences, mu = 0)
```
The Variance Relationship for Paired Data
An important relationship connects the variance of the differences to the individual variances and covariance:

\[\text{Var}(D) = \text{Var}(X_1) + \text{Var}(X_2) - 2\text{Cov}(X_1, X_2)\]

Or equivalently, in terms of standard deviations and correlation:

\[\sigma_D^2 = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2\]
where \(\rho = \text{Cor}(X_1, X_2)\) is the correlation coefficient.
This relationship shows why pairing is beneficial: when \(\rho > 0\) (positive correlation), the variance of differences is smaller than if the observations were independent, leading to more precise inference.
Variance Relationship for Paired Data 📐
Key Formula:

\[\text{Var}(D) = \text{Var}(X_1) + \text{Var}(X_2) - 2\text{Cov}(X_1, X_2)\]

In terms of standard deviations:

\[\sigma_D^2 = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2\]

where \(\rho = \text{Cor}(X_1, X_2) = \frac{\text{Cov}(X_1, X_2)}{\sigma_1\sigma_2}\)

Sample estimate:

\[S_D^2 = S_1^2 + S_2^2 - 2\text{Cov}(X_1, X_2)\]
Why pairing helps:
When \(\rho > 0\): \(\text{Var}(D) < \text{Var}(X_1) + \text{Var}(X_2)\) → Smaller variance, more precision!
When \(\rho = 0\): \(\text{Var}(D) = \text{Var}(X_1) + \text{Var}(X_2)\) → No advantage to pairing
When \(\rho < 0\): \(\text{Var}(D) > \text{Var}(X_1) + \text{Var}(X_2)\) → Pairing actually hurts!
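Because the formula above is an algebraic identity, it holds exactly for the sample quantities of any paired dataset. A quick check in R (the values here are made up for illustration, not the worksheet data):

```r
# Any paired sample will do; these numbers are hypothetical
x1 <- c(5.1, 4.8, 6.2, 5.5, 4.9)
x2 <- c(4.3, 4.5, 5.1, 5.0, 4.2)
d  <- x1 - x2

var(d)                               # sample variance of the differences
var(x1) + var(x2) - 2 * cov(x1, x2)  # identical, up to floating point
```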
Part 2: Sleep Study Analysis
Historical Context 📚
The sleep dataset originates from a 1905 study by Arthur Cushny and A. Roy Peebles investigating the comparative effects of two optical isomers of the drug hyoscine (scopolamine) on sleep duration.
Each of the ten participants received both isomers separately:
Group 1: Dextro-isomer (D-hyoscine)
Group 2: Laevo-isomer (L-hyoscine)
Researchers employed a two-period crossover design, where each participant received both optical isomers in separate, randomized periods. They measured the additional sleep hours induced by each isomer compared to a baseline period with no drug.
Each participant thus served as their own control, forming matched pairs and allowing for precise evaluation of the directional hypothesis that the laevo-isomer provided a greater increase in sleep duration than the dextro-isomer.
Question 1: Analyzing the Sleep Dataset
You will conduct a complete analysis of this classic dataset, including data manipulation, assumption checking, hypothesis testing, and confidence interval construction.
Load the Data
First, load the `sleep` dataset from the base R package.

```r
# Load the sleep dataset
data(sleep)

# Examine the structure
str(sleep)
head(sleep)
```
Restructure the Data

The current data format is “long,” with each participant having two rows (one per group). For a paired analysis, pivot it into a “wide” format so that each row represents a unique participant with columns for both treatments.
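One way to do this reshaping is sketched below (assuming the tidyr package; base R `reshape()` would also work). The column names `extra.1` and `extra.2` are the ones the later code in this worksheet expects:

```r
library(tidyr)

# Pivot from long to wide: one row per participant (ID),
# one "extra" column per treatment group
sleep_wide <- pivot_wider(sleep,
                          id_cols = ID,
                          names_from = group,
                          values_from = extra,
                          names_prefix = "extra.")

str(sleep_wide)   # should show 10 rows with columns ID, extra.1, extra.2
```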
After running this code, verify that you now have 10 rows (one per participant) with columns for both treatment groups.
Define and Compute the Differences
Define the difference within each pair symbolically. Be explicit about which group you’re subtracting from which, as this affects the interpretation of your results.
Your definition of \(D_i\):
\(D_i =\) ________________________
Now compute the differences in R and store them as a new variable in the `sleep_wide` dataset:

```r
# Compute differences (define D_i = extra.2 - extra.1)
# This means: D_i = (Laevo) - (Dextro)
sleep_wide$diff <- sleep_wide$extra.2 - sleep_wide$extra.1

# View the differences
print(sleep_wide[, c("ID", "extra.1", "extra.2", "diff")])
```
Descriptive Statistics of Differences
Compute the sample mean and standard deviation of the differences and report them below:
```r
# Sample mean and SD of differences
mean_diff <- mean(sleep_wide$diff)
sd_diff <- sd(sleep_wide$diff)

cat("Sample mean of differences:", mean_diff, "\n")
cat("Sample SD of differences:", sd_diff, "\n")
```
\(\bar{D} =\) ____
\(S_D =\) ____
Verify the Variance Relationship
Compute and report the sample means and sample standard deviations for each group, and compute the sample covariance between groups using the `cov()` function in R.

Results:
\(\bar{X}_1 =\) ____ \(S_1 =\) ____
\(\bar{X}_2 =\) ____ \(S_2 =\) ____
\(\text{Cov}(X_1, X_2) =\) ____
Now verify that the following relationship holds using your data:
\[S_D^2 = S_1^2 + S_2^2 - 2\text{Cov}(X_1, X_2)\]

Show your calculation:
Verification:
\(S_1^2 + S_2^2 - 2\text{Cov}(X_1, X_2) =\) ____
\(S_D^2 =\) ____
Do they match (within rounding error)? ____
Why is this relationship important for understanding paired designs?
Check Assumptions
Clearly state and verify the necessary assumptions for conducting a paired t-test in the context of the sleep dataset analysis. Specifically, explain each assumption clearly in relation to this study, and indicate how you determined if these assumptions were reasonably satisfied.
How to check: Use graphical methods to assess normality of the differences.
```r
library(ggplot2)
library(gridExtra)  # or library(patchwork)

# Calculate statistics for the differences
mean_diff <- mean(sleep_wide$diff)
sd_diff <- sd(sleep_wide$diff)
n <- nrow(sleep_wide)
n_bins <- max(round(sqrt(n)) + 2, 5)

cat("\nDescriptive Statistics for Differences:\n")
cat("Mean:", round(mean_diff, 3), "\n")
cat("SD:", round(sd_diff, 3), "\n")
cat("Median:", round(median(sleep_wide$diff), 3), "\n")
cat("n:", n, "\n")

# Create histogram with density overlays
hist_plot <- ggplot(sleep_wide, aes(x = diff)) +
  geom_histogram(aes(y = after_stat(density)), bins = n_bins,
                 fill = "grey", color = "black") +
  geom_density(color = "red", linewidth = 1) +
  stat_function(fun = dnorm,
                args = list(mean = mean_diff, sd = sd_diff),
                color = "blue", linewidth = 1) +
  labs(title = "Distribution of Differences",
       x = "Difference (Group 2 - Group 1)",
       y = "Density") +
  theme_minimal()

# Create Q-Q plot
qq_plot <- ggplot(sleep_wide, aes(sample = diff)) +
  stat_qq(color = "steelblue") +
  geom_abline(slope = sd_diff, intercept = mean_diff,
              color = "red", linewidth = 1) +
  labs(title = "Q-Q Plot of Differences",
       x = "Theoretical Quantiles",
       y = "Sample Quantiles") +
  theme_minimal()

# Display side-by-side
grid.arrange(hist_plot, qq_plot, ncol = 2,
             top = "Normality Assessment: Sleep Drug Differences")

# Shapiro-Wilk test of normality for the differences
shapiro.test(sleep_wide$diff)
```
Your assessment based on the plots and Shapiro-Wilk test:
Conclusion about assumptions:
Hypothesis Test
Using the sleep dataset, perform a complete four-step hypothesis test using the paired-sample hypothesis testing procedures at a significance level of \(\alpha = 0.05\).
Step 1: Parameter and Hypotheses
Parameter of interest: \(\mu_D =\) (describe in words) ________________________
where \(D_i =\) ________________________
Research question: Does the laevo-isomer induce significantly more sleep than the dextro-isomer?
Hypotheses:
\(H_0:\) ________________________
\(H_a:\) ________________________
(Make sure your alternative hypothesis reflects the directional research question)
Step 2: Test Statistic
The test statistic for a paired t-test is:
\[t = \frac{\bar{D} - \Delta_0}{S_D / \sqrt{n}}\]

where \(\Delta_0\) is the hypothesized difference under \(H_0\) (typically 0).
Compute the test statistic using your data:
\(t =\) ____
\(df =\) ____
Step 3: p-value
Use R to compute the p-value:
```r
# Paired t-test
# Specify alternative based on your hypotheses
t_test_result <- t.test(sleep_wide$extra.2, sleep_wide$extra.1,
                        paired = TRUE,
                        alternative = "greater",  # or "less" or "two.sided"
                        conf.level = 0.95)
print(t_test_result)

# Extract key values
cat("Test statistic:", t_test_result$statistic, "\n")
cat("Degrees of freedom:", t_test_result$parameter, "\n")
cat("p-value:", t_test_result$p.value, "\n")
```
\(p\text{-value} =\) ____
Step 4: Decision and Conclusion
Decision at \(\alpha = 0.05\):
Conclusion in context of the research question:
Confidence Bound
Manually compute the appropriate confidence bound (one-sided, since the test was directional) and use the bound to draw a conclusion about the hypothesis test.
Since this is a one-sided test (testing if laevo > dextro), compute a one-sided lower confidence bound:
\[\bar{D} - t_{\alpha, n-1} \cdot \frac{S_D}{\sqrt{n}}\]

Show your calculation:
```r
# Critical value for one-sided test
t_crit <- qt(0.95, df = 9)  # 95% confidence, one-sided

# Lower confidence bound
lower_bound <- mean_diff - t_crit * (sd_diff / sqrt(10))

cat("Critical t-value:", t_crit, "\n")
cat("95% Lower confidence bound:", lower_bound, "\n")

# Interpretation
if (lower_bound > 0) {
  cat("Since the lower bound is > 0, we have evidence that μ_D > 0\n")
} else {
  cat("The lower bound includes 0, so we cannot conclude μ_D > 0\n")
}
```
Lower bound: ____
Interpretation using the confidence bound:
Does this agree with your hypothesis test conclusion?
Visualization Challenge 📊
Visualizing Paired Data
Create visualizations that effectively show the paired nature of the data and the treatment effect:
A before-after plot with lines connecting each participant’s measurements
A boxplot comparing the two treatments
A dot plot of the differences with a reference line at zero
Sample prompt for your AI assistant:
“Using the sleep dataset in R, create a before-after plot showing the extra sleep for each participant under both treatments (Dextro and Laevo). Connect paired observations with lines, color-code by whether the difference is positive or negative, and add appropriate labels. Make the visualization publication-ready.”
Part 3: Sample Size Calculation for Paired Designs
Question 2: Power Analysis for Cooling Method Study
A manufacturing engineer is evaluating a new cooling method designed to reduce the cooling time of molded plastic parts. Preliminary data from a pilot study have been gathered.
Study Design 🔬
In this pilot study, ten molded plastic parts were produced and then cut into two identical halves. One half of each pair was cooled using the traditional method (Treatment A), and the other half was cooled using the new cooling method (Treatment B). The engineer recorded the cooling times for each half.
Preliminary Data:
Mean cooling time (traditional method): \(\bar{X}_A = 8.4\) minutes
Mean cooling time (new method): \(\bar{X}_B = 7.5\) minutes
Standard deviation (traditional method): \(s_A = 1.2\) minutes
Standard deviation (new method): \(s_B = 1.0\) minutes
Sample correlation between paired cooling times: \(r = 0.85\)
Research Goal: Determine the minimal number of paired observations (sample size) required to achieve a statistical power of 95% at a significance level of \(\alpha = 0.05\). Assume the standard deviation of the paired differences remains consistent with the preliminary data.
This is a one-sided test to show that the new cooling method reduces the cooling time by at least 0.5 minutes.
Step 1) Define the difference
First, clearly define how you will compute the differences. This affects the sign of your hypotheses.
Definition: \(D_i =\) ________________________
(e.g., Traditional - New, or New - Traditional)
Step 2) State the Hypotheses
Based on your definition of \(D_i\), state the null and alternative hypotheses for testing whether the new method reduces cooling time by at least 0.5 minutes.
\(H_0:\) ________________________
\(H_a:\) ________________________
Step 3) Sketch the Sampling Distributions

Draw two side-by-side t-distributions showing:
The sampling distribution of \(\bar{D}\) under \(H_0\)
The sampling distribution of \(\bar{D}\) under \(H_a\)
Label the cutoff value, the rejection region, and the power region.
Step 4) Compute the Standard Deviation of Differences
\(\sigma_D =\) ____
Step 5) Derive the Sample Size Formula
Use the code below to verify your sample size calculation. Note that you may need to change values and directions if you defined your difference as \(D_i = X_{Ai} - X_{Bi}\) (Traditional − New).
```r
# Power verification for paired t-test

# Given values from preliminary data
sd_A <- 1.2
sd_B <- 1.0
r <- 0.85

# Compute SD of differences
var_D <- sd_A^2 + sd_B^2 - 2 * r * sd_A * sd_B
sd_D <- sqrt(var_D)

# Test parameters
alpha <- 0.05
desired_power <- 0.95
beta <- 1 - desired_power

# Define your hypotheses and effect size
Delta_0 <- ___    # Hypothesized value under H0
Delta_alt <- ___  # Expected true difference under Ha

# YOUR calculated sample size (fill this in)
your_n <- ___     # Enter your calculated n here

# Compute achieved power for your sample size
t_alpha <- qt(alpha, df = your_n - 1, lower.tail = FALSE)

# Standard error with your n
SE <- sd_D / sqrt(your_n)

# Cutoff value under H0
dbar_cutoff <- Delta_0 - t_alpha * SE

# t-score of dbar_cutoff under the Ha distribution
t_under_Ha <- (dbar_cutoff - Delta_alt) / SE

# Achieved power (note lower.tail = TRUE here; graph it to see why)
achieved_power <- pt(t_under_Ha, df = your_n - 1, lower.tail = TRUE)

cat("Your sample size:", your_n, "pairs\n")
cat("SD of differences:", round(sd_D, 3), "\n")
cat("Standard error:", round(SE, 3), "\n")
cat("Achieved power:", round(achieved_power, 4), "\n")
cat("Target power:", desired_power, "\n\n")

# Verify power requirement
if (achieved_power >= desired_power - 0.01) {
  cat("✓ Excellent! Your sample size achieves the target power of", desired_power, "\n")
  if (achieved_power > desired_power + 0.05) {
    cat("  Note: Your n provides more power than required\n")
  }
} else {
  cat("✗ Your sample size achieves only", round(achieved_power, 3), "power\n")
  cat("  You need a larger n to achieve", desired_power, "power\n")
  cat("  Current shortfall:", round(desired_power - achieved_power, 3), "\n")
}
```

Minimum sample size needed: \(n =\) ____ pairs
Interpretation
Based on your sample size calculation, provide practical recommendations to the manufacturing engineer.
Your recommendation:
What factors might cause you to recommend a larger sample size than your calculation suggests?
Note
Key Insight on Paired vs. Independent Designs
Notice how the correlation \(r = 0.85\) between paired measurements affects the required sample size. If this were an independent design, the variance would be \(\sigma_A^2 + \sigma_B^2\) (no covariance term), resulting in a larger required sample size. This demonstrates the efficiency gain from pairing when measurements are positively correlated!
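R's built-in `power.t.test()` can make this comparison concrete. The sketch below treats the effect of interest as the 0.4-minute gap between the observed 0.9-minute reduction and the 0.5-minute threshold (one interpretation of the study goal, not a given), and approximates the independent design with a pooled per-group SD:

```r
sd_A <- 1.2; sd_B <- 1.0; r <- 0.85
sd_D <- sqrt(sd_A^2 + sd_B^2 - 2 * r * sd_A * sd_B)  # SD of differences

# Paired design: n is the number of pairs
res_paired <- power.t.test(delta = 0.4, sd = sd_D, sig.level = 0.05,
                           power = 0.95, type = "paired",
                           alternative = "one.sided")

# Independent design: n is per group; approximate a common per-group SD
sd_pooled <- sqrt((sd_A^2 + sd_B^2) / 2)
res_indep <- power.t.test(delta = 0.4, sd = sd_pooled, sig.level = 0.05,
                          power = 0.95, type = "two.sample",
                          alternative = "one.sided")

res_paired$n   # pairs needed
res_indep$n    # subjects needed *per group* -- substantially larger
```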
Key Takeaways
Summary 📝
Pairing transforms the problem: Paired sample inference converts a two-sample comparison into a one-sample analysis of differences \(D_i = X_{1i} - X_{2i}\), simplifying both the theory and computation.
The variance relationship is crucial: Understanding \(\text{Var}(D) = \text{Var}(X_1) + \text{Var}(X_2) - 2\text{Cov}(X_1, X_2)\) reveals why pairing is beneficial when measurements are positively correlated.
Assumptions focus on differences: We only need the differences \(D_i\) to be approximately normal (or \(n\) large enough), not the original measurements themselves. Always check normality of differences graphically.
Consistency in defining differences: The direction of subtraction (\(D_i = X_{1i} - X_{2i}\) vs. \(D_i = X_{2i} - X_{1i}\)) must be clearly stated and consistently applied throughout the analysis. It affects the sign of your test statistic and the interpretation of confidence intervals.
Power advantage of pairing: When within-pair correlation is positive and substantial, paired designs require smaller sample sizes than independent designs to achieve the same power. This makes them particularly valuable when observations are expensive or difficult to obtain.
R functions make it easy: The `t.test()` function with `paired = TRUE` handles all the calculations. However, understanding the underlying theory helps you set up the problem correctly and interpret results appropriately.
Reflection Questions 🤔
Before moving on, consider:
How would you decide whether to use a paired or independent design when planning a new study?
What happens to the efficiency of pairing if the correlation between measurements is zero? What if it’s negative?
Why is it important to check the normality of differences rather than the normality of each group separately?
In what situations might an independent design be preferable even if pairing is logistically possible?
Note
Connection to Previous Worksheets:
Worksheet 16: Two independent samples (pooled and Welch’s t-tests)
Worksheet 14-15: One-sample inference (paired samples reduce to this case)
The paired t-test uses the same t-distribution theory as one-sample tests
Preview of Next Topics:
Comparing more than two groups (ANOVA)