10.3. Hypothesis Test for the Population Mean When σ Is Unknown

Hypothesis testing and confidence intervals are complementary tools that approach statistical inference from different angles. While hypothesis tests provide yes-or-no answers about specific parameter values, confidence intervals give us ranges of plausible values. Understanding their deep connection not only provides computational shortcuts but also reinforces the underlying logic of statistical inference.

Road Map 🧭

  • Problem we will solve – How hypothesis tests and confidence intervals/bounds are two sides of the same coin, and how to extend these relationships when σ is unknown

  • Tool we’ll learn – The duality principle connecting tests and intervals, then t-procedures when σ must be estimated

  • How it fits – This completes our understanding of inference for population means, showing how different approaches yield consistent conclusions

10.3.1. The Duality of Hypothesis Tests and Confidence Intervals

Two Perspectives on the Same Question

Confidence intervals and hypothesis tests address the same fundamental question from different angles:

  • Confidence intervals quantify uncertainty in our estimation of unknown population parameters, providing a region of plausible values for the truth

  • Hypothesis tests start from a specific assumption and assess whether our data provides sufficient evidence to reject that assumption

The key insight is that these approaches are mathematically equivalent under certain conditions. A confidence interval contains exactly those null hypothesis values that would not be rejected in a corresponding hypothesis test.

The Fundamental Duality Principle

For the duality to work, we need one crucial condition:

\[\text{Confidence Level} + \text{Significance Level} = 1\]

That is, \(C + \alpha = 1\), where \(C\) is the confidence coefficient and \(\alpha\) is the significance level.

When this condition holds: - If \(\mu_0\) lies inside the confidence interval → fail to reject \(H_0: \mu = \mu_0\) - If \(\mu_0\) lies outside the confidence interval → reject \(H_0: \mu = \mu_0\)

Why This Works

Both procedures use the same sampling distribution and the same critical values, just applied in different ways:

  • Confidence intervals ask: “What parameter values are consistent with this sample?”

  • Hypothesis tests ask: “Is this specific parameter value consistent with this sample?”

The mathematical machinery is identical—only the perspective changes.

10.3.2. Two-Sided Tests and Confidence Intervals, When σ Is Known

The Standard Case

For testing \(H_0: \mu = \mu_0\) versus \(H_a: \mu \neq \mu_0\) when \(\sigma\) is known, we use:

Hypothesis Test:

  • Test statistic: \(Z_{TS} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\)

  • Decision rule: Reject \(H_0\) if \(|Z_{TS}| > z_{\alpha/2}\)

  • P-value: \(2P(Z > |Z_{TS}|)\)

Confidence Interval:

  • Formula: \(\left(\bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}}, \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right)\)

  • Interpretation: We’re \(100(1-\alpha)\%\) confident the true mean lies in this interval

The Connection

Both procedures use the same critical value \(z_{\alpha/2}\) from the standard normal distribution. The test rejects when the observed sample mean is more than \(z_{\alpha/2}\) standard errors away from \(\mu_0\). The confidence interval includes all values that are within \(z_{\alpha/2}\) standard errors of the observed sample mean.

10.3.3. One-Sided Tests and Confidence Bounds, When σ Is Known

Right-Tailed Tests and Lower Bounds

For testing \(H_0: \mu \leq \mu_0\) versus \(H_a: \mu > \mu_0\):

Hypothesis Test:

  • Test statistic: \(Z_{TS} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\)

  • Decision rule: Reject \(H_0\) if \(Z_{TS} > z_{\alpha}\)

  • P-value: \(P(Z > Z_{TS})\)

Lower Confidence Bound:

  • Formula: \(\mu > \bar{x} - z_{\alpha} \frac{\sigma}{\sqrt{n}}\)

  • Interpretation: We’re \(100(1-\alpha)\%\) confident the true mean exceeds this lower bound

The Logic: If we believe \(\mu > \mu_0\), then plausible values should extend upward from some lower threshold.

Left-Tailed Tests and Upper Bounds

For testing \(H_0: \mu \geq \mu_0\) versus \(H_a: \mu < \mu_0\):

Hypothesis Test:

  • Test statistic: \(Z_{TS} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\)

  • Decision rule: Reject \(H_0\) if \(Z_{TS} < -z_{\alpha}\)

  • P-value: \(P(Z < Z_{TS})\)

Upper Confidence Bound:

  • Formula: \(\mu < \bar{x} + z_{\alpha} \frac{\sigma}{\sqrt{n}}\)

  • Interpretation: We’re \(100(1-\alpha)\%\) confident the true mean is below this upper bound

The Logic: If we believe \(\mu < \mu_0\), then plausible values should extend downward from some upper threshold.

Direction Principle

A helpful way to remember: the direction of the alternative hypothesis indicates the direction in which plausible values extend.

  • \(H_a: \mu > \mu_0\) → plausible values extend upward → need lower bound

  • \(H_a: \mu < \mu_0\) → plausible values extend downward → need upper bound

10.3.4. A Complete Example: Quality Control for Cherry Tomatoes

Let’s demonstrate the duality principle with a comprehensive example that shows both the confidence interval and hypothesis testing approaches.

The Scenario

Tom Green oversees quality control for a large produce company. He obtains a simple random sample of four packages of cherry tomatoes, each labeled 1/2 lb (227g). The average weight from Tom’s four packages is 222g. The packaging process has a known standard deviation of 5g, and package weights are normally distributed.

The Questions

  1. Construct a 95% confidence interval for the mean weight

  2. Test at \(\alpha = 0.05\) whether there’s evidence the machine needs revision (i.e., the mean differs from 227g)

Given Information

  • Sample size: \(n = 4\)

  • Sample mean: \(\bar{x} = 222\) grams

  • Population standard deviation: \(\sigma = 5\) grams (known)

  • Target weight: \(\mu_0 = 227\) grams

  • Significance level: \(\alpha = 0.05\)

Step 1: Construct the 95% Confidence Interval

Since \(\sigma\) is known and the data is normally distributed, we use:

\[\left(\bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}}, \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right)\]

For 95% confidence, \(\alpha = 0.05\), so we need \(z_{0.025}\):

# Find critical value
z_critical <- qnorm(0.025, lower.tail = FALSE)
z_critical
# [1] 1.959964

Calculate the interval:

\[\left(222 - 1.96 \times \frac{5}{\sqrt{4}}, 222 + 1.96 \times \frac{5}{\sqrt{4}}\right)\]
\[\left(222 - 1.96 \times 2.5, 222 + 1.96 \times 2.5\right) = (217.1, 226.9)\]

Interpretation: We are 95% confident that the true mean weight of cherry tomato packages lies between 217.1 and 226.9 grams.

Step 2: Use Duality to Answer the Hypothesis Test

We want to test:

  • \(H_0: \mu = 227\) (machine is properly calibrated)

  • \(H_a: \mu \neq 227\) (machine needs revision)

Since our confidence level is 95% and our significance level is 5%, we have \(C + \alpha = 0.95 + 0.05 = 1.0\), so the duality relationship applies.

Checking the interval: Our 95% confidence interval is (217.1, 226.9). The null value \(\mu_0 = 227\) lies outside this interval (227 > 226.9).

Conclusion from duality: Since 227 is not in the confidence interval, we reject the null hypothesis.

Step 3: Verify with Formal Hypothesis Test

Let’s confirm this conclusion using the standard hypothesis testing procedure:

\[Z_{TS} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{222 - 227}{5/\sqrt{4}} = \frac{-5}{2.5} = -2.0\]

For a two-sided test:

z_test_stat <- -2.0
p_value <- 2 * pnorm(abs(z_test_stat), lower.tail = FALSE)
p_value
# [1] 0.04550026

Decision: Since p-value = 0.0455 < \(\alpha = 0.05\), we reject the null hypothesis.

Consistency Check: Both approaches give the same conclusion! This confirms the duality relationship.

Practical Interpretation

The instructor notes that 227 is “just barely not in the interval,” which explains why the p-value (0.0455) is just slightly below 0.05. This suggests:

  • The evidence against proper calibration is statistically significant but not overwhelming

  • With only 4 packages, we should be cautious about strong conclusions

  • Additional data collection might provide more definitive evidence

10.3.5. From Z-Procedures to T-Procedures: When σ is Unknown

The Realistic Scenario

In the cherry tomato example, we assumed the population standard deviation was known (\(\sigma = 5\) grams). This convenient assumption allowed us to use z-procedures, but it’s rarely realistic. If we don’t know the population mean \(\mu\) (which is why we’re testing it), we almost certainly don’t know \(\sigma\) either.

The Impact of Unknown σ

When we replace the unknown \(\sigma\) with our sample estimate \(s\), we introduce additional uncertainty into our procedures. The sample standard deviation \(s\) is itself a random variable that varies from sample to sample, and this extra variability must be accounted for.

The T-Test Statistic

When \(\sigma\) is unknown, our test statistic becomes:

\[t_{TS} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}\]

Under the assumptions that:

  • The data comes from a normal distribution (or the sample size is large enough for CLT)

  • The observations are independent

  • The null hypothesis is true

This test statistic follows a t-distribution with \(df = n-1\) degrees of freedom.

Why the T-Distribution?

The t-distribution accounts for the additional uncertainty from estimating \(\sigma\):

  • Symmetric around zero (like the standard normal)

  • Heavier tails than the standard normal (reflecting extra uncertainty)

  • Approaches the standard normal as sample size increases

  • Degrees of freedom control the “heaviness” of the tails

The Convergence Property

As sample size increases:

  • \(s\) becomes a better estimate of \(\sigma\)

  • The t-distribution approaches the standard normal distribution

  • The difference between t-tests and z-tests becomes negligible

For large samples (\(n > 100\)), t-procedures and z-procedures give virtually identical results.

10.3.6. Duality Revisited: T-Procedures

The beautiful duality relationship we established for z-procedures carries over directly to t-procedures. The only change is that we use t-distributions instead of the standard normal distribution.

Two-Sided Tests and Confidence Intervals (σ Unknown)

For testing \(H_0: \mu = \mu_0\) versus \(H_a: \mu \neq \mu_0\):

Hypothesis Test:

  • Test statistic: \(t_{TS} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\)

  • P-value: \(2P(T_{n-1} > |t_{TS}|)\)

  • R code: 2 * pt(abs(t_test_stat), df = n-1, lower.tail = FALSE)

Confidence Interval:

  • Formula: \(\bar{x} \pm t_{\alpha/2,n-1} \frac{s}{\sqrt{n}}\)

  • R code for critical value: qt(alpha/2, df = n-1, lower.tail = FALSE)

Duality: If \(\mu_0\) lies outside the \(100(1-\alpha)\%\) confidence interval, reject \(H_0\) at significance level \(\alpha\).

One-Sided Tests and Confidence Bounds (σ Unknown)

Right-tailed test (\(H_a: \mu > \mu_0\)) with lower confidence bound:

  • P-value: \(P(T_{n-1} > t_{TS})\)

  • Lower bound: \(\mu > \bar{x} - t_{\alpha,n-1} \frac{s}{\sqrt{n}}\)

  • Duality: If \(\mu_0\) lies below the lower bound, reject \(H_0\)

Left-tailed test (\(H_a: \mu < \mu_0\)) with upper confidence bound:

  • P-value: \(P(T_{n-1} < t_{TS})\)

  • Upper bound: \(\mu < \bar{x} + t_{\alpha,n-1} \frac{s}{\sqrt{n}}\)

  • Duality: If \(\mu_0\) lies above the upper bound, reject \(H_0\)

10.3.7. A Complete T-Test Example: Radon Detector Accuracy

Let’s work through a comprehensive example that demonstrates t-procedures and their duality relationships.

The Research Question

How accurate are radon detectors sold to homeowners? University researchers placed 12 detectors in a chamber exposed to exactly 105 picocuries per liter of radon. If the detectors work properly, they should read close to 105 on average.

Study Design

  • Sample: SRS of 12 radon detectors

  • True exposure: 105 picocuries per liter

  • Population standard deviation: Unknown (must be estimated)

  • Significance level: \(\alpha = 0.10\) (10%)

  • Assumption: Detector readings are normally distributed

The Data

Picocuries per liter readings: 91.9, 97.8, 111.4, 122.3, 105.4, 95.0, 103.8, 99.6, 119.3, 104.8, 101.7, 96.6

Step 1: State the Hypotheses

  • \(H_0: \mu = 105\) (detectors are accurate)

  • \(H_a: \mu \neq 105\) (detectors are not accurate)

Step 2: Calculate Sample Statistics

# The data
readings <- c(91.9, 97.8, 111.4, 122.3, 105.4, 95.0,
              103.8, 99.6, 119.3, 104.8, 101.7, 96.6)

# Calculate sample statistics
n <- length(readings)
x_bar <- mean(readings)
s <- sd(readings)
df <- n - 1

# Results
n        # 12
x_bar    # 104.1333
s        # 9.397421
df       # 11

Step 3: Calculate the Test Statistic

\[t_{TS} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{104.1333 - 105}{9.397421/\sqrt{12}} = \frac{-0.8667}{2.7136} = -0.319\]
mu_0 <- 105
t_test_stat <- (x_bar - mu_0) / (s / sqrt(n))
t_test_stat
# [1] -0.319

Step 4: Calculate the P-Value

For a two-sided test with \(df = 11\):

p_value <- 2 * pt(abs(t_test_stat), df = 11, lower.tail = FALSE)
p_value
# [1] 0.755

Step 5: Make the Decision

Since p-value = 0.755 > \(\alpha = 0.10\), we fail to reject the null hypothesis.

Step 6: Construct the Complementary Confidence Interval

For the duality relationship, we need a 90% confidence interval (since \(C + \alpha = 0.90 + 0.10 = 1.0\)):

# Calculate 90% confidence interval
alpha <- 0.10
t_critical <- qt(alpha/2, df = 11, lower.tail = FALSE)
t_critical
# [1] 1.795885

margin_error <- t_critical * (s / sqrt(n))
ci_lower <- x_bar - margin_error
ci_upper <- x_bar + margin_error

c(ci_lower, ci_upper)
# [1] 99.26145 109.00521

Step 7: Verify Duality

The 90% confidence interval is (99.3, 109.0). Since \(\mu_0 = 105\) lies within this interval, the duality principle tells us we should fail to reject \(H_0\)—which matches our hypothesis test conclusion.

Step 8: Interpretation

We do not have sufficient evidence to conclude that the radon detectors deviate from the true exposure level. The large p-value (0.755) indicates that the observed sample mean (104.1) is very consistent with the null hypothesis value (105.0). The data suggests the detectors are operating as intended by the manufacturer.

10.3.8. Why Such a Large P-Value?

The p-value of 0.755 is quite large, indicating strong consistency between our data and the null hypothesis. Several factors contribute:

  1. Small effect size: Sample mean (104.1) very close to null value (105.0)

  2. Small sample size: Only 12 observations limits precision

  3. Substantial variability: Sample standard deviation (9.4) is relatively large

  4. Two-sided test: We’re checking for deviations in either direction

This example illustrates that “failing to reject” doesn’t mean “accepting” the null hypothesis. It means we lack sufficient evidence to conclude the detectors are systematically inaccurate.

10.3.9. Comprehensive Summary: All T-Procedures

When \(\sigma\) is unknown, we use the test statistic \(t_{TS} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\) with \(df = n-1\).

Two-Tailed Tests (:math:`H_a: mu neq mu_0`)

Hypothesis Test:

  • P-value: \(2 \times P(T_{n-1} > |t_{TS}|)\)

  • R code: 2 * pt(abs(t_test_stat), df = n-1, lower.tail = FALSE)

Confidence Interval:

  • Formula: \(\bar{x} \pm t_{\alpha/2,n-1} \frac{s}{\sqrt{n}}\)

  • R code: qt(alpha/2, df = n-1, lower.tail = FALSE)

  • Duality: If \(\mu_0\) outside interval → reject; if inside → fail to reject

Right-Tailed Tests (:math:`H_a: mu > mu_0`)

Hypothesis Test:

  • P-value: \(P(T_{n-1} > t_{TS})\)

  • R code: pt(t_test_stat, df = n-1, lower.tail = FALSE)

Lower Confidence Bound:

  • Formula: \(\mu > \bar{x} - t_{\alpha,n-1} \frac{s}{\sqrt{n}}\)

  • R code: qt(alpha, df = n-1, lower.tail = FALSE)

  • Duality: If \(\mu_0\) below bound → reject; if above → fail to reject

Left-Tailed Tests (:math:`H_a: mu < mu_0`)

Hypothesis Test:

  • P-value: \(P(T_{n-1} < t_{TS})\)

  • R code: pt(t_test_stat, df = n-1, lower.tail = TRUE)

Upper Confidence Bound:

  • Formula: \(\mu < \bar{x} + t_{\alpha,n-1} \frac{s}{\sqrt{n}}\)

  • R code: qt(alpha, df = n-1, lower.tail = FALSE)

  • Duality: If \(\mu_0\) above bound → reject; if below → fail to reject

10.3.10. Key Differences: T vs. Z Procedures

Critical Values

For any given significance level, t critical values are always larger than z critical values (except as \(n \to \infty\)). This means:

  • Confidence intervals using t are wider than those using z

  • Hypothesis tests using t require more extreme test statistics to reject \(H_0\)

  • P-values from t-tests are generally larger than corresponding z-tests

Why This Makes Sense

The larger critical values reflect our uncertainty about \(\sigma\). When we estimate \(\sigma\) with \(s\), we should be less confident in our conclusions, which the t-distribution captures through its heavier tails.

Sample Size Impact

  • Small samples (\(n < 15\)): Substantial difference between t and z procedures

  • Moderate samples (\(15 \leq n < 30\)): Noticeable but manageable differences

  • Large samples (\(n \geq 30\)): Differences become negligible

10.3.11. When Assumptions Are Violated

T-procedures assume data comes from a normal distribution. While t-tests are reasonably robust to moderate departures from normality, serious violations can be problematic, especially with small samples.

Alternative Approaches

  1. Data transformation (e.g., log transformation for right-skewed data)

  2. Non-parametric methods (e.g., Wilcoxon signed-rank test)

  3. Bootstrap methods for empirical sampling distributions

  4. Exact distributional methods when the true distribution is known

Conservative Guidelines

  • Always plot your data to check assumptions

  • Report any concerns about normality in small samples

  • Consider alternatives when assumptions are clearly violated

  • Remember: For large samples, t-procedures are quite robust due to the CLT

10.3.12. The Power of Duality

Understanding the duality between hypothesis tests and confidence intervals provides several advantages:

Computational Efficiency

Sometimes it’s easier to construct a confidence interval and check whether \(\mu_0\) falls inside than to calculate a p-value directly.

Deeper Understanding

Duality reinforces that both procedures quantify the same underlying uncertainty—they just present it differently.

Practical Insight

Confidence intervals show the magnitude of effects, while hypothesis tests provide yes/no answers. Together, they give a complete picture.

Consistency Check

When results from the two approaches don’t align, it signals an error in calculations or assumptions.

10.3.13. Bringing It All Together

Key Takeaways 📝

  1. Hypothesis tests and confidence intervals are dual procedures that address the same questions from different perspectives, connected by the relationship \(C + \alpha = 1\).

  2. Duality works for both z-procedures (σ known) and t-procedures (σ unknown), with the same logical framework applying to both.

  3. T-distributions have heavier tails than the standard normal, reflecting additional uncertainty when estimating σ with s.

  4. Direction of alternative hypothesis determines confidence bound type: \(H_a: \mu > \mu_0\) pairs with lower bounds, \(H_a: \mu < \mu_0\) pairs with upper bounds.

  5. T-procedures approach z-procedures as sample size increases, with negligible differences for large samples (\(n > 100\)).

  6. Failing to reject doesn’t mean accepting the null hypothesis—it means insufficient evidence against it.

  7. Practical interpretation requires considering both statistical significance and practical importance in context.

  8. Assumptions matter most for small samples—normality becomes less critical as sample size increases due to CLT.

Exercises

  1. Duality Verification: A researcher constructs a 95% confidence interval for μ and gets (12.3, 18.7). Without doing any calculations, determine the outcome of testing \(H_0: \mu = 15\) vs. \(H_a: \mu \neq 15\) at \(\alpha = 0.05\). Explain your reasoning.

  2. Coffee Shop Service: A coffee shop claims average service time is 3 minutes. You time 15 customers and find \(\bar{x} = 3.4\) minutes, \(s = 0.8\) minutes. a) Test the claim at \(\alpha = 0.05\) b) Construct a 95% confidence interval c) Verify that your results from (a) and (b) are consistent

  3. One-Sided Bounds: A manufacturer wants to show their batteries last more than 20 hours on average. With 12 batteries, they get \(\bar{x} = 22.1\) hours, \(s = 3.5\) hours. a) What type of confidence bound is appropriate? b) Test at \(\alpha = 0.05\) using the hypothesis test approach c) Verify using the appropriate 95% confidence bound

  4. Sample Size Impact: Explain why a t-test with \(n = 5\) requires a larger test statistic to reject \(H_0\) than a z-test with the same data and significance level. What does this say about our confidence in conclusions from small samples?

  5. Cherry Tomato Follow-up: In the cherry tomato example, suppose σ had been unknown and estimated as \(s = 5\) grams from the sample of 4 packages. Rework the entire analysis using t-procedures and compare your conclusions to the original z-procedure results.

  6. Critical Thinking: A study reports “no significant difference” with \(p = 0.12\) and \(n = 8\). The researcher concludes the null hypothesis is true. Identify at least three problems with this reasoning and suggest better ways to interpret the results.