10.3. Hypothesis Test for the Population Mean When σ Is Unknown
Hypothesis testing and confidence intervals are complementary tools that approach statistical inference from different angles. While hypothesis tests provide yes-or-no answers about specific parameter values, confidence intervals give us ranges of plausible values. Understanding their deep connection not only provides computational shortcuts but also reinforces the underlying logic of statistical inference.
Road Map 🧭
Problem we will solve – How hypothesis tests and confidence intervals/bounds are two sides of the same coin, and how to extend these relationships when σ is unknown
Tool we’ll learn – The duality principle connecting tests and intervals, then t-procedures when σ must be estimated
How it fits – This completes our understanding of inference for population means, showing how different approaches yield consistent conclusions
10.3.1. The Duality of Hypothesis Tests and Confidence Intervals
Two Perspectives on the Same Question
Confidence intervals and hypothesis tests address the same fundamental question from different angles:
Confidence intervals quantify uncertainty in our estimation of unknown population parameters, providing a region of plausible values for the truth
Hypothesis tests start from a specific assumption and assess whether our data provides sufficient evidence to reject that assumption
The key insight is that these approaches are mathematically equivalent under certain conditions. A confidence interval contains exactly those null hypothesis values that would not be rejected in a corresponding hypothesis test.
The Fundamental Duality Principle
For the duality to work, we need one crucial condition: the confidence level and the significance level must be complementary.
That is, \(C + \alpha = 1\), where \(C\) is the confidence coefficient and \(\alpha\) is the significance level.
When this condition holds:
If \(\mu_0\) lies inside the confidence interval → fail to reject \(H_0: \mu = \mu_0\)
If \(\mu_0\) lies outside the confidence interval → reject \(H_0: \mu = \mu_0\)
Why This Works
Both procedures use the same sampling distribution and the same critical values, just applied in different ways:
Confidence intervals ask: “What parameter values are consistent with this sample?”
Hypothesis tests ask: “Is this specific parameter value consistent with this sample?”
The mathematical machinery is identical—only the perspective changes.
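The agreement is easy to check numerically. Here is a minimal sketch in R, using made-up values for the sample mean, \(\sigma\), \(n\), and \(\mu_0\); when \(C + \alpha = 1\), the test decision and the interval decision always match:

```r
# Made-up values (not from the text) just to illustrate the equivalence
x_bar <- 50.8; sigma <- 2; n <- 25; mu_0 <- 50; alpha <- 0.05

z_crit <- qnorm(alpha / 2, lower.tail = FALSE)
se <- sigma / sqrt(n)

reject_by_test <- abs((x_bar - mu_0) / se) > z_crit   # test decision
ci <- c(x_bar - z_crit * se, x_bar + z_crit * se)     # 95% confidence interval
reject_by_ci <- (mu_0 < ci[1]) | (mu_0 > ci[2])       # interval decision

reject_by_test == reject_by_ci   # the two decisions always agree
# [1] TRUE
```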
10.3.2. Two-Sided Tests and Confidence Intervals, When σ Is Known
The Standard Case
For testing \(H_0: \mu = \mu_0\) versus \(H_a: \mu \neq \mu_0\) when \(\sigma\) is known, we use:
Hypothesis Test:
Test statistic: \(Z_{TS} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\)
Decision rule: Reject \(H_0\) if \(|Z_{TS}| > z_{\alpha/2}\)
P-value: \(2P(Z > |Z_{TS}|)\)
Confidence Interval:
Formula: \(\left(\bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}}, \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right)\)
Interpretation: We’re \(100(1-\alpha)\%\) confident the true mean lies in this interval
The Connection
Both procedures use the same critical value \(z_{\alpha/2}\) from the standard normal distribution. The test rejects when the observed sample mean is more than \(z_{\alpha/2}\) standard errors away from \(\mu_0\). The confidence interval includes all values that are within \(z_{\alpha/2}\) standard errors of the observed sample mean.
10.3.3. One-Sided Tests and Confidence Bounds, When σ Is Known
Right-Tailed Tests and Lower Bounds
For testing \(H_0: \mu \leq \mu_0\) versus \(H_a: \mu > \mu_0\):
Hypothesis Test:
Test statistic: \(Z_{TS} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\)
Decision rule: Reject \(H_0\) if \(Z_{TS} > z_{\alpha}\)
P-value: \(P(Z > Z_{TS})\)
Lower Confidence Bound:
Formula: \(\mu > \bar{x} - z_{\alpha} \frac{\sigma}{\sqrt{n}}\)
Interpretation: We’re \(100(1-\alpha)\%\) confident the true mean exceeds this lower bound
The Logic: If we believe \(\mu > \mu_0\), then plausible values should extend upward from some lower threshold.
Left-Tailed Tests and Upper Bounds
For testing \(H_0: \mu \geq \mu_0\) versus \(H_a: \mu < \mu_0\):
Hypothesis Test:
Test statistic: \(Z_{TS} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\)
Decision rule: Reject \(H_0\) if \(Z_{TS} < -z_{\alpha}\)
P-value: \(P(Z < Z_{TS})\)
Upper Confidence Bound:
Formula: \(\mu < \bar{x} + z_{\alpha} \frac{\sigma}{\sqrt{n}}\)
Interpretation: We’re \(100(1-\alpha)\%\) confident the true mean is below this upper bound
The Logic: If we believe \(\mu < \mu_0\), then plausible values should extend downward from some upper threshold.
Direction Principle
A helpful way to remember: the direction of the alternative hypothesis indicates the direction in which plausible values extend.
\(H_a: \mu > \mu_0\) → plausible values extend upward → need lower bound
\(H_a: \mu < \mu_0\) → plausible values extend downward → need upper bound
10.3.4. A Complete Example: Quality Control for Cherry Tomatoes
Let’s demonstrate the duality principle with a comprehensive example that shows both the confidence interval and hypothesis testing approaches.
The Scenario
Tom Green oversees quality control for a large produce company. He obtains a simple random sample of four packages of cherry tomatoes, each labeled 1/2 lb (227g). The average weight from Tom’s four packages is 222g. The packaging process has a known standard deviation of 5g, and package weights are normally distributed.
The Questions
Construct a 95% confidence interval for the mean weight
Test at \(\alpha = 0.05\) whether there’s evidence the machine needs revision (i.e., the mean differs from 227g)
Given Information
Sample size: \(n = 4\)
Sample mean: \(\bar{x} = 222\) grams
Population standard deviation: \(\sigma = 5\) grams (known)
Target weight: \(\mu_0 = 227\) grams
Significance level: \(\alpha = 0.05\)
Step 1: Construct the 95% Confidence Interval
Since \(\sigma\) is known and the data are normally distributed, we use the interval \(\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\).
For 95% confidence, \(\alpha = 0.05\), so we need \(z_{0.025}\):
# Find critical value
z_critical <- qnorm(0.025, lower.tail = FALSE)
z_critical
# [1] 1.959964
Calculate the interval:
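Completing the calculation in R, using the values from the scenario:

```r
# 95% CI: x_bar +/- z_{0.025} * sigma / sqrt(n)
x_bar <- 222; sigma <- 5; n <- 4
z_critical <- qnorm(0.025, lower.tail = FALSE)
margin <- z_critical * sigma / sqrt(n)   # about 4.9 grams
c(x_bar - margin, x_bar + margin)
# [1] 217.1001 226.8999
```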
Interpretation: We are 95% confident that the true mean weight of cherry tomato packages lies between 217.1 and 226.9 grams.
Step 2: Use Duality to Answer the Hypothesis Test
We want to test:
\(H_0: \mu = 227\) (machine is properly calibrated)
\(H_a: \mu \neq 227\) (machine needs revision)
Since our confidence level is 95% and our significance level is 5%, we have \(C + \alpha = 0.95 + 0.05 = 1.0\), so the duality relationship applies.
Checking the interval: Our 95% confidence interval is (217.1, 226.9). The null value \(\mu_0 = 227\) lies outside this interval (227 > 226.9).
Conclusion from duality: Since 227 is not in the confidence interval, we reject the null hypothesis.
Step 3: Verify with Formal Hypothesis Test
Let’s confirm this conclusion using the standard hypothesis testing procedure:
For a two-sided test:
z_test_stat <- (222 - 227) / (5 / sqrt(4))  # (x_bar - mu_0) / (sigma / sqrt(n))
z_test_stat
# [1] -2
p_value <- 2 * pnorm(abs(z_test_stat), lower.tail = FALSE)
p_value
# [1] 0.04550026
Decision: Since p-value = 0.0455 < \(\alpha = 0.05\), we reject the null hypothesis.
Consistency Check: Both approaches give the same conclusion! This confirms the duality relationship.
Practical Interpretation
Notice that 227 falls only just barely outside the interval, which explains why the p-value (0.0455) is only slightly below 0.05. This suggests:
The evidence against proper calibration is statistically significant but not overwhelming
With only 4 packages, we should be cautious about strong conclusions
Additional data collection might provide more definitive evidence
10.3.5. From Z-Procedures to T-Procedures: When σ Is Unknown
The Realistic Scenario
In the cherry tomato example, we assumed the population standard deviation was known (\(\sigma = 5\) grams). This convenient assumption allowed us to use z-procedures, but it’s rarely realistic. If we don’t know the population mean \(\mu\) (which is why we’re testing it), we almost certainly don’t know \(\sigma\) either.
The Impact of Unknown σ
When we replace the unknown \(\sigma\) with our sample estimate \(s\), we introduce additional uncertainty into our procedures. The sample standard deviation \(s\) is itself a random variable that varies from sample to sample, and this extra variability must be accounted for.
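A quick simulation makes this concrete. The sketch below (reusing the cherry-tomato setting of \(\sigma = 5\) and \(n = 4\) purely for illustration) shows how much \(s\) bounces around the true \(\sigma\):

```r
# How much does s vary from sample to sample? Draw many samples of size 4
# from a normal population with sigma = 5 and look at the spread of s.
set.seed(42)
s_values <- replicate(10000, sd(rnorm(4, mean = 227, sd = 5)))
range(s_values)   # individual estimates of sigma can be far from 5
mean(s_values)    # and s is not even centered exactly at sigma
```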
The T-Test Statistic
When \(\sigma\) is unknown, our test statistic becomes \(t_{TS} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\).
Under the assumptions that:
The data come from a normal distribution (or the sample size is large enough for the CLT to apply)
The observations are independent
The null hypothesis is true
This test statistic follows a t-distribution with \(df = n-1\) degrees of freedom.
Why the T-Distribution?
The t-distribution accounts for the additional uncertainty from estimating \(\sigma\):
Symmetric around zero (like the standard normal)
Heavier tails than the standard normal (reflecting extra uncertainty)
Approaches the standard normal as sample size increases
Degrees of freedom control the “heaviness” of the tails
The Convergence Property
As sample size increases:
\(s\) becomes a better estimate of \(\sigma\)
The t-distribution approaches the standard normal distribution
The difference between t-tests and z-tests becomes negligible
For large samples (\(n > 100\)), t-procedures and z-procedures give virtually identical results.
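You can watch the convergence directly by comparing two-sided 5% critical values across degrees of freedom:

```r
# t critical values shrink toward the z critical value as df increases
sapply(c(5, 15, 30, 100, 1000), function(df) qt(0.975, df))
# ~ 2.571 2.131 2.042 1.984 1.962
qnorm(0.975)
# ~ 1.960
```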
10.3.6. Duality Revisited: T-Procedures
The beautiful duality relationship we established for z-procedures carries over directly to t-procedures. The only change is that we use t-distributions instead of the standard normal distribution.
Two-Sided Tests and Confidence Intervals (σ Unknown)
For testing \(H_0: \mu = \mu_0\) versus \(H_a: \mu \neq \mu_0\):
Hypothesis Test:
Test statistic: \(t_{TS} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\)
P-value: \(2P(T_{n-1} > |t_{TS}|)\)
R code: 2 * pt(abs(t_test_stat), df = n-1, lower.tail = FALSE)
Confidence Interval:
Formula: \(\bar{x} \pm t_{\alpha/2,n-1} \frac{s}{\sqrt{n}}\)
R code for critical value: qt(alpha/2, df = n-1, lower.tail = FALSE)
Duality: If \(\mu_0\) lies outside the \(100(1-\alpha)\%\) confidence interval, reject \(H_0\) at significance level \(\alpha\).
One-Sided Tests and Confidence Bounds (σ Unknown)
Right-tailed test (\(H_a: \mu > \mu_0\)) with lower confidence bound:
P-value: \(P(T_{n-1} > t_{TS})\)
Lower bound: \(\mu > \bar{x} - t_{\alpha,n-1} \frac{s}{\sqrt{n}}\)
Duality: If \(\mu_0\) lies below the lower bound, reject \(H_0\)
Left-tailed test (\(H_a: \mu < \mu_0\)) with upper confidence bound:
P-value: \(P(T_{n-1} < t_{TS})\)
Upper bound: \(\mu < \bar{x} + t_{\alpha,n-1} \frac{s}{\sqrt{n}}\)
Duality: If \(\mu_0\) lies above the upper bound, reject \(H_0\)
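A minimal sketch of the right-tailed case and its lower bound, using made-up data (the left-tailed case mirrors this):

```r
# Hypothetical data (not from the text): right-tailed t-test paired with
# a lower confidence bound
x <- c(22.4, 21.1, 23.8, 20.9, 22.7, 24.1)
mu_0 <- 20
alpha <- 0.05
n <- length(x); x_bar <- mean(x); s <- sd(x)

t_stat <- (x_bar - mu_0) / (s / sqrt(n))
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)

lower_bound <- x_bar - qt(alpha, df = n - 1, lower.tail = FALSE) * s / sqrt(n)

# Duality: reject exactly when mu_0 falls below the lower bound
(p_value < alpha) == (mu_0 < lower_bound)
# [1] TRUE
```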
10.3.7. A Complete T-Test Example: Radon Detector Accuracy
Let’s work through a comprehensive example that demonstrates t-procedures and their duality relationships.
The Research Question
How accurate are radon detectors sold to homeowners? University researchers placed 12 detectors in a chamber exposed to exactly 105 picocuries per liter of radon. If the detectors work properly, they should read close to 105 on average.
Study Design
Sample: SRS of 12 radon detectors
True exposure: 105 picocuries per liter
Population standard deviation: Unknown (must be estimated)
Significance level: \(\alpha = 0.10\) (10%)
Assumption: Detector readings are normally distributed
The Data
Picocuries per liter readings: 91.9, 97.8, 111.4, 122.3, 105.4, 95.0, 103.8, 99.6, 119.3, 104.8, 101.7, 96.6
Step 1: State the Hypotheses
\(H_0: \mu = 105\) (detectors are accurate)
\(H_a: \mu \neq 105\) (detectors are not accurate)
Step 2: Calculate Sample Statistics
# The data
readings <- c(91.9, 97.8, 111.4, 122.3, 105.4, 95.0,
103.8, 99.6, 119.3, 104.8, 101.7, 96.6)
# Calculate sample statistics
n <- length(readings)
x_bar <- mean(readings)
s <- sd(readings)
df <- n - 1
# Results
n # 12
x_bar # 104.1333
s # 9.397421
df # 11
Step 3: Calculate the Test Statistic
mu_0 <- 105
t_test_stat <- (x_bar - mu_0) / (s / sqrt(n))
t_test_stat
# [1] -0.319
Step 4: Calculate the P-Value
For a two-sided test with \(df = 11\):
p_value <- 2 * pt(abs(t_test_stat), df = 11, lower.tail = FALSE)
p_value
# [1] 0.755
Step 5: Make the Decision
Since p-value = 0.755 > \(\alpha = 0.10\), we fail to reject the null hypothesis.
Step 6: Construct the Complementary Confidence Interval
For the duality relationship, we need a 90% confidence interval (since \(C + \alpha = 0.90 + 0.10 = 1.0\)):
# Calculate 90% confidence interval
alpha <- 0.10
t_critical <- qt(alpha/2, df = 11, lower.tail = FALSE)
t_critical
# [1] 1.795885
margin_error <- t_critical * (s / sqrt(n))
ci_lower <- x_bar - margin_error
ci_upper <- x_bar + margin_error
c(ci_lower, ci_upper)
# [1] 99.26145 109.00521
Step 7: Verify Duality
The 90% confidence interval is (99.3, 109.0). Since \(\mu_0 = 105\) lies within this interval, the duality principle tells us we should fail to reject \(H_0\)—which matches our hypothesis test conclusion.
Step 8: Interpretation
We do not have sufficient evidence to conclude that the radon detectors deviate from the true exposure level. The large p-value (0.755) indicates that the observed sample mean (104.1) is very consistent with the null hypothesis value (105.0). The data suggests the detectors are operating as intended by the manufacturer.
10.3.8. Why Such a Large P-Value?
The p-value of 0.755 is quite large, indicating strong consistency between our data and the null hypothesis. Several factors contribute:
Small effect size: Sample mean (104.1) very close to null value (105.0)
Small sample size: Only 12 observations limits precision
Substantial variability: Sample standard deviation (9.4) is relatively large
Two-sided test: We’re checking for deviations in either direction
This example illustrates that “failing to reject” doesn’t mean “accepting” the null hypothesis. It means we lack sufficient evidence to conclude the detectors are systematically inaccurate.
10.3.9. Comprehensive Summary: All T-Procedures
When \(\sigma\) is unknown, we use the test statistic \(t_{TS} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\) with \(df = n-1\).
Two-Tailed Tests (\(H_a: \mu \neq \mu_0\))
Hypothesis Test:
P-value: \(2 \times P(T_{n-1} > |t_{TS}|)\)
R code: 2 * pt(abs(t_test_stat), df = n-1, lower.tail = FALSE)
Confidence Interval:
Formula: \(\bar{x} \pm t_{\alpha/2,n-1} \frac{s}{\sqrt{n}}\)
R code: qt(alpha/2, df = n-1, lower.tail = FALSE)
Duality: If \(\mu_0\) outside interval → reject; if inside → fail to reject
Right-Tailed Tests (\(H_a: \mu > \mu_0\))
Hypothesis Test:
P-value: \(P(T_{n-1} > t_{TS})\)
R code: pt(t_test_stat, df = n-1, lower.tail = FALSE)
Lower Confidence Bound:
Formula: \(\mu > \bar{x} - t_{\alpha,n-1} \frac{s}{\sqrt{n}}\)
R code: qt(alpha, df = n-1, lower.tail = FALSE)
Duality: If \(\mu_0\) below bound → reject; if above → fail to reject
Left-Tailed Tests (\(H_a: \mu < \mu_0\))
Hypothesis Test:
P-value: \(P(T_{n-1} < t_{TS})\)
R code: pt(t_test_stat, df = n-1, lower.tail = TRUE)
Upper Confidence Bound:
Formula: \(\mu < \bar{x} + t_{\alpha,n-1} \frac{s}{\sqrt{n}}\)
R code: qt(alpha, df = n-1, lower.tail = FALSE)
Duality: If \(\mu_0\) above bound → reject; if below → fail to reject
10.3.10. Key Differences: T vs. Z Procedures
Critical Values
For any given significance level, t critical values are larger than the corresponding z critical values, approaching them only as \(n \to \infty\). This means:
Confidence intervals using t are wider than those using z
Hypothesis tests using t require more extreme test statistics to reject \(H_0\)
P-values from t-tests are generally larger than corresponding z-tests
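For instance, the same test statistic produces a noticeably larger p-value under a t reference distribution than under the standard normal (the statistic and degrees of freedom below are chosen purely for illustration):

```r
# Same two-sided test statistic, two different reference distributions
t_stat <- 2.0
2 * pt(t_stat, df = 10, lower.tail = FALSE)   # t-based p-value, about 0.073
2 * pnorm(t_stat, lower.tail = FALSE)         # z-based p-value, about 0.046
```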
Why This Makes Sense
The larger critical values reflect our uncertainty about \(\sigma\). When we estimate \(\sigma\) with \(s\), we should be less confident in our conclusions, which the t-distribution captures through its heavier tails.
Sample Size Impact
Small samples (\(n < 15\)): Substantial difference between t and z procedures
Moderate samples (\(15 \leq n < 30\)): Noticeable but manageable differences
Large samples (\(n \geq 30\)): Differences become negligible
10.3.11. When Assumptions Are Violated
T-procedures assume that the data come from a normal distribution. While t-tests are reasonably robust to moderate departures from normality, serious violations can be problematic, especially with small samples.
Alternative Approaches
Data transformation (e.g., log transformation for right-skewed data)
Non-parametric methods (e.g., Wilcoxon signed-rank test)
Bootstrap methods for empirical sampling distributions
Exact distributional methods when the true distribution is known
Conservative Guidelines
Always plot your data to check assumptions
Report any concerns about normality in small samples
Consider alternatives when assumptions are clearly violated
Remember: For large samples, t-procedures are quite robust due to the CLT
10.3.12. The Power of Duality
Understanding the duality between hypothesis tests and confidence intervals provides several advantages:
Computational Efficiency
Sometimes it’s easier to construct a confidence interval and check whether \(\mu_0\) falls inside than to calculate a p-value directly.
Deeper Understanding
Duality reinforces that both procedures quantify the same underlying uncertainty—they just present it differently.
Practical Insight
Confidence intervals show the magnitude of effects, while hypothesis tests provide yes/no answers. Together, they give a complete picture.
Consistency Check
When results from the two approaches don’t align, it signals an error in calculations or assumptions.
10.3.13. Bringing It All Together
Key Takeaways 📝
Hypothesis tests and confidence intervals are dual procedures that address the same questions from different perspectives, connected by the relationship \(C + \alpha = 1\).
Duality works for both z-procedures (σ known) and t-procedures (σ unknown), with the same logical framework applying to both.
T-distributions have heavier tails than the standard normal, reflecting additional uncertainty when estimating σ with s.
Direction of alternative hypothesis determines confidence bound type: \(H_a: \mu > \mu_0\) pairs with lower bounds, \(H_a: \mu < \mu_0\) pairs with upper bounds.
T-procedures approach z-procedures as sample size increases, with negligible differences for large samples (\(n > 100\)).
Failing to reject doesn’t mean accepting the null hypothesis—it means insufficient evidence against it.
Practical interpretation requires considering both statistical significance and practical importance in context.
Assumptions matter most for small samples—normality becomes less critical as sample size increases due to CLT.
Exercises
Duality Verification: A researcher constructs a 95% confidence interval for μ and gets (12.3, 18.7). Without doing any calculations, determine the outcome of testing \(H_0: \mu = 15\) vs. \(H_a: \mu \neq 15\) at \(\alpha = 0.05\). Explain your reasoning.
Coffee Shop Service: A coffee shop claims average service time is 3 minutes. You time 15 customers and find \(\bar{x} = 3.4\) minutes, \(s = 0.8\) minutes. a) Test the claim at \(\alpha = 0.05\) b) Construct a 95% confidence interval c) Verify that your results from (a) and (b) are consistent
One-Sided Bounds: A manufacturer wants to show their batteries last more than 20 hours on average. With 12 batteries, they get \(\bar{x} = 22.1\) hours, \(s = 3.5\) hours. a) What type of confidence bound is appropriate? b) Test at \(\alpha = 0.05\) using the hypothesis test approach c) Verify using the appropriate 95% confidence bound
Sample Size Impact: Explain why a t-test with \(n = 5\) requires a larger test statistic to reject \(H_0\) than a z-test with the same data and significance level. What does this say about our confidence in conclusions from small samples?
Cherry Tomato Follow-up: In the cherry tomato example, suppose σ had been unknown and estimated as \(s = 5\) grams from the sample of 4 packages. Rework the entire analysis using t-procedures and compare your conclusions to the original z-procedure results.
Critical Thinking: A study reports “no significant difference” with \(p = 0.12\) and \(n = 8\). The researcher concludes the null hypothesis is true. Identify at least three problems with this reasoning and suggest better ways to interpret the results.