11.2. Independent Two-Sample Analysis When Population Variances Are Known

We now develop the mathematical foundation for comparing the means of two independent populations, assuming that the population variances are known. This simplifying assumption will be lifted in later lessons, where we discuss more realistic scenarios.

Road Map 🧭

  • List the three key assumptions required for the construction of independent two-sample inference methods.

  • Construct hypothesis tests and confidence regions for the difference in the means of two independent populations, and identify the principles they share with one-sample inference.

11.2.1. The Assumptions

The validity of independent two-sample procedures rests on three fundamental assumptions that extend the single-sample framework to comparisons. These assumptions must be carefully verified before applying the methods.

Assumption 1: SRS from Each Population

The random variables \(X_{A1}, X_{A2}, \ldots, X_{An_A}\) form an independent and identically distributed (iid) sample from Population A. Similarly, the random variables \(X_{B1}, X_{B2}, \ldots, X_{Bn_B}\) constitute an iid sample from Population B.

Assumption 2: Independence Between Populations

The observations from one population are independent of those from the other population. Formally, \(X_{Ai}\) is independent of \(X_{Bj}\) for all possible pairs of indices \(i \in \{1, 2, \ldots, n_A\}\) and \(j \in \{1, 2, \ldots, n_B\}\).

Assumption 3: Normality of Sampling Distributions

For each population, either the population distribution is normal, or the sample size is large enough for the CLT to hold.

11.2.2. The Parameter of Interest and Its Point Estimator

The Target Parameter

Recall that our primary interest lies not in the individual population means \(\mu_A\) and \(\mu_B\), but rather in their difference. Our parameter of interest is:

\[\theta = \mu_A - \mu_B\]

We conceptualize this difference as a single parameter that captures the essence of the comparison we wish to make. Its sign and magnitude indicate the direction and size of any systematic difference between the populations.

The Point Estimator

Since we know that \(\bar{X}_A\) and \(\bar{X}_B\) are unbiased estimators of their respective population means \(\mu_A\) and \(\mu_B\), the natural point estimator for \(\theta\) is:

\[\hat{\theta} = \bar{X}_A - \bar{X}_B.\]

11.2.3. Theoretical Properties of the Point Estimator

As a result of Assumptions 1 and 3, the sampling distributions of the two sample means are:

\[\begin{split}&\bar{X}_A \sim N\left(\mu_A, \frac{\sigma_A}{\sqrt{n_A}}\right)\\ &\bar{X}_B \sim N\left(\mu_B, \frac{\sigma_B}{\sqrt{n_B}}\right)\end{split}\]

Furthermore, the two random variables are independent since their building blocks are independent according to Assumption 2. We establish the theoretical properties of the difference estimator \(\bar{X}_A - \bar{X}_B\) starting from this baseline.

Unbiasedness

The difference in sample means is an unbiased estimator of the difference in population means. To establish this formally:

\[E[\bar{X}_A - \bar{X}_B] = E[\bar{X}_A] - E[\bar{X}_B] = \mu_A - \mu_B.\]

Therefore, the bias of the estimator is:

\[\text{Bias}[\bar{X}_A - \bar{X}_B] = E[\bar{X}_A - \bar{X}_B] - (\mu_A - \mu_B) = 0.\]

Variance of the Estimator

The variance of \(\bar{X}_A - \bar{X}_B\) depends critically on the independence assumption between populations. Recall that for two independent random variables, the variance of their difference equals the sum of their individual variances:

\[\text{Var}[\bar{X}_A - \bar{X}_B] = \text{Var}[\bar{X}_A] + \text{Var}[\bar{X}_B] = \frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}.\]

Standard Error

The standard deviation of the estimator, or the standard error, is obtained by taking the square root of the variance:

\[\sigma_{\bar{X}_A - \bar{X}_B} = \sqrt{\frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}}.\]

The Sampling Distribution of the Difference Estimator

We now know the expected value and variance of the difference estimator \(\bar{X}_A - \bar{X}_B\). Additionally, since \(\bar{X}_A\) and \(\bar{X}_B\) are each normally distributed, their difference is also normally distributed. Combining these results, we establish the full sampling distribution of the difference estimator as:

\[\bar{X}_A - \bar{X}_B \sim N\left(\mu_A - \mu_B, \sqrt{\frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}}\right).\]

Equivalently, its standardization follows the standard normal distribution:

\[\frac{(\bar{X}_A - \bar{X}_B)-(\mu_A - \mu_B)}{\sqrt{\frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}}} \sim N(0,1).\]
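A quick simulation can confirm these results numerically. The R sketch below uses parameter values of our own choosing, purely for illustration: it draws many pairs of independent normal samples and compares the empirical behavior of \(\bar{X}_A - \bar{X}_B\) with the theory above.

# Simulation sketch (parameter values are illustrative, not from the text)
set.seed(1)
mu_A <- 10; sigma_A <- 3; n_A <- 25
mu_B <- 8;  sigma_B <- 4; n_B <- 30

# 10,000 replications of the difference in sample means
diffs <- replicate(10000,
  mean(rnorm(n_A, mu_A, sigma_A)) - mean(rnorm(n_B, mu_B, sigma_B)))

mean(diffs)                          # close to mu_A - mu_B = 2
sd(diffs)                            # close to the theoretical standard error
sqrt(sigma_A^2/n_A + sigma_B^2/n_B)  # 0.945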

Based on these key results, we will now build inference methods for the difference in population means.

11.2.4. Hypothesis Testing for the Difference in Means

The four-step hypothesis testing framework extends naturally to the two-sample setting, with modifications to accommodate the comparative nature of the analysis.

Step 1: Parameter Identification

We must clearly identify both population means using contextually meaningful labels. Rather than generic labels like A and B, we encourage the use of descriptive terms that reflect the populations being studied.

For example, suppose we are comparing the systolic blood pressure of two patient groups after assigning one group a placebo and the other a newly developed treatment. We can define the relevant populations and their true means in the following manner:

  • Let \(\mu_{\text{treatment}}\) denote the true mean systolic blood pressure of patients who are treated with the new procedure.

  • Let \(\mu_{\text{control}}\) denote the true mean systolic blood pressure of patients who are not treated with the new medical procedure.

The parameter identification should also specify the units of measurement and provide sufficient context for interpreting the parameter and the target population within the scope of the research question.

Step 2: Hypothesis Formulation

Follow the template:

Fig. 11.2 Template for independent two-sample hypotheses

You may also refer to the details provided in Chapter 11.1.3.

Step 3: Test Statistic and \(p\)-Value

Recall the \(z\)-test statistic used in one-sample hypothesis testing:

\[\text{One-sample } Z_{TS} = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} = \frac{\text{estimator}-\text{null value}}{\text{std. error}}.\]

By providing a standardized distance between the estimator and the null value, the \(z\)-test statistic measures how far the sample data falls from the null assumption.

We define a new \(z\)-test statistic to serve the same purpose by replacing each one-sample component with the appropriate independent two-sample parallel:

\[Z_{TS} = \frac{(\bar{X}_A - \bar{X}_B) - \Delta_0}{\sqrt{\frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}}}.\]

Here \(\Delta_0\) denotes the null value of the difference specified by \(H_0\) (often \(0\)). As in the one-sample case, under the null hypothesis,

\[Z_{TS} \sim N(0,1).\]

The \(p\)-value calculation therefore follows the same principles as in single-sample \(z\)-tests:

Revisiting \(p\)-Value Computation

  • Upper-tailed: \(P(Z > z_{TS})\)

  • Lower-tailed: \(P(Z < z_{TS})\)

  • Two-tailed: \(2P(Z > |z_{TS}|) = 2P(Z < -|z_{TS}|)\)
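In R, all three cases reduce to calls to pnorm. A minimal sketch, with the value of z_ts chosen only for illustration:

z_ts <- 1.68  # illustrative value

pnorm(z_ts, lower.tail = FALSE)           # upper-tailed: P(Z > z_ts)
pnorm(z_ts, lower.tail = TRUE)            # lower-tailed: P(Z < z_ts)
2 * pnorm(abs(z_ts), lower.tail = FALSE)  # two-tailed: 2P(Z > |z_ts|)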

Step 4: Decision and Conclusion

The decision rule remains unchanged from single-sample procedures:

  • If \(p\)-value \(\leq \alpha\), we reject \(H_0\).

  • If \(p\)-value \(> \alpha\), we fail to reject \(H_0\).

The conclusion template below is adapted to address the comparative nature of two-sample procedures:

“The data [does/does not] give [some/strong] support (p-value = [value]) to the claim that [statement of \(H_a\) in context about the difference in population means].”

The strength descriptors should reflect the magnitude of the \(p\)-value relative to conventional benchmarks and the significance level used in the study.

Example 💡: Shift Scheduling and Work Efficiency

A retail chain tests two different workforce scheduling systems to see which helps cashiers process more transactions per 8-hour shift. They run independent pilots on different stores:

  • System A: \(n_A = 25\), \(\bar{x}_A = 50\), \(\sigma_A = 10\) (known)

  • System B: \(n_B = 30\), \(\bar{x}_B = 45\), \(\sigma_B = 12\) (known)

Perform a hypothesis test at the \(\alpha = 0.05\) significance level to determine whether the true mean numbers of transactions differ between the two systems.

Step 1: Define the parameters and target populations

Let \(\mu_A\) denote the true mean number of transactions processed by cashiers following System A. Likewise, let \(\mu_B\) be the true mean number of transactions completed by employees following System B.

Step 2: Write the hypotheses

\[\begin{split}&H_0: \mu_A - \mu_B = 0\\ &H_a: \mu_A - \mu_B \neq 0\end{split}\]

Step 3: Compute the test statistic and p-value

\[Z_{TS} =\frac{(\bar{x}_A - \bar{x}_B)-0}{\sqrt{\frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}}} = \frac{(50-45) - 0}{\sqrt{\frac{10^2}{25} + \frac{12^2}{30}}} \approx 1.69\]

This is a two-sided test, so the \(p\)-value is \(2P(Z > 1.69) \approx 0.092\).

Step 4: Decision and Conclusion

Since \(p\)-value \(= 0.092 > 0.05\), we fail to reject the null hypothesis at the 5% significance level. We do not have enough evidence to support the claim that the mean number of transactions processed by cashiers differs between the two scheduling systems.
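A quick R check of these computations, in the same style as the verification blocks in the exercises below:

xbar_A <- 50; xbar_B <- 45
sigma_A <- 10; sigma_B <- 12
n_A <- 25; n_B <- 30

SE <- sqrt(sigma_A^2/n_A + sigma_B^2/n_B)            # 2.966
z_ts <- (xbar_A - xbar_B) / SE                       # 1.69
p_value <- 2 * pnorm(abs(z_ts), lower.tail = FALSE)  # 0.092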

11.2.5. Confidence Regions for the Difference in Means

Let us begin by constructing a \(100C\%\) confidence interval. The goal is to find the margin of error (ME) such that

\[P((\bar{X}_A - \bar{X}_B) - ME < \mu_A - \mu_B < (\bar{X}_A - \bar{X}_B) + ME) = C.\]

The Pivotal Method

We begin with the known truth, where \(Z \sim N(0,1)\) and \(C = 1 - \alpha\):

\[P(-z_{\alpha/2} < Z < z_{\alpha/2}) = C.\]

Replace \(Z\) with the standardization of the difference estimator, or the pivotal quantity:

\[P\left(-z_{\alpha/2} < \frac{(\bar{X}_A - \bar{X}_B) - (\mu_A - \mu_B)}{\sqrt{\frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}}} < z_{\alpha/2}\right) = C.\]
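Multiplying all three parts of the inequality by the standard error gives

\[P\left(-z_{\alpha/2}\sqrt{\frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}} < (\bar{X}_A - \bar{X}_B) - (\mu_A - \mu_B) < z_{\alpha/2}\sqrt{\frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}}\right) = C.\]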

Subtracting \(\bar{X}_A - \bar{X}_B\) from all three parts and then multiplying by \(-1\) (which reverses the inequalities) isolates \(\mu_A - \mu_B\) in the center, and we obtain

\[P((\bar{X}_A - \bar{X}_B) - ME < \mu_A - \mu_B < (\bar{X}_A - \bar{X}_B) + ME) = C,\]

with \(ME = z_{\alpha/2} \sqrt{\frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}}\).

For a single experiment, therefore, the \(100C \%\) confidence interval is:

\[(\bar{x}_A - \bar{x}_B) \pm z_{\alpha/2} \sqrt{\frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}}\]

Complete Summary of Confidence Intervals and Bounds

The one-sided confidence bounds can be derived similarly. We leave the details as an exercise—use the confidence interval derivation above and Chapter 9.4 as reference.

\(100\cdot (1-\alpha) \%\) Confidence Regions for Difference in Means

Confidence Interval

\[(\bar{x}_A - \bar{x}_B) \pm z_{\alpha/2} \sqrt{\frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}}\]

Upper Confidence Bound

\[(\bar{x}_A - \bar{x}_B) + z_{\alpha} \sqrt{\frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}}\]

Lower Confidence Bound

\[(\bar{x}_A - \bar{x}_B) - z_{\alpha} \sqrt{\frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}}\]

Note the repeated core elements. In both one-sample and two-sample cases, a confidence region is centered at the point estimate and expands in the appropriate directions by the ME, computed as the product of a critical value and the standard error.
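As a sketch, all three regions can be wrapped in a small R helper; the function two_sample_z_region below is our own construction for illustration, not a standard library function.

# Helper for two-sample z confidence regions with known variances (illustrative)
two_sample_z_region <- function(xbar_A, xbar_B, sigma_A, sigma_B, n_A, n_B,
                                conf = 0.95,
                                type = c("two.sided", "upper", "lower")) {
  type <- match.arg(type)
  est <- xbar_A - xbar_B
  se <- sqrt(sigma_A^2 / n_A + sigma_B^2 / n_B)
  alpha <- 1 - conf
  switch(type,
         two.sided = est + c(-1, 1) * qnorm(1 - alpha / 2) * se,
         upper     = c(-Inf, est + qnorm(1 - alpha) * se),  # upper confidence bound
         lower     = c(est - qnorm(1 - alpha) * se, Inf))   # lower confidence bound
}

two_sample_z_region(50, 45, 10, 12, 25, 30)                  # (-0.814, 10.814)
two_sample_z_region(50, 45, 10, 12, 25, 30, type = "upper")  # (-Inf, 9.880)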

Interpreting Confidence Regions for Difference in Means

The confidence regions provide a range of plausible values for the true difference in population means; each region captures the true difference \(\mu_A - \mu_B\) with \(100(1-\alpha)\%\) confidence. Their precision depends on the confidence level, the population variances, and the sample sizes.

Example 💡: Shift Scheduling and Work Efficiency, Continued

For the experiment on two workforce scheduling systems with:

  • System A: \(n_A = 25\), \(\bar{x}_A = 50\), \(\sigma_A = 10\) (known)

  • System B: \(n_B = 30\), \(\bar{x}_B = 45\), \(\sigma_B = 12\) (known)

Compute the \(95 \%\) confidence interval for the difference of the two population means. Check if the result is consistent with the hypothesis test performed in the previous example.

Identify the components

  • The observed sample difference

    \[\bar{x}_A - \bar{x}_B = 50 - 45 = 5\]
  • The standard error

    \[\sqrt{\frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}} = \sqrt{\frac{10^2}{25} + \frac{12^2}{30}} \approx 2.97\]
  • The \(z\)-critical value

    qnorm(0.025, lower.tail=FALSE)
    # returns 1.96
    

Put the parts together

The confidence interval is:

\[5 \pm 1.96(2.97) = 5 \pm 5.82 = (-0.82, 10.82)\]

Is it consistent with the hypothesis test?

The interval contains zero, which is consistent with our failure to reject the null hypothesis of equal means.
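An R verification of the interval, computed without intermediate rounding:

point_est <- 50 - 45
SE <- sqrt(10^2/25 + 12^2/30)                # 2.966
ME <- qnorm(0.025, lower.tail = FALSE) * SE  # 5.814
c(point_est - ME, point_est + ME)            # (-0.814, 10.814), matching (-0.82, 10.82) up to rounding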

11.2.6. Bringing It All Together

Key Takeaways 📝

  1. Two-sample independent procedures are designed to provide statistical answers to comparative questions. They require the key assumptions that (1) each sample is an SRS from its respective population, (2) the two samples are independent of each other, and (3) each population is normal or each sample is large enough for the CLT to hold.

  2. The sampling distribution of the point estimator \(\bar{X}_A - \bar{X}_B\) is normal with mean \(\mu_A - \mu_B\) and variance \(\frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}\). The addition of variances follows from the independence assumption between groups.

  3. The construction of hypothesis tests and confidence regions follows the same core principles as in the one-sample case.

11.2.7. Exercises

Exercise 1: Assumptions for Independent Two-Sample z-Procedures

For each scenario, evaluate whether the assumptions for the independent two-sample z-procedure are satisfied. If not, explain which assumption is violated.

  1. Comparing mean reaction times between gamers (n=50) and non-gamers (n=45). Both groups are randomly sampled, and historical studies suggest σ₁ = 25 ms and σ₂ = 30 ms.

  2. Comparing mean heights between students who eat breakfast (n=20) and those who skip breakfast (n=18) in a single classroom. The population standard deviations are known from large national studies.

  3. Comparing mean battery life between two phone models by testing 10 phones of each model. The manufacturer provides population standard deviations from extensive quality control testing.

  4. Comparing mean customer wait times at two bank branches by sampling 100 customers at each location. Prior studies established σ₁ = 5 min and σ₂ = 7 min.

Solution

Assumptions Required:

  1. SRS from each population (iid within each sample)

  2. Independence between populations

  3. Normality of sampling distributions (either normal populations or large n for CLT)

  4. Population standard deviations σ₁ and σ₂ are known

Part (a): Gamers vs. non-gamers — SATISFIED ✓

  • Random sampling: ✓

  • Independence between groups: ✓ (separate populations)

  • Normality/CLT: ✓ (n=50 and n=45 are large enough)

  • σ known: ✓ (from historical studies)

Part (b): Breakfast habits — VIOLATED ✗

  • Primary violation: This is a convenience sample from a single classroom, not a random sample from the population of interest. The results may not generalize beyond this specific classroom.

  • Secondary concern: The “σ known from national studies” may not be appropriate—the national population variance may not match this specific classroom population.

  • Note: While students in the same classroom might also lack independence, the lack of random sampling is the more fundamental issue.

Part (c): Phone battery life — POTENTIALLY PROBLEMATIC

  • Random sampling: Depends on how phones were selected

  • Independence: ✓

  • Normality/CLT: With n=10 each, CLT may not fully apply. Need to assume populations are approximately normal.

  • σ known: ✓

Part (d): Bank wait times — SATISFIED ✓

  • Random sampling: Assuming customers were randomly selected

  • Independence: ✓ (different branches)

  • Normality/CLT: ✓ (n=100 each is large)

  • σ known: ✓


Exercise 2: Calculating the Standard Error

For each scenario with known population standard deviations, calculate the standard error of \(\bar{X}_A - \bar{X}_B\).

  1. \(n_A = 25\), \(\sigma_A = 10\), \(n_B = 36\), \(\sigma_B = 12\)

  2. \(n_A = 100\), \(\sigma_A = 15\), \(n_B = 100\), \(\sigma_B = 15\)

  3. \(n_A = 16\), \(\sigma_A = 8\), \(n_B = 25\), \(\sigma_B = 10\)

  4. How does the standard error change if you double both sample sizes in part (a)?

Solution

Formula:

\[SE = \sqrt{\frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}}\]

Part (a):

\[SE = \sqrt{\frac{10^2}{25} + \frac{12^2}{36}} = \sqrt{\frac{100}{25} + \frac{144}{36}} = \sqrt{4 + 4} = \sqrt{8} = 2.828\]

Part (b):

\[SE = \sqrt{\frac{15^2}{100} + \frac{15^2}{100}} = \sqrt{\frac{225}{100} + \frac{225}{100}} = \sqrt{4.5} = 2.121\]

Part (c):

\[SE = \sqrt{\frac{8^2}{16} + \frac{10^2}{25}} = \sqrt{\frac{64}{16} + \frac{100}{25}} = \sqrt{4 + 4} = \sqrt{8} = 2.828\]

Part (d): Doubling sample sizes

With \(n_A = 50\) and \(n_B = 72\):

\[SE_{new} = \sqrt{\frac{100}{50} + \frac{144}{72}} = \sqrt{2 + 2} = \sqrt{4} = 2.0\]

The standard error decreases by a factor of \(\sqrt{2} \approx 1.414\):

\[\frac{SE_{old}}{SE_{new}} = \frac{2.828}{2.0} = 1.414 = \sqrt{2}\]

R verification:

# Part (a)
sqrt(10^2/25 + 12^2/36)  # 2.828

# Part (b)
sqrt(15^2/100 + 15^2/100)  # 2.121

# Part (c)
sqrt(8^2/16 + 10^2/25)  # 2.828
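
# Part (d): doubled sample sizes (our addition, verifying the factor-of-sqrt(2) drop)
sqrt(10^2/50 + 12^2/72)  # 2.0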

Exercise 3: Computing the Z Test Statistic

A software company compares the mean processing times of two algorithms. Based on extensive benchmarking, the population standard deviations are known: \(\sigma_A = 50\) ms and \(\sigma_B = 60\) ms.

Sample results:

  • Algorithm A: \(n_A = 40\), \(\bar{x}_A = 320\) ms

  • Algorithm B: \(n_B = 50\), \(\bar{x}_B = 350\) ms

  1. Calculate the point estimate for \(\mu_A - \mu_B\).

  2. Calculate the standard error.

  3. Calculate the z-test statistic for testing \(H_0: \mu_A - \mu_B = 0\).

  4. Interpret the test statistic value.

Solution

Part (a): Point estimate

\[\bar{x}_A - \bar{x}_B = 320 - 350 = -30 \text{ ms}\]

Part (b): Standard error

\[SE = \sqrt{\frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}} = \sqrt{\frac{50^2}{40} + \frac{60^2}{50}} = \sqrt{\frac{2500}{40} + \frac{3600}{50}} = \sqrt{62.5 + 72} = \sqrt{134.5} = 11.60\]

Part (c): Z-test statistic

\[z_{TS} = \frac{(\bar{x}_A - \bar{x}_B) - \Delta_0}{SE} = \frac{-30 - 0}{11.60} = \frac{-30}{11.60} = -2.586\]

Part (d): Interpretation

The observed difference of -30 ms is 2.586 standard errors below zero, suggesting Algorithm A is faster than Algorithm B. Statistical significance depends on the chosen α; for example, at α = 0.05 (two-sided), the critical value is ±1.96, so |z| = 2.586 > 1.96 would lead to rejecting H₀.

R verification:

xbar_A <- 320; xbar_B <- 350
sigma_A <- 50; sigma_B <- 60
n_A <- 40; n_B <- 50

point_est <- xbar_A - xbar_B  # -30
SE <- sqrt(sigma_A^2/n_A + sigma_B^2/n_B)  # 11.60
z_ts <- (point_est - 0) / SE  # -2.586

Exercise 4: Two-Tailed Hypothesis Test

A materials scientist compares the mean tensile strength of steel from two suppliers. Population standard deviations are known from historical quality data: \(\sigma_A = 15\) MPa and \(\sigma_B = 18\) MPa.

Sample results:

  • Supplier A: \(n_A = 36\), \(\bar{x}_A = 420\) MPa

  • Supplier B: \(n_B = 49\), \(\bar{x}_B = 412\) MPa

Test whether the mean tensile strengths differ at α = 0.05.

Solution

Step 1: Define the parameters

Let \(\mu_A\) = true mean tensile strength (MPa) of steel from Supplier A. Let \(\mu_B\) = true mean tensile strength (MPa) of steel from Supplier B.

Step 2: State the hypotheses

Testing for any difference:

\[\begin{split}&H_0: \mu_A - \mu_B = 0\\ &H_a: \mu_A - \mu_B \neq 0\end{split}\]

Step 3: Calculate the test statistic and p-value

Standard error:

\[SE = \sqrt{\frac{15^2}{36} + \frac{18^2}{49}} = \sqrt{\frac{225}{36} + \frac{324}{49}} = \sqrt{6.25 + 6.61} = \sqrt{12.86} = 3.586\]

Test statistic:

\[z_{TS} = \frac{(420 - 412) - 0}{3.586} = \frac{8}{3.586} = 2.231\]

P-value (two-tailed):

\[p\text{-value} = 2 \times P(Z > |2.231|) = 2 \times P(Z > 2.231) = 2 \times 0.01284 \approx 0.0257\]

Step 4: Decision and Conclusion

Since p-value = 0.0257 < α = 0.05, reject H₀.

The data does give support (p-value = 0.026) to the claim that the mean tensile strengths of steel from the two suppliers are different. Supplier A appears to provide stronger steel on average (420 vs. 412 MPa).

R verification:

xbar_A <- 420; xbar_B <- 412
sigma_A <- 15; sigma_B <- 18
n_A <- 36; n_B <- 49

SE <- sqrt(sigma_A^2/n_A + sigma_B^2/n_B)  # 3.586
z_ts <- (xbar_A - xbar_B) / SE  # 2.231
p_value <- 2 * pnorm(abs(z_ts), lower.tail = FALSE)  # 0.0257

Exercise 5: Lower-Tailed Hypothesis Test

A pharmaceutical company tests whether their new formulation provides faster pain relief than the current formulation. Historical data give \(\sigma_{new} = 8\) minutes and \(\sigma_{current} = 10\) minutes.

Sample results:

  • New formulation: \(n_{new} = 45\), \(\bar{x}_{new} = 22\) minutes

  • Current formulation: \(n_{current} = 50\), \(\bar{x}_{current} = 26\) minutes

Test whether the new formulation provides faster relief (lower time) at α = 0.01.

Solution

Step 1: Define the parameters

Let \(\mu_{new}\) = true mean time to pain relief (minutes) for the new formulation. Let \(\mu_{current}\) = true mean time to pain relief (minutes) for the current formulation.

Step 2: State the hypotheses

“Faster relief” means lower time for new formulation:

\[\begin{split}&H_0: \mu_{new} - \mu_{current} \geq 0\\ &H_a: \mu_{new} - \mu_{current} < 0\end{split}\]

This is a lower-tailed test.

Step 3: Calculate the test statistic and p-value

Standard error:

\[SE = \sqrt{\frac{8^2}{45} + \frac{10^2}{50}} = \sqrt{\frac{64}{45} + \frac{100}{50}} = \sqrt{1.422 + 2.0} = \sqrt{3.422} = 1.850\]

Test statistic:

\[z_{TS} = \frac{(22 - 26) - 0}{1.850} = \frac{-4}{1.850} = -2.162\]

P-value (lower-tailed):

\[p\text{-value} = P(Z < -2.162) = 0.0153\]

Step 4: Decision and Conclusion

Since p-value = 0.0153 > α = 0.01, fail to reject H₀.

The data does not give support (p-value = 0.015) to the claim that the new formulation provides faster pain relief than the current formulation at the 1% significance level. While the sample shows a 4-minute improvement, this is not statistically significant at α = 0.01.

Note: At α = 0.05, we would reject H₀.

R verification:

xbar_new <- 22; xbar_current <- 26
sigma_new <- 8; sigma_current <- 10
n_new <- 45; n_current <- 50

SE <- sqrt(sigma_new^2/n_new + sigma_current^2/n_current)  # 1.850
z_ts <- (xbar_new - xbar_current) / SE  # -2.162
p_value <- pnorm(z_ts, lower.tail = TRUE)  # 0.0153

Exercise 6: Confidence Interval for Difference of Means

A logistics company compares delivery times between two shipping routes. Historical data provide \(\sigma_A = 12\) hours and \(\sigma_B = 15\) hours.

Sample results:

  • Route A: \(n_A = 64\), \(\bar{x}_A = 48\) hours

  • Route B: \(n_B = 81\), \(\bar{x}_B = 52\) hours

  1. Construct a 95% confidence interval for \(\mu_A - \mu_B\).

  2. Interpret the confidence interval.

  3. Based on the interval, would you reject \(H_0: \mu_A - \mu_B = 0\) at α = 0.05? Explain.

  4. Construct a 99% confidence interval. How does it compare to the 95% interval?

Solution

Part (a): 95% Confidence Interval

Point estimate: \(\bar{x}_A - \bar{x}_B = 48 - 52 = -4\) hours

Standard error:

\[SE = \sqrt{\frac{12^2}{64} + \frac{15^2}{81}} = \sqrt{\frac{144}{64} + \frac{225}{81}} = \sqrt{2.25 + 2.778} = \sqrt{5.028} = 2.242\]

Critical value: \(z_{0.025} = 1.96\)

Margin of error: \(ME = 1.96 \times 2.242 = 4.395\)

95% CI: \(-4 \pm 4.395 = (-8.395, 0.395)\) hours

Part (b): Interpretation

We are 95% confident that the true difference in mean delivery times (Route A minus Route B) is between -8.4 and 0.4 hours. This suggests Route A is likely faster (negative difference means A takes less time), but the interval includes positive values, so we cannot be certain.

Part (c): Hypothesis test connection

Since the 95% CI contains 0, we would fail to reject \(H_0: \mu_A - \mu_B = 0\) at α = 0.05. The interval includes values consistent with no difference, so we don’t have sufficient evidence to conclude the routes differ.

Part (d): 99% Confidence Interval

Critical value: \(z_{0.005} = 2.576\)

Margin of error: \(ME = 2.576 \times 2.242 = 5.776\)

99% CI: \(-4 \pm 5.776 = (-9.776, 1.776)\) hours

The 99% CI is wider than the 95% CI, reflecting higher confidence at the cost of precision. Both intervals contain 0.

R verification:

xbar_A <- 48; xbar_B <- 52
sigma_A <- 12; sigma_B <- 15
n_A <- 64; n_B <- 81

point_est <- xbar_A - xbar_B  # -4
SE <- sqrt(sigma_A^2/n_A + sigma_B^2/n_B)  # 2.242

# 95% CI
z_crit_95 <- qnorm(0.025, lower.tail = FALSE)  # 1.96
ME_95 <- z_crit_95 * SE
c(point_est - ME_95, point_est + ME_95)  # (-8.395, 0.395)

# 99% CI
z_crit_99 <- qnorm(0.005, lower.tail = FALSE)  # 2.576
ME_99 <- z_crit_99 * SE
c(point_est - ME_99, point_est + ME_99)  # (-9.776, 1.776)

Exercise 7: One-Sided Confidence Bound

A manufacturing engineer wants to show that Machine A produces parts with mean diameter at most 0.5 mm larger than Machine B. From quality records: \(\sigma_A = 0.3\) mm, \(\sigma_B = 0.25\) mm.

Sample results:

  • Machine A: \(n_A = 50\), \(\bar{x}_A = 25.4\) mm

  • Machine B: \(n_B = 60\), \(\bar{x}_B = 25.1\) mm

  1. State the appropriate hypotheses to test the engineer’s claim.

  2. Compute the 95% upper confidence bound for \(\mu_A - \mu_B\).

  3. Does the confidence bound support the engineer’s claim? Explain.

Solution

Part (a): Hypotheses

The engineer wants to demonstrate that \(\mu_A - \mu_B \leq 0.5\).

Approach 1 (Confidence Bound): The preferred approach for demonstrating a claim like “at most 0.5” is to construct an upper confidence bound and show it falls below 0.5. This directly addresses the question.

Approach 2 (Hypothesis Test): If we frame this as a hypothesis test, we would test whether the difference exceeds 0.5:

\[\begin{split}&H_0: \mu_A - \mu_B \geq 0.5 \quad \text{(claim is violated)}\\ &H_a: \mu_A - \mu_B < 0.5 \quad \text{(claim is satisfied)}\end{split}\]

Rejecting H₀ would provide evidence that the claim is satisfied. However, the confidence bound approach is cleaner here.

Part (b): 95% Upper Confidence Bound

Point estimate: \(\bar{x}_A - \bar{x}_B = 25.4 - 25.1 = 0.3\) mm

Standard error:

\[SE = \sqrt{\frac{0.3^2}{50} + \frac{0.25^2}{60}} = \sqrt{\frac{0.09}{50} + \frac{0.0625}{60}} = \sqrt{0.0018 + 0.00104} = \sqrt{0.00284} = 0.0533\]

Critical value for one-sided 95% bound: \(z_{0.05} = 1.645\)

Upper confidence bound:

\[UCB = (\bar{x}_A - \bar{x}_B) + z_\alpha \times SE = 0.3 + 1.645 \times 0.0533 = 0.3 + 0.0877 = 0.388\]

Part (c): Interpretation

We are 95% confident that \(\mu_A - \mu_B \leq 0.388\) mm.

Since 0.388 < 0.5, the confidence bound supports the engineer’s claim that Machine A produces parts with mean diameter at most 0.5 mm larger than Machine B.

R verification:

xbar_A <- 25.4; xbar_B <- 25.1
sigma_A <- 0.3; sigma_B <- 0.25
n_A <- 50; n_B <- 60

point_est <- xbar_A - xbar_B  # 0.3
SE <- sqrt(sigma_A^2/n_A + sigma_B^2/n_B)  # 0.0533

z_alpha <- qnorm(0.05, lower.tail = FALSE)  # 1.645
UCB <- point_est + z_alpha * SE  # 0.388

Exercise 8: Complete Analysis with Non-Zero Null Value

A consumer electronics company claims their premium headphones have at least 5 dB better noise cancellation than their standard model. An independent testing lab wants to verify this claim.

From extensive testing: \(\sigma_{premium} = 4\) dB, \(\sigma_{standard} = 5\) dB.

Test results:

  • Premium: \(n_P = 30\), \(\bar{x}_P = 38\) dB

  • Standard: \(n_S = 35\), \(\bar{x}_S = 32\) dB

Conduct a hypothesis test at α = 0.05 to evaluate whether there is sufficient evidence to support the company’s claim.

Solution

Step 1: Define the parameters

Let \(\mu_P\) = true mean noise cancellation (dB) for premium headphones. Let \(\mu_S\) = true mean noise cancellation (dB) for standard headphones.

Step 2: State the hypotheses

The company claims \(\mu_P - \mu_S \geq 5\) (at least 5 dB better). To evaluate this claim using a one-sided test with complementary hypotheses, we test whether the true difference exceeds 5:

\[\begin{split}&H_0: \mu_P - \mu_S \leq 5\\ &H_a: \mu_P - \mu_S > 5\end{split}\]

Here \(\Delta_0 = 5\) dB.

Note: Rejecting H₀ provides evidence that the difference exceeds 5 dB. Failing to reject means we lack sufficient evidence to conclude the difference is greater than 5—it does NOT mean the claim “at least 5” is supported.

Step 3: Calculate the test statistic and p-value

Point estimate: \(\bar{x}_P - \bar{x}_S = 38 - 32 = 6\) dB

Standard error:

\[SE = \sqrt{\frac{4^2}{30} + \frac{5^2}{35}} = \sqrt{\frac{16}{30} + \frac{25}{35}} = \sqrt{0.533 + 0.714} = \sqrt{1.248} = 1.117\]

Test statistic:

\[z_{TS} = \frac{(6) - 5}{1.117} = \frac{1}{1.117} = 0.895\]

P-value (upper-tailed, since Hₐ uses >):

\[p\text{-value} = P(Z > 0.895) = 1 - 0.815 = 0.185\]

Step 4: Decision and Conclusion

Since p-value = 0.185 > α = 0.05, fail to reject H₀.

The data does not give sufficient evidence (p-value = 0.185) to support the claim that the premium headphones provide more than 5 dB better noise cancellation than the standard model. Although the sample difference of 6 dB exceeds the claimed 5 dB, the observed difference is not statistically significantly greater than 5 dB at the α = 0.05 level.

Important: “Fail to reject” does not mean the claim is false—it means we lack sufficient evidence to confirm the claim is true.

R verification:

xbar_P <- 38; xbar_S <- 32
sigma_P <- 4; sigma_S <- 5
n_P <- 30; n_S <- 35
Delta_0 <- 5

point_est <- xbar_P - xbar_S  # 6
SE <- sqrt(sigma_P^2/n_P + sigma_S^2/n_S)  # 1.117
z_ts <- (point_est - Delta_0) / SE  # 0.895
p_value <- pnorm(z_ts, lower.tail = FALSE)  # 0.185

Exercise 9: Duality Between CI and Hypothesis Test

An agricultural researcher compares crop yields (kg/hectare) between two irrigation methods. From historical data: \(\sigma_A = 200\) kg/ha and \(\sigma_B = 180\) kg/ha.

Sample results:

  • Method A: \(n_A = 25\), \(\bar{x}_A = 3200\) kg/ha

  • Method B: \(n_B = 30\), \(\bar{x}_B = 2950\) kg/ha

  1. Conduct a two-sided hypothesis test at α = 0.05.

  2. Construct a 95% confidence interval.

  3. Verify that the hypothesis test and confidence interval give consistent conclusions.

  4. At what significance level would the test just barely reject H₀?

Solution

Part (a): Hypothesis Test

Step 1: Let \(\mu_A\) and \(\mu_B\) be the true mean yields for Methods A and B.

Step 2:

\[\begin{split}&H_0: \mu_A - \mu_B = 0\\ &H_a: \mu_A - \mu_B \neq 0\end{split}\]

Step 3:

SE:

\[SE = \sqrt{\frac{200^2}{25} + \frac{180^2}{30}} = \sqrt{1600 + 1080} = \sqrt{2680} = 51.77\]

Test statistic:

\[z_{TS} = \frac{3200 - 2950}{51.77} = \frac{250}{51.77} = 4.83\]

P-value:

\[p = 2 \times P(Z > 4.83) < 0.0001\]

Step 4: Since p < 0.05, reject H₀. The data does give strong support (p < 0.0001) to the claim that the mean yields differ between the two irrigation methods.

Part (b): 95% Confidence Interval

\[CI: 250 \pm 1.96 \times 51.77 = 250 \pm 101.5 = (148.5, 351.5)\]

Part (c): Consistency check

The 95% CI (148.5, 351.5) does not contain 0. This is consistent with rejecting H₀ at α = 0.05. ✓

Part (d): Smallest α for rejection

The p-value is the smallest α at which we would reject. Using R:

p_value <- 2 * pnorm(4.83, lower.tail = FALSE)  # ≈ 0.0000014

We would reject H₀ at any α ≥ 0.0000014.
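For completeness, an R verification of parts (a)–(c) in the same style as the other exercises:

xbar_A <- 3200; xbar_B <- 2950
sigma_A <- 200; sigma_B <- 180
n_A <- 25; n_B <- 30

SE <- sqrt(sigma_A^2/n_A + sigma_B^2/n_B)          # 51.77
z_ts <- (xbar_A - xbar_B) / SE                     # 4.83
p_value <- 2 * pnorm(z_ts, lower.tail = FALSE)     # 1.4e-06
ME <- qnorm(0.025, lower.tail = FALSE) * SE        # 101.5
c((xbar_A - xbar_B) - ME, (xbar_A - xbar_B) + ME)  # (148.5, 351.5)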


11.2.8. Additional Practice Problems

True/False Questions (1 point each)

  1. The z-test for two independent samples requires both population standard deviations to be known.

    Ⓣ or Ⓕ

  2. If \(\bar{x}_A - \bar{x}_B = 5\) and \(\bar{x}_B - \bar{x}_A = -5\), they lead to the same p-value.

    Ⓣ or Ⓕ

  3. The standard error of \(\bar{X}_A - \bar{X}_B\) is at least as large as the standard error of either \(\bar{X}_A\) or \(\bar{X}_B\) alone (strictly larger when both \(\sigma_A, \sigma_B > 0\)).

    Ⓣ or Ⓕ

  4. For a 95% CI that contains zero, the corresponding two-tailed test at α = 0.05 fails to reject H₀.

    Ⓣ or Ⓕ

  5. The z-test statistic follows a standard normal distribution when H₀ is true.

    Ⓣ or Ⓕ

  6. Increasing both sample sizes will decrease the standard error.

    Ⓣ or Ⓕ

Multiple Choice Questions (2 points each)

  1. The formula for the standard error of \(\bar{X}_A - \bar{X}_B\) (known variances) is:

    Ⓐ \(\sqrt{\sigma^2_A + \sigma^2_B}\)

    Ⓑ \(\sqrt{\frac{\sigma^2_A}{n_A} - \frac{\sigma^2_B}{n_B}}\)

    Ⓒ \(\sqrt{\frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}}\)

    Ⓓ \(\frac{\sigma_A + \sigma_B}{\sqrt{n_A + n_B}}\)

  2. For a two-tailed z-test with z_TS = 2.5, the p-value is:

    Ⓐ P(Z > 2.5)

    Ⓑ P(Z < -2.5)

    Ⓒ 2 × P(Z > 2.5)

    Ⓓ 1 - P(Z > 2.5)

  3. Which is NOT an assumption for the two-sample z-test?

    Ⓐ Known population standard deviations

    Ⓑ Equal sample sizes

    Ⓒ Independence between samples

    Ⓓ Normal sampling distributions

  4. If a 99% CI for \(\mu_A - \mu_B\) is (2, 8), then:

    Ⓐ We reject H₀: μ_A - μ_B = 0 at α = 0.01

    Ⓑ We fail to reject H₀: μ_A - μ_B = 0 at α = 0.01

    Ⓒ We accept H₀: μ_A - μ_B = 5

    Ⓓ Cannot determine without the p-value

  5. To test if \(\mu_A > \mu_B\), the appropriate alternative hypothesis is:

    Ⓐ H_a: μ_A - μ_B = 0

    Ⓑ H_a: μ_A - μ_B < 0

    Ⓒ H_a: μ_A - μ_B > 0

    Ⓓ H_a: μ_A - μ_B ≠ 0

  6. The margin of error for a 95% CI for \(\mu_A - \mu_B\) depends on:

    Ⓐ Only the sample sizes

    Ⓑ Only the population standard deviations

    Ⓒ The sample sizes and population standard deviations

    Ⓓ The sample means and sample sizes

Answers to Practice Problems

True/False Answers:

  1. True — The z-test formula uses σ_A and σ_B directly.

  2. True — The test statistics differ only in sign, leading to the same |z| and same p-value.

  3. True — SE of difference = √(σ²_A/n_A + σ²_B/n_B) ≥ √(σ²_A/n_A) = SE of \(\bar{X}_A\) (strictly larger when both σ values are positive).

  4. True — This is the duality between CIs and two-sided hypothesis tests.

  5. True — Under H₀ and the stated assumptions, z_TS ~ N(0,1).

  6. True — SE decreases as n_A and n_B increase (they appear in denominators under the square root).

Multiple Choice Answers:

  1. Ⓒ — Each variance is divided by its own sample size, then the results are added.

  2. Ⓒ — Two-tailed p-value = 2 × P(Z > |z_TS|).

  3. Ⓑ — Equal sample sizes are not required for the z-test.

  4. Ⓐ — The CI doesn’t contain 0, so we reject H₀: μ_A - μ_B = 0.

  5. Ⓒ — μ_A > μ_B means μ_A - μ_B > 0.

  6. Ⓒ — ME = z × √(σ²_A/n_A + σ²_B/n_B) depends on both.