11.3. Independent Two-Sample Analysis - Pooled Variance Estimator

The theoretical framework developed in Chapter 11.2 illustrates the foundational ideas of two-sample inference but relies on known population variances. This section addresses the more practical case in which these variances must be estimated from sample data. We continue, however, under the simplifying assumption that the population variances are equal.

Road Map 🧭

  • Understand the meaning of the equal variance assumption and the concept of pooled variance estimation.

  • Learn the structure of the pooled variance estimator and use it to estimate the standard error of \(\bar{X}_A - \bar{X}_B\).

  • Identify the pivotal quantity consistent with the equal variance assumption, and construct hypothesis tests and confidence regions based on its distributional properties.

11.3.1. The Equal Variance Assumption

As we relax the assumption of known population variances, we first consider the constrained case in which:

\[\sigma^2_A = \sigma^2_B = \sigma^2.\]

This assumption states that the two populations have the same underlying variability. While the true variance \(\sigma^2\) remains unknown, the assumption reduces the number of unknown quantities in the framework, making the problem slightly simpler.

Meanwhile, the fundamental assumptions introduced in Chapter 11.2.1 must continue to hold throughout the subsequent discussion.

Mathematical Simplification of the Standard Error

In general, the standard error of the point estimator \(\bar{X}_A - \bar{X}_B\) is:

\[\sigma_{\bar{X}_A - \bar{X}_B} = \sqrt{\frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}}.\]

By replacing both \(\sigma^2_A\) and \(\sigma^2_B\) with \(\sigma^2\), the standard error simplifies to:

\[\sigma_{\bar{X}_A - \bar{X}_B} = \sqrt{\frac{\sigma^2}{n_A} + \frac{\sigma^2}{n_B}} = \sigma\sqrt{\frac{1}{n_A} + \frac{1}{n_B}},\]

leaving only one unknown quantity, \(\sigma\), to be estimated.

11.3.2. The Pooled Variance Estimator

When both sample variances \(S^2_A\) and \(S^2_B\) estimate the same underlying parameter \(\sigma^2\), we must systematically combine the information from both samples to create a single estimator of the common variance. When samples from two or more populations are used to estimate a single variance, we call the result a pooled variance estimator.

To define a pooled estimator, we cannot simply compute an overall sample variance from the merged dataset. Recall that a sample variance is the average squared distance of data points from a common mean. The overall mean of the merged data is not a good estimate of either population mean, especially when our goal is to determine whether the true means differ. Consequently, the overall sample variance computed around it misrepresents the true \(\sigma^2\).

Instead, we take the weighted average of the two separate variance estimators \(S_A^2\) and \(S_B^2.\) The pooled variance estimator \(S_p^2\) is defined as:

(11.1)\[S^2_p = \frac{(n_A - 1)S^2_A + (n_B - 1)S^2_B}{n_A + n_B - 2}.\]
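For readers following along in R, Equation (11.1) can be computed directly from two samples; the sketch below uses small made-up data and our own variable names purely to illustrate the arithmetic.

# Made-up example data, used only to illustrate the arithmetic
xA <- c(12.1, 9.8, 11.4, 10.7, 12.9)
xB <- c(8.6, 10.2, 9.9, 11.1, 9.4, 10.8)

nA <- length(xA); nB <- length(xB)
s2_A <- var(xA); s2_B <- var(xB)

# Pooled variance, Equation (11.1)
s2_p <- ((nA - 1) * s2_A + (nB - 1) * s2_B) / (nA + nB - 2)
s2_p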

Understanding the Weights

The weights \((n_A - 1)\) and \((n_B - 1)\) in Equation (11.1) are the degrees of freedom associated with the individual sample variances. They ensure that each sample's contribution is proportional to its degrees of freedom, and hence roughly to its sample size. As \(n_A\) grows larger relative to \(n_B\), for example, \(S_p^2\) moves closer to the sample variance of Sample A.

The Pooled Estimator is Still an Average of Squared Distances

Note that each additive term in the numerator can be rewritten as the sum of squared distances between data points and their appropriate sample mean. For Sample A:

\[(n_A -1)S^2_A = (n_A -1)\frac{\sum_{i=1}^{n_A} (X_{Ai} - \bar{X}_A)^2}{n_A -1} = \sum_{i=1}^{n_A} (X_{Ai} - \bar{X}_A)^2.\]

A similar result holds for Sample B. Using this, \(S^2_p\) can be represented more explicitly as:

(11.2)\[S^2_p = \frac{\sum_{i=1}^{n_A}(X_{Ai}-\bar{X}_A)^2 + \sum_{i=1}^{n_B}(X_{Bi}-\bar{X}_B)^2}{n_A + n_B - 2}\]

Equation (11.2) makes it clear that \(S^2_p\) is still an average of squared deviations from a mean. The only change is that certain data points deviate from Mean A, while the rest deviate from Mean B.
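Continuing the small R sketch from above, the sum-of-squares form in Equation (11.2) produces the same number as Equation (11.1):

# Equation (11.2): squared deviations of each point from its own sample mean
ss_A <- sum((xA - mean(xA))^2)
ss_B <- sum((xB - mean(xB))^2)
(ss_A + ss_B) / (nA + nB - 2)   # identical to s2_p computed from Equation (11.1)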

Why Divide by \(n_A + n_B -2\)?

(a) Number of “free” data points

Degrees of freedom is a measure of how many “free” data points there are in a dataset. It usually equals the total number of observations minus the number of other parameters that were estimated in order to construct the estimator of interest. Here we have \(n_A + n_B\) total observations, and two parameters, \(\mu_A\) and \(\mu_B\), were replaced with their estimators \(\bar{X}_A\) and \(\bar{X}_B\) to construct \(S^2_p\), leaving \(n_A + n_B - 2\) free data points.

(b) Correct Normalization for the Weights

For \(S^2_p\) to serve as a weighted average, the denominator must be the sum of the weights used in the numerator: \((n_A - 1) + (n_B - 1) = n_A + n_B - 2\).

(c) Makes \(S^2_p\) Unbiased

\(S^2_p\) as defined above is an unbiased estimator for the common variance \(\sigma^2\) when the equal variance assumption holds.

\[\begin{split}E[S^2_p] &= E\left[\frac{(n_A - 1)S^2_A + (n_B - 1)S^2_B}{n_A + n_B - 2}\right]\\ &= \frac{(n_A - 1)E[S^2_A] + (n_B - 1)E[S^2_B]}{n_A + n_B - 2}\end{split}\]

From single-sample theory, we know that the individual sample variances are unbiased:

\[E[S^2_A] = \sigma^2_A \quad \text{ and } \quad E[S^2_B] = \sigma^2_B.\]

In addition, both equal \(\sigma^2\) under the equal variance assumption. This implies:

\[\begin{split}E[S^2_p] &= \frac{(n_A - 1)\sigma^2 + (n_B - 1)\sigma^2}{n_A + n_B - 2}\\ &=\frac{\sigma^2[(n_A - 1) + (n_B - 1)]}{n_A + n_B - 2}\\ &= \frac{\sigma^2(n_A + n_B - 2)}{n_A + n_B - 2} = \sigma^2\end{split}\]

Therefore, \(S^2_p\) is unbiased.
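The unbiasedness can also be checked empirically. The short simulation below, a sketch with arbitrary choices of \(\sigma^2 = 4\), \(n_A = 8\), and \(n_B = 12\), repeatedly draws two normal samples with a common variance and averages the pooled estimates:

set.seed(1)
sigma2 <- 4                # common variance (arbitrary choice for this check)
n1 <- 8; n2 <- 12          # arbitrary sample sizes
pooled <- replicate(10000, {
  x1 <- rnorm(n1, mean = 5, sd = sqrt(sigma2))
  x2 <- rnorm(n2, mean = 7, sd = sqrt(sigma2))   # means may differ; variances are equal
  ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
})
mean(pooled)               # close to sigma2 = 4, as expected for an unbiased estimator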

11.3.3. Estimated Standard Error with Pooled Variance

Recall the true standard error for the point estimator \(\bar{X}_A - \bar{X}_B\) was:

\[\sigma_{\bar{X}_A - \bar{X}_B} = \sigma\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}.\]

The estimated standard error \(\widehat{SE}_p\) is obtained by replacing the unknown \(\sigma\) with the pooled standard deviation estimator \(S_p = \sqrt{S^2_p}\):

\[\widehat{SE}_p = S_p\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}.\]
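In R, the estimated standard error is one additional line once the pooled variance is available; this continues the illustrative sketch from Section 11.3.2.

s_p  <- sqrt(s2_p)                    # pooled standard deviation
se_p <- s_p * sqrt(1 / nA + 1 / nB)   # estimated standard error of the difference
se_p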

Example 💡: Is It Better to Balance the Sample Size?

Two researchers are studying a pair of Populations A and B, which are assumed to have the same true variance. Researcher 1 designs the experiment with \(n_A = n_B = 10\), while Researcher 2 uses \(n_A = 15, n_B = 5\).

Q1. How do the degrees of freedom compare between the two designs?

In both cases, \(df=n_A + n_B - 2 = 18\). The degrees of freedom are equal.

Q2. Which design will have a smaller true standard error? Explain why.

The true standard error for Researcher 1 is:

\[\sigma \sqrt{\frac{1}{10} + \frac{1}{10}} \approx 0.4472 \sigma .\]

For Researcher 2,

\[\sigma \sqrt{\frac{1}{15} + \frac{1}{5}} \approx 0.5164 \sigma .\]

The standard error for Researcher 2 is approximately 15% larger than the standard error for Researcher 1. In general, when the total sample size is fixed and the population variances are equal, the smallest possible standard error is achieved with balanced sample sizes.
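The comparison, with \(\sigma\) factored out, can be reproduced in a few lines of R:

sqrt(1/10 + 1/10)                      # Researcher 1: about 0.4472
sqrt(1/15 + 1/5)                       # Researcher 2: about 0.5164
sqrt(1/15 + 1/5) / sqrt(1/10 + 1/10)   # ratio of about 1.15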

Q3. In what scenarios might one design be favored over the other?

  • The design by Researcher 2 would be chosen if it is much more costly to sample from one population than the other, or if the samples are already imbalanced and there are no resources to collect additional data.

  • If there is no serious difference in sampling costs between the two populations, balanced samples are always preferred, since they lead to more precise inference without changing the overall cost significantly.

11.3.4. Hypothesis Testing for Independent Two Samples with Equal Variance Assumption

In the four-step framework of hypothesis testing, Steps 1, 2, and 4 are identical to the case with known population variances (see Chapter 11.2 to review the details). We focus our discussion on Step 3, where the key change occurs.

The t-Test Statistic

Recall the common structure of all previously learned test statistics:

\[\text{Test Statistic} = \frac{\text{estimator}-\text{null value}}{\text{std. error}}.\]

Our new test statistic follows the same format. The estimator is \(\bar{X}_A - \bar{X}_B\), the null value is \(\Delta_0\), and the standard error is \(\sigma\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}\). Since we do not know the value of \(\sigma\), however, we must replace it with its estimator, \(S_p\). Putting the components together, we get:

\[T_{TS} = \frac{(\bar{X}_A - \bar{X}_B) - \Delta_0}{S_p\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}}.\]

When

  • all assumptions introduced in Chapter 11.2.1 hold,

  • the true variances are indeed equal, and

  • the null hypothesis is true,

the test statistic follows a \(t\)-distribution with \(df = n_A + n_B - 2\). The additional uncertainty from estimating \(\sigma^2\) with \(S^2_p\) manifests as the heavier tails characteristic of \(t\)-distributions.
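As a minimal sketch, the statistic can be assembled in R from summary statistics alone; the values and variable names below are placeholders, not data from the text.

# Placeholder summary statistics (substitute your own values)
xbar_A <- 10.8; s_A <- 2.1; nA <- 12
xbar_B <-  9.6; s_B <- 2.4; nB <- 15
delta0 <- 0                            # null value

s2_p <- ((nA - 1) * s_A^2 + (nB - 1) * s_B^2) / (nA + nB - 2)
se_p <- sqrt(s2_p) * sqrt(1 / nA + 1 / nB)
t_ts <- (xbar_A - xbar_B - delta0) / se_p
t_ts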

The \(p\)-Values

The \(p\)-value computation reflects the new distribution of the \(t\)-test statistic. Denoting the observed \(t\)-test statistic as \(t_{TS}\), the \(p\)-values for the three hypothesis types are summarized in the table below.

\(p\)-values for independent two-sample tests (unknown pooled variance)

Upper-tailed

\[P(T_{n_A + n_B -2} > t_{TS})\]
df <- nA + nB - 2
pt(t_ts, df=df, lower.tail=FALSE)

Lower-tailed

\[P(T_{n_A + n_B -2} < t_{TS})\]
df <- nA + nB - 2
pt(t_ts, df=df)

Two-tailed

\[2P(T_{n_A + n_B -2} < -|t_{TS}|) \quad \text{ or } \quad 2P(T_{n_A + n_B -2} > |t_{TS}|)\]
df <- nA + nB - 2
# Two options (equivalent):
2 * pt(-abs(t_ts), df=df)
2 * pt(abs(t_ts), df=df, lower.tail=FALSE)

Example 💡: Dexterity Skill Assessment

A group of 15 new employees participated in a manual dexterity test alongside 20 experienced industrial workers at a high-precision manufacturing company. Skill levels were assessed using test scores from both groups, as shown in the table below:

Group  | \(n\) | Sample mean | Sample sd
New    | 15    | 35.12       | 4.31
Senior | 20    | 37.32       | 3.83

Perform a hypothesis test to determine if the skill levels for the new employees are lower on average than the senior workers. Use the significance level of \(\alpha=0.05\).

Step 1: Define the Parameters

We let \(\mu_{new}\) be the true mean test score of all new employees at this company. Let \(\mu_{exp}\) be the true mean score for the population of experienced employees.

Step 2: Write the Hypotheses

\[\begin{split}&H_0: \mu_{exp} - \mu_{new} \leq 0 \\ &H_a: \mu_{exp} - \mu_{new} > 0\end{split}\]

The test can also be defined in terms of \(\mu_{new}-\mu_{exp}\), in which case a lower-tailed test is appropriate. We continue with \(\mu_{exp}-\mu_{new}\) and an upper-tailed test.

Step 3: The Test Statistic, Degrees of Freedom, and p-Value

  • The point estimate of the difference is:

    \[\bar{x}_{exp} - \bar{x}_{new} = 37.32 - 35.12 = 2.2\]
  • The pooled variance estimate is:

    \[\begin{split}s^2_p &= \frac{(n_{exp} - 1)s^2_{exp} + (n_{new} - 1)s^2_{new}}{n_{exp} + n_{new} - 2}\\ &= \frac{(20 - 1)(3.83)^2 + (15 - 1)(4.31)^2}{20 + 15 - 2} = 16.3265\end{split}\]
  • The estimated standard error is:

    \[\widehat{SE}_p = s_p \sqrt{\frac{1}{n_{exp}}+\frac{1}{n_{new}}} = \sqrt{16.3265}\sqrt{\frac{1}{20} + \frac{1}{15}} \approx 1.3801\]
  • Putting the pieces together, the observed test statistic is:

    \[t_{TS} = \frac{(\bar{x}_{exp} - \bar{x}_{new}) - \Delta_0}{\widehat{SE}_p} = \frac{2.2-0}{1.3801} = 1.5941\]
  • Under the null hypothesis, the random variable \(T_{TS}\) follows a \(t\)-distribution with \(df = 20+15-2=33\). Therefore, the upper-tailed p-value (reproduced in the R check after this list) is

    \[p = P(T_{33} > 1.5941) = 0.0602\]
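The calculations above can be verified with a short R check; the variable names are ours.

n_exp <- 20; xbar_exp <- 37.32; s_exp <- 3.83
n_new <- 15; xbar_new <- 35.12; s_new <- 4.31

s2_p <- ((n_exp - 1) * s_exp^2 + (n_new - 1) * s_new^2) / (n_exp + n_new - 2)
se_p <- sqrt(s2_p) * sqrt(1 / n_exp + 1 / n_new)
t_ts <- (xbar_exp - xbar_new - 0) / se_p               # about 1.5941
pt(t_ts, df = n_exp + n_new - 2, lower.tail = FALSE)   # about 0.0602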

Step 4: Decision and Conclusion

Since \(p = 0.0602 > 0.05\), we fail to reject the null hypothesis. The data does not provide sufficient evidence to support the claim that the mean dexterity test score is higher for experienced workers than for new workers.

11.3.5. Confidence Regions

For confidence interval construction, we use the studentization of \(\bar{X}_A - \bar{X}_B\) as the pivotal quantity:

\[T = \frac{(\bar{X}_A - \bar{X}_B) - (\mu_A - \mu_B)}{S_p\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}}.\]

When all the assumptions hold, \(T\) follows a \(t\)-distribution with \(df = n_A + n_B - 2\). The resulting \(100(1-\alpha)\%\) confidence interval for \(\mu_A - \mu_B\) is:

\[(\bar{x}_A - \bar{x}_B) \pm t_{\alpha/2, n_A + n_B - 2} \cdot S_p\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}\]

Again, the confidence interval follows a familiar format:

  • It is centered at the point estimate \(\bar{x}_A - \bar{x}_B\)

  • The margin of error (ME) is a product of a critical value and the estimated standard error.

The table below summarizes all three types of confidence regions for the difference in two means, when the pooled estimator is used for the unknown common variance:

\(100 \cdot C \%\) Confidence regions for difference in means (unknown pooled variance)

Confidence interval

\[(\bar{x}_A - \bar{x}_B) \pm t_{\alpha/2, n_A + n_B - 2} \cdot S_p\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}\]

Lower confidence bound

\[(\bar{x}_A - \bar{x}_B) - t_{\alpha, n_A + n_B - 2} \cdot S_p\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}\]

Upper confidence bound

\[(\bar{x}_A - \bar{x}_B) + t_{\alpha, n_A + n_B - 2} \cdot S_p\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}\]

The critical values can be computed using the R code:

upper_tail_prob <- alpha  # or alpha/2, depending on the region type
df <- nA + nB - 2
qt(upper_tail_prob, df=df, lower.tail=FALSE)

Example 💡: Dexterity Skill Assessment, Continued

For the dexterity skill comparison problem, compute the \(95\%\) confidence region that gives an equivalent result to the previously performed hypothesis test. Interpret the result and explain how it is consistent with the test result.

Group  | \(n\) | Sample mean | Sample sd
New    | 15    | 35.12       | 4.31
Senior | 20    | 37.32       | 3.83

Which confidence region?

Let us continue to use the difference \(\mu_{exp} - \mu_{new}\). The duality of hypothesis tests and confidence regions discussed in Chapter 10.3.1 still holds. When all other experimental settings match, an upper-tailed hypothesis test gives a result consistent with a lower confidence bound.

Compute the confidence region

From the previous example, we know:

  • \(\bar{x}_{exp} - \bar{x}_{new} = 2.2\)

  • \(\widehat{SE}_p = s_p\sqrt{\frac{1}{n_{exp}} + \frac{1}{n_{new}}} \approx 1.3801\)

  • \(df = 33\)

The critical value \(t_{0.05, 33}\) is:

qt(0.05, df=33, lower.tail=FALSE)
# returns 1.69236

Then finally, the lower confidence bound is:

\[2.2 - 1.69236 \cdot 1.3801 \approx -0.1356.\]
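The same bound can be reproduced in R, reusing the point estimate and estimated standard error from the hypothesis test:

lower_bound <- 2.2 - qt(0.05, df = 33, lower.tail = FALSE) * 1.3801
lower_bound   # approximately -0.1356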

Interpretation of the Confidence Bound

With \(95\%\) confidence, the true difference in mean test scores (\(\mu_{exp}-\mu_{new}\)) between the two employee groups lies above the lower bound of \(-0.1356\).

Connection to the Hypothesis Test

We were not able to reject the null hypothesis that the true difference was less than or equal to the null value \(\Delta_0 = 0\) at \(\alpha=0.05\). This is consistent with the fact that the null value lies inside the plausible region indicated by the \(95\%\) lower confidence bound.

11.3.6. Bringing It All Together

Key Takeaways 📝

  1. Pooled variance procedures assume equal population variances \((\sigma^2_A = \sigma^2_B)\).

  2. The pooled variance estimator \(S^2_p = \frac{(n_A - 1)S^2_A + (n_B - 1)S^2_B}{n_A + n_B - 2}\) combines information from both samples using weights proportional to degrees of freedom.

  3. When the equal variance assumption holds, the random variable \(\frac{(\bar{X}_A - \bar{X}_B)-(\mu_A - \mu_B)}{\widehat{SE}_p}\) follows a \(t\)-distribution with \(df = n_A + n_B - 2\). Both hypothesis testing and confidence regions are constructed using the same core ideas as previously used, but using this new \(t\)-distribution.

Exercises

  1. Pooled Variance Calculation: Two independent samples yield the following results:

    • Sample A: \(n_A = 12\), \(s^2_A = 25.6\)

    • Sample B: \(n_B = 15\), \(s^2_B = 31.2\)

    1. Calculate the pooled variance estimator \(s^2_p\).

    2. Find the pooled standard deviation \(s_p\).

    3. Determine the degrees of freedom for the pooled procedure.

    4. Calculate the estimated standard error \(\widehat{SE}_p\).

  2. Assumption Analysis: For each scenario, discuss whether the equal variance assumption seems reasonable and justify your answer:

    1. Comparing customer satisfaction scores (1-10 scale) between two similar retail stores

    2. Comparing income levels between urban and rural populations

    3. Comparing reaction times before and after consuming caffeine (using different subjects)

    4. Comparing crop yields between two fertilizer treatments on similar plots