11.4. Independent Two-Sample Analysis - No Equal Variance Assumption
When population standard deviations are unknown and we cannot reasonably assume that the variances are equal across populations, each variance must be estimated separately. This section develops the unpooled approach for independent two-sample comparisons and emphasizes the importance of choosing the appropriate method, pooled or unpooled, based on the evidence provided by the data.
Road Map 🧭
Estimate the standard error for independent two-sample comparison without the equal variance assumption.
Recognize that the new pivotal quantity only approximately follows a \(t\)-distribution and that its degrees of freedom must also be approximated.
Use the Welch-Satterthwaite approximation for the degrees of freedom.
Learn the consequences of incorrectly using the pooled versus unpooled approaches and why it is safer to use the more general unpooled approach as the default.
Know that the unpooled procedure is robust to moderate departures from normality. List the characteristics of the data for which we expect the method to work best.
11.4.1. Mathematical Framework
The Assumptions
This lesson builds on the core assumptions introduced in Chapter 11.2.1.
Recall the true standard error for the difference of means \(\bar{X}_A - \bar{X}_B\):

\[SE(\bar{X}_A - \bar{X}_B) = \sqrt{\frac{\sigma^2_A}{n_A} + \frac{\sigma^2_B}{n_B}}\]
Since the two population variances \(\sigma^2_A\) and \(\sigma^2_B\) are no longer assumed equal, we take no further simplification steps and directly replace the unknown values with their respective estimators, \(S^2_A\) and \(S^2_B\).
The estimated standard error is then:

\[\widehat{SE} = \sqrt{\frac{S^2_A}{n_A} + \frac{S^2_B}{n_B}}\]
It follows that the studentization of \(\bar{X}_A - \bar{X}_B\) is:

\[T' = \frac{(\bar{X}_A - \bar{X}_B) - (\mu_A - \mu_B)}{\sqrt{\frac{S^2_A}{n_A} + \frac{S^2_B}{n_B}}}\]
The pivotal quantity is denoted \(T'\) with a prime because, even when all assumptions hold, it only approximately follows a \(t\)-distribution. Moreover, the degrees of freedom of the best approximating \(t\)-distribution depend on the unknown variances and must also be approximated.
We use the Welch-Satterthwaite Approximation for the unknown degrees of freedom:

\[\nu = \frac{\left(\frac{S^2_A}{n_A} + \frac{S^2_B}{n_B}\right)^2}{\frac{\left(S^2_A/n_A\right)^2}{n_A - 1} + \frac{\left(S^2_B/n_B\right)^2}{n_B - 1}}\]
Once \(\nu\) is computed, we treat \(T'\) as having a \(t\)-distribution with \(\nu\) degrees of freedom when constructing inference methods. This approximation generally performs well in practice; it tends to produce slightly conservative results (wider confidence regions, less likely to reject \(H_0\)) when sample sizes are small.
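As a minimal sketch, the approximation can be coded directly in R; the helper name welch_df below is ours and not part of any package:

```r
# Welch-Satterthwaite approximate degrees of freedom (helper name welch_df is ours)
welch_df <- function(s2_a, n_a, s2_b, n_b) {
  va <- s2_a / n_a   # estimated variance of the group-A sample mean
  vb <- s2_b / n_b   # estimated variance of the group-B sample mean
  (va + vb)^2 / (va^2 / (n_a - 1) + vb^2 / (n_b - 1))
}
```

For instance, welch_df(11.01^2, 21, 17.15^2, 23) reproduces, up to rounding, the approximate degrees of freedom used in the DRP example of the next subsection.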
The Approximated \(\nu\) May Not Be an Integer
\(\nu\) is typically not integer-valued, which is okay. R accepts non-integer values for the df argument of \(t\)-related functions. Use the approximated \(\nu\) without making any adjustments.
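For instance, with the approximate degrees of freedom from the DRP example in the next subsection (\(\nu = 37.8\)):

```r
# R accepts non-integer degrees of freedom without any adjustment
pt(2.31, df = 37.8, lower.tail = FALSE)   # upper-tail probability
qt(0.05, df = 37.8, lower.tail = FALSE)   # one-sided critical value
```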
11.4.2. Hypothesis Tests
Hypothesis Testing
Within the four-step framework of hypothesis testing, Steps 1, 2, and 4 of the unpooled approach are identical to those of the other independent two-sample comparisons. Revisit Chapter 11.2 for the related details. We will focus on Step 3, where we compute the test statistic, degrees of freedom, and p-value.
The observed test statistic takes the familiar form: it is the difference between the observed point estimate and the null value, standardized by the estimated standard error.

\[t'_{TS} = \frac{(\bar{x}_A - \bar{x}_B) - \Delta_0}{\sqrt{\frac{s^2_A}{n_A} + \frac{s^2_B}{n_B}}}\]
Compute the degrees of freedom \(\nu\) using the Welch-Satterthwaite Approximation formula. Then the \(p\)-values are:
Two-sided: \(2P(T_\nu < -|t'_{TS}|)\)
Upper-tailed: \(P(T_\nu > t'_{TS})\)
Lower-tailed: \(P(T_\nu < t'_{TS})\)
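As a sketch in R, with t_ts and nu standing for the computed test statistic and Welch-Satterthwaite degrees of freedom (the numbers below come from the DRP example that follows):

```r
t_ts <- 2.31   # observed test statistic (from the DRP example below)
nu   <- 37.8   # Welch-Satterthwaite degrees of freedom
2 * pt(-abs(t_ts), df = nu)             # two-sided p-value
pt(t_ts, df = nu, lower.tail = FALSE)   # upper-tailed p-value
pt(t_ts, df = nu)                       # lower-tailed p-value
```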
Example 💡: Teaching Methods Comparison
To investigate whether directed reading activities improve students’ reading ability, students at an elementary school were randomly assigned to either directed reading activities or standard instruction.
Their performance was measured by Degree of Reading Power (DRP) scores. Higher DRP scores indicate better reading ability. The observed statistics are:
| Method | New | Standard |
|---|---|---|
| Sample size | \(n_{new} = 21\) | \(n_{std} = 23\) |
| Sample mean | \(\bar{x}_{new} = 51.48\) | \(\bar{x}_{std} = 41.52\) |
| Sample standard deviation | \(s_{new} = 11.01\) | \(s_{std} = 17.15\) |
Step 1: Define Parameters
Use \(\mu_{new}\) to denote the true mean DRP score for students receiving directed reading instruction and \(\mu_{std}\) for the true mean DRP score for students receiving traditional instruction.
Both population standard deviations are unknown and will be estimated separately.
Step 2: Formulate Hypotheses

\[H_0: \mu_{new} - \mu_{std} = 0 \quad \text{versus} \quad H_a: \mu_{new} - \mu_{std} > 0\]
Step 3-1: Explore Data and Choose Analysis Method
Fig. 11.3 Boxplots and histograms by group
Both distributions are approximately normal, with mild skewness in the traditional group. Sample sizes are adequate for mild departures from normality.
No serious outliers identified in modified box plots.
Some evidence that variances may differ between groups.
We use the unpooled independent two-sample approach.
Step 3-2: Compute Test Statistic, DF, and p-Value
The observed test statistic is:

\[t'_{TS} = \frac{(\bar{x}_{new} - \bar{x}_{std}) - 0}{\sqrt{\frac{s^2_{new}}{n_{new}} + \frac{s^2_{std}}{n_{std}}}} = \frac{(51.48 - 41.52) - 0}{\sqrt{\frac{11.01^2}{21} + \frac{17.15^2}{23}}} = \frac{9.96}{4.31} \approx 2.31\]
For the Welch-Satterthwaite approximate degrees of freedom, we encourage breaking down its computation into small components.
\(\frac{s^2_{new}}{n_{new}} = \frac{11.01^2}{21}=5.77\)
\(\frac{s^2_{std}}{n_{std}} = \frac{17.15^2}{23}=12.78\)
Then finally,

\[\nu = \frac{\left(\frac{s^2_{new}}{n_{new}} + \frac{s^2_{std}}{n_{std}}\right)^2}{\frac{\left(s^2_{new}/n_{new}\right)^2}{n_{new}-1} + \frac{\left(s^2_{std}/n_{std}\right)^2}{n_{std}-1}} = \frac{(5.77 + 12.78)^2}{\frac{5.77^2}{21-1} + \frac{12.78^2}{23-1}} \approx 37.8\]
The \(p\)-value is \(P(T_{37.8} > 2.31)\). Using R,
pt(2.31, df = 37.8, lower.tail = FALSE)
\(p\)-value \(=0.013\).
Step 4: Write the Decision and Conclusion
Since p-value \(= 0.013 < \alpha = 0.05\), we reject the null hypothesis. The data give some support (p-value = 0.013) to the claim that directed reading activities improve elementary school students’ reading ability as measured by DRP scores.
11.4.3. Confidence Regions
The confidence regions can be derived using the pivotal method and \(T'\) in Eq. (11.3).
| Confidence regions for independent two-sample tests (unknown unpooled variances) | |
|---|---|
| Confidence interval | \((\bar{x}_A - \bar{x}_B) \pm t_{\alpha/2,\nu} \sqrt{\frac{s^2_A}{n_A} + \frac{s^2_B}{n_B}}\) |
| Upper confidence bound | \((\bar{x}_A - \bar{x}_B) + t_{\alpha,\nu} \sqrt{\frac{s^2_A}{n_A} + \frac{s^2_B}{n_B}}\) |
| Lower confidence bound | \((\bar{x}_A - \bar{x}_B) - t_{\alpha,\nu} \sqrt{\frac{s^2_A}{n_A} + \frac{s^2_B}{n_B}}\) |
The critical value \(t_{\alpha/2,\nu}\) (or \(t_{\alpha,\nu}\)) is computed with \(\nu\) approximated using the Welch-Satterthwaite formula.
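For example, with the approximate degrees of freedom from the DRP example (\(\nu = 37.8\)), the critical values can be computed in R as follows (only the one-sided value is used in the example below):

```r
qt(0.025, df = 37.8, lower.tail = FALSE)   # two-sided 95% critical value, t_{0.025, 37.8}
qt(0.05,  df = 37.8, lower.tail = FALSE)   # one-sided 95% critical value, t_{0.05, 37.8}
```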
Example 💡: Teaching Methods Comparison, Continued
For the teaching methods comparison problem, compute the confidence region consistent with the previously performed hypothesis test. The summary statistics are:
| Method | New | Standard |
|---|---|---|
| Sample size | \(n_{new} = 21\) | \(n_{std} = 23\) |
| Sample mean | \(\bar{x}_{new} = 51.48\) | \(\bar{x}_{std} = 41.52\) |
| Sample standard deviation | \(s_{new} = 11.01\) | \(s_{std} = 17.15\) |
95% Lower Confidence Bound
An upper-tailed hypothesis test is consistent with a lower confidence bound as long as the significance level and the confidence coefficient add to one. Therefore, we need to compute a 95% lower confidence bound for the difference \(\mu_{new} - \mu_{std}.\) From the previous example, we already have
Observed sample difference: \(\bar{x}_{new} - \bar{x}_{std} = 9.96\)
Estimated standard error: \(\widehat{SE} = 4.31\)
The Welch-Satterthwaite approximate degrees of freedom: \(\nu = 37.8\)
The critical value for a one-sided confidence region is
qt(0.05, df = 37.8, lower.tail = FALSE)
# returns 1.686
Finally, the lower bound is:

\[(\bar{x}_{new} - \bar{x}_{std}) - t_{0.05,\,\nu}\,\widehat{SE} = 9.96 - 1.686 \times 4.31 \approx 2.69\]
We are 95% confident that the new teaching method improves DRP scores by more than 2.69 points on average.
Using the t.test() Function for Inferences
When we have the complete raw data, we can use t.test() to perform a hypothesis test and compute a confidence region simultaneously.
First, organize the data into two vectors.
The vector quantitativeVariable should list all observations from both groups.
The vector categoricalVariable should list the group labels of the observations listed in the quantitativeVariable vector.
Then run the following code after replacing each argument with the appropriate values:
t.test(quantitativeVariable ~ categoricalVariable,
mu = Delta0,
conf.level = C,
paired = FALSE,
alternative = "alternative_hypothesis",
var.equal = FALSE)
For the teaching methods comparison problem, we would set
mu = 0
conf.level = 0.95
paired = FALSE (paired two-sample analysis if TRUE)
alternative = "greater" (other options are "two.sided" and "less")
var.equal = FALSE (pooled if TRUE; unpooled is the default choice for this course)
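Because only summary statistics (not raw scores) are reproduced for the DRP example, t.test() cannot be run here. As a sketch, the same inferences can be obtained by coding the formulas of this section directly:

```r
# Welch (unpooled) analysis from summary statistics -- values from the DRP example
n_new <- 21;  xbar_new <- 51.48;  s_new <- 11.01
n_std <- 23;  xbar_std <- 41.52;  s_std <- 17.15

se_hat <- sqrt(s_new^2 / n_new + s_std^2 / n_std)      # estimated standard error
t_ts   <- (xbar_new - xbar_std - 0) / se_hat           # observed test statistic

# Welch-Satterthwaite approximate degrees of freedom
nu <- (s_new^2 / n_new + s_std^2 / n_std)^2 /
  ((s_new^2 / n_new)^2 / (n_new - 1) + (s_std^2 / n_std)^2 / (n_std - 1))

p_value     <- pt(t_ts, df = nu, lower.tail = FALSE)   # upper-tailed p-value
lower_bound <- (xbar_new - xbar_std) -
  qt(0.05, df = nu, lower.tail = FALSE) * se_hat       # 95% lower confidence bound

round(c(t = t_ts, df = nu, p = p_value, LCB = lower_bound), 3)
```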
11.4.4. Pooled vs Unpooled Approaches
When choosing between the pooled and unpooled approaches, it is preferable to use the method that best reflects the truth. If the population variances are indeed equal, the pooled method is more appropriate; otherwise, the unpooled approach should be used. However, since the true variances are typically unknown, this simple rule is often impractical to apply.
Let us examine the consequences of incorrectly using pooled versus unpooled methods and explain why we adopt the unpooled approach as the default choice in this course.
Incorrectly Using the Unpooled Approach
By using the unpooled approach for two populations whose variances are in fact equal, we lose efficiency in two respects:
The analysis method becomes unnecessarily complicated. Instead of the exact \(t\)-distribution and the simple, integer-valued degrees of freedom \(df=n_A + n_B -2,\) we must use the approximation method.
The approach uses the data points less efficiently, leading to decreased power.
Note that incorrectly applying a more general (unpooled) method to a special case (equal true variances) leads to some loss in efficiency but rarely leads to serious errors.
Incorrectly Using the Pooled Approach
The consequences are typically more serious when a special-case method is applied incorrectly to a general case. Using a pooled approach on samples drawn from populations with unequal variances risks not only a loss of efficiency but also a loss of theoretical validity.
The problem is especially pronounced when the two sample sizes differ substantially. Consider a scenario where \(n_A = 1500\), \(n_B = 200\). The pooled variance estimator becomes:

\[S^2_p = \frac{(n_A - 1)S^2_A + (n_B - 1)S^2_B}{n_A + n_B - 2} = \frac{1499\,S^2_A + 199\,S^2_B}{1698}\]
Since \(n_A\) is much larger than \(n_B\), the pooled estimator will be heavily weighted toward \(S^2_A\).
When \(\sigma_A > \sigma_B\)
If \(\sigma_A > \sigma_B\), then the overall variability will be overestimated, leading to:
Overly wide confidence intervals, and
Reduced power—true differences become harder to detect.
When \(\sigma_A < \sigma_B\)
On the other hand, if \(\sigma_A < \sigma_B\), then the overall variability will be underestimated. As a result,
Confidence intervals become too narrow and fail to contain the true difference with the nominal \(100C \%\) coverage probability; the true coverage probability will be smaller.
The hypothesis tests will have a type I error rate greater than \(\alpha\); that is, the test will reject a true null hypothesis more often than \(100\alpha \%\) of the time.
The consequences are especially severe when the true variability is underestimated, as the resulting inferences no longer satisfy the theoretical guarantees they are intended to uphold.
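To make the risk concrete, here is a small numerical sketch. The sample sizes \(n_A = 1500\) and \(n_B = 200\) come from the scenario above, while the standard deviations \(s_A = 5\) and \(s_B = 15\) are assumed values chosen only to illustrate the \(\sigma_A < \sigma_B\) case:

```r
n_a <- 1500; s_a <- 5    # group A: large sample, small spread (s_a, s_b are illustrative)
n_b <- 200;  s_b <- 15   # group B: small sample, large spread

# Pooled estimate: dominated by the much larger group A
s2_pooled <- ((n_a - 1) * s_a^2 + (n_b - 1) * s_b^2) / (n_a + n_b - 2)
se_pooled <- sqrt(s2_pooled * (1 / n_a + 1 / n_b))

# Unpooled estimate: keeps each group's variability separate
se_unpooled <- sqrt(s_a^2 / n_a + s_b^2 / n_b)

c(pooled = se_pooled, unpooled = se_unpooled)
```

In this sketch the pooled standard error comes out roughly half the unpooled one, so pooled confidence intervals would be far too narrow and pooled tests would reject far too often.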
The Unpooled Method is the Default in this Course‼️
Unless directed otherwise, students are expected to solve independent two-sample inference problems with unknown variances using the unpooled approach.
11.4.5. Robustness and Assumption Checking
The unpooled \(t\)-procedure is robust to moderate departures from normality, with robustness increasing with sample size. Use the following guidelines on the sample sizes and types of departure from normality.
A. Sample Size Guidelines
| Total Sample Size | Normality Requirements |
|---|---|
| \(n_A + n_B < 15\) | Data must be very close to normal. Requires careful graphical assessment. |
| \(15 \leq n_A + n_B < 40\) | Can tolerate mild skewness. Strong skewness still problematic. |
| \(n_A + n_B \geq 40\) | Usually acceptable even with moderate skewness. |
When \(n_A \approx n_B\), t-procedures are more robust to moderate normality violations.
B. Guidelines on Different Types of Departure from Normality
In addition to the total sample size, the robustness of \(t\)-procedures also depends on the specific nature of the departure from normality. Visualize the data using histograms, QQ plots, and side-by-side boxplots (an R sketch follows the list below), and watch for the following warning signs:
Outliers can invalidate procedures regardless of sample size
Extreme skewness or heavy tails may require much larger samples than guidelines suggest
Multiple modes may indicate population heterogeneity
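A minimal R sketch for these graphical checks, assuming the observations are stored in a vector scores with matching group labels in a vector group (both names are hypothetical):

```r
# Hypothetical objects: scores holds all observations, group holds the group labels
par(mfrow = c(1, 3))
hist(scores[group == "A"], main = "Histogram, group A", xlab = "Score")
qqnorm(scores[group == "A"], main = "Normal QQ plot, group A")
qqline(scores[group == "A"])
boxplot(scores ~ group, main = "Side-by-side boxplots", ylab = "Score")
# Repeat the histogram and QQ plot for group B
```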
11.4.6. Bringing It All Together
Key Takeaways 📝
Unpooled procedures avoid the restrictive equal variance assumption by estimating the standard error with \(\widehat{SE} = \sqrt{\frac{S^2_A}{n_A} + \frac{S^2_B}{n_B}}\).
The distribution of the pivotal quantity \(T'\) must be approximated with a \(t\)-distribution. We use the Welch-Satterthwaite approximation for its approximate degrees of freedom.
Pooled procedures can result in serious failures when equal variance assumptions are violated.
Unpooled procedures provide robust inference that works whether variances are equal or unequal, with only minor efficiency loss when variances are actually equal. This course uses the unpooled procedure as the default independent two-sample method.
Exercises
Degrees of Freedom Calculation: Two independent samples have the following characteristics:
Sample A: \(n_A = 15\), \(s^2_A = 28.4\)
Sample B: \(n_B = 20\), \(s^2_B = 45.7\)
Calculate the Welch-Satterthwaite degrees of freedom.
Compare this to the pooled degrees of freedom \(n_A + n_B - 2\).
Explain why the Welch degrees of freedom is typically smaller.
Method Comparison: A researcher has samples with \(n_A = n_B = 25\) and approximately equal sample variances.
What are the advantages of using pooled procedures in this scenario?
What are the advantages of using unpooled procedures?
Which approach would you recommend and why?
Robustness Assessment: You have two samples with total size \(n_A + n_B = 30\). One sample shows moderate right skewness and the other has one potential outlier.
What additional information would you need to assess the appropriateness of t-procedures?
What graphical tools would you use to evaluate the assumptions?
Under what conditions might you proceed with t-procedures despite these concerns?