12.3. ANOVA F-Test and Its Relationship to Two-Sample t-Tests
We have developed the theoretical foundation for ANOVA by decomposing total variability into between-group and within-group components. Now we are ready to construct the hypothesis test that will tell us whether observed differences in sample means are statistically significant.
Road Map 🧭
Understand why the \(F\)-test statistic serves as an indicator of differences among population means.
Describe the properties of the \(F\) distributions.
Construct a complete ANOVA table and perform an ANOVA \(F\)-test using the four-step framework.
Recognize the connections between ANOVA and independent two-sample inference.
12.3.1. Building the Test Statistic
Recall the goal of the ANOVA hypothesis test: we would like to compare the variabilities within and between groups, and reject the null hypothesis that all means are equal if the between-group variability is significantly larger.
Fig. 12.7 Within-group vs between-group variability
To formalize this comparison, we use the ratio of MSA to MSE, \(\frac{\text{MSA}}{\text{MSE}}\).
When \(H_0\) is true, both MSA and MSE estimate \(\sigma^2\), so their observed ratio tends to be close to 1. When \(H_0\) is false, however, MSA estimates something larger, making it more likely for the ratio to take a value significantly larger than 1.
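This behavior can be checked with a quick simulation. The following R sketch is illustrative only; the number of groups, sample sizes, and population parameters are arbitrary choices, not taken from any example in this chapter.

```r
# Simulate the MSA/MSE ratio when H0 is true: three groups of 20
# observations, all drawn from the same normal population.
set.seed(1)
ratios <- replicate(1000, {
  x <- rnorm(60, mean = 10, sd = 2)      # identical population for all groups
  g <- factor(rep(1:3, each = 20))       # arbitrary group labels
  ms <- anova(lm(x ~ g))$"Mean Sq"       # c(MSA, MSE)
  ms[1] / ms[2]                          # the observed ratio
})
mean(ratios)  # typically close to 1 under H0
```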
Under the null hypothesis, the distribution of the ratio belongs to the family of \(F\)-distributions. For this reason, the ratio is called the \(F\)-test statistic, or \(F_{TS}\). To complete the hypothesis testing construction, we next review the main properties of \(F\)-distributed random variables.
12.3.2. The \(F\)-Distribution
\(F\)-distributions are parameterized by two degrees of freedom: \(df_1\) and \(df_2\). When a random variable \(X\) follows an \(F\)-distribution, we write \(X \sim F_{df_1,\, df_2}\).
\(F\)-distributions are always supported on \([0, \infty)\) and are right-skewed regardless of the parameter values. As the two degrees of freedom grow,
the skewness weakens (see the yellow curve in Fig. 12.8),
the expected value quickly approaches 1, and
the variance decreases.
Fig. 12.8 \(F\)-distribution with different sets of parameter values
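These properties are easy to visualize in R with the built-in density function df(); the degrees-of-freedom values below are arbitrary illustrations.

```r
# F densities for increasing degrees of freedom (arbitrary illustrative values)
curve(df(x, df1 = 2, df2 = 4), from = 0, to = 4, ylab = "density")
curve(df(x, df1 = 5, df2 = 20), from = 0, to = 4, add = TRUE, lty = 2)
curve(df(x, df1 = 30, df2 = 120), from = 0, to = 4, add = TRUE, lty = 3)
# With larger df1 and df2, the curve concentrates near 1 and looks less skewed.
```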
Let us now discuss the specific \(F\)-distribution of the ANOVA test statistic. Under the null hypothesis,
\[F_{TS} = \frac{\text{MSA}}{\text{MSE}} \sim F_{k-1,\, n-k},\]
where \(k\) represents the number of groups and \(n\) the total sample size.
Drawing connections with the general properties of \(F\)-distributions,
\(F_{TS}\) will always yield a non-negative outcome since it is a ratio of two non-negative random variables. This agrees with the support of its null distribution.
As the total sample size \(n\) grows, the expected value of \(F_{TS}\) grows closer to 1 and its spread becomes narrower around the mean; see the calculation below.
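The second point can be made precise with the known mean formula for \(F\)-distributions: an \(F_{df_1, df_2}\) random variable with \(df_2 > 2\) has mean \(df_2/(df_2 - 2)\). Applied to the null distribution of \(F_{TS}\), with \(df_2 = n - k\),
\[E[F_{TS}] = \frac{n-k}{n-k-2} \longrightarrow 1 \quad \text{as } n \to \infty.\]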
The \(p\)-Value for ANOVA
Regardless of the analysis method, a \(p\)-value always represents the probability of obtaining a result at least as inconsistent with the null hypothesis as the one observed. In ANOVA, such inconsistency corresponds to a greater observed \(F\)-test statistic. Therefore,
\[p\text{-value} = P(F_{k-1,\, n-k} \geq f_{TS}),\]
where \(F_{k-1,n-k}\) is a random variable following an \(F\)-distribution with \((df_1,df_2)=(k-1, n-k)\), and \(f_{TS}\) is the observed \(F\)-test statistic. In R, the \(p\)-value can be obtained using
the pf function:
pvalue <- pf(f_ts, df1=k-1, df2=n-k, lower.tail=FALSE)
The Complete ANOVA Table
We are now fully equipped to construct the complete ANOVA table that we left partially filled in Chapter 12.2. The entries marked with ⅹ are typically left blank.
| Source | df | SS | MS | \(f_{TS}\) | \(p\)-value |
|---|---|---|---|---|---|
| Factor A | \(k-1\) | \(\sum_{i=1}^k n_i(\bar{x}_{i \cdot} - \bar{x}_{\cdot \cdot})^2\) | \(\frac{\text{SSA}}{k-1}\) | \(\frac{\text{MSA}}{\text{MSE}}\) | \(P(F_{k-1,n-k} \geq f_{TS})\) |
| Error | \(n-k\) | \(\sum_{i=1}^k \sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{i \cdot})^2\) | \(\frac{\text{SSE}}{n-k}\) | ⅹ | ⅹ |
| Total | \(n-1\) | \(\sum_{i=1}^k \sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{\cdot \cdot})^2\) | ⅹ | ⅹ | ⅹ |
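In practice, this table is rarely assembled by hand. In R, aov() produces the Factor A and Error rows directly; the data below are invented purely for illustration.

```r
# Toy data: a response measured across k = 3 factor levels (invented numbers)
y <- c(12, 15, 14, 18, 20, 19, 25, 24, 26)
group <- factor(rep(c("A", "B", "C"), each = 3))

# summary() prints df, SS, MS, the observed F statistic, and the p-value
summary(aov(y ~ group))
```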
Example 💡: The Complete ANOVA Table for the Coffeehouse Study
For the coffeehouse study, complete the remaining entries of the ANOVA table.
Fig. 12.9 Partial ANOVA table for the coffeehouse example
We only need to fill the two entries corresponding to the observed \(f_{TS}\) and the \(p\)-value. First,
\[f_{TS} = \frac{\text{MSA}}{\text{MSE}} = 22.14.\]
Recall that under the null hypothesis, with a total sample size as large as \(n=200\), the \(F\)-test statistic is sharply concentrated around 1. The observed value of 22.14 is already a strong sign of inconsistency with the null hypothesis.
Let us continue by computing the \(p\)-value and check whether our prediction is confirmed:
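Using pf() with the observed statistic and the degrees of freedom \(df_1 = k - 1 = 4\) and \(df_2 = n - k = 195\):

```r
pf(22.14, df1 = 4, df2 = 195, lower.tail = FALSE)
# approximately 4.4e-15
```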
As expected, the \(p\)-value is very small.
We are now ready to organize the ANOVA hypothesis test into a full four-step framework.
12.3.3. The Four Steps of ANOVA Hypothesis Testing
Step 1: Define Parameters
Define the population mean \(\mu_i\), for each \(i \in \{1, \cdots, k\}\). The definition should clearly describe the populations of interest and connect each \(\mu_i\) to a specific population.
Step 2: State the Hypotheses
The null hypothesis states that all population means are equal: \(H_0: \mu_1 = \mu_2 = \cdots = \mu_k\). The alternative hypothesis can be written in several equivalent ways:
\(H_a:\) At least one \(\mu_i\) is different from the rest
\(H_a:\) Not all population means are equal
\(H_a:\) At least one \(\mu_i\) differs from the others
Step 3-1: Check Assumptions
Before proceeding, we must verify that the ANOVA assumptions are reasonable. See Section 12.1.3 for a complete walkthrough of graphical verification. In addition, we must confirm numerically that the equal variance assumption is reasonable by showing:
\[\frac{\max_i s_i}{\min_i s_i} \leq 2.\]
Step 3-2: Calculate the Test Statistic, Degrees of Freedom, and \(p\)-Value
In this step, it is often helpful to first construct the full ANOVA table.
Use the computed MSA and MSE for the observed test statistic, \(f_{TS}\):
\[f_{TS} = \frac{\text{MSA}}{\text{MSE}}\]
State the degrees of freedom:
\(df_A = k - 1\)
\(df_E = n - k\)
Compute the \(p\)-value, \(P(F_{df_A, df_E} \geq f_{TS})\):
pf(f_ts, df1 = k-1, df2 = n-k, lower.tail = FALSE)
Step 4: Make Decision and State Conclusion
The decision rule stays unchanged:
If \(p\)-value \(\leq \alpha\), reject \(H_0\).
If \(p\)-value \(> \alpha\), fail to reject \(H_0\).
Conclusion template:
“The data [does/does not] give [weak/moderate/strong] support (p-value = [value]) to the claim that [statement of \(H_a\) in context].”
What Happens If \(H_0\) Is Rejected?
When ANOVA indicates that “at least one mean differs from the others,” it naturally raises the next question: “Which specific groups are different?” This brings us to multiple comparison procedures, explored in the next section. These methods allow specific pairwise comparisons while controlling the overall error rate.
For now, it’s important to understand that ANOVA serves as a gatekeeper test, separating data sets that warrant pairwise comparisons from those that do not.
Example 💡: Complete ANOVA Testing for the Coffeehouse Data ☕️
Perform a hypothesis test at \(\alpha = 0.01\) to determine if the five coffeehouses around campus attract customers of different average ages.
Fig. 12.10 Coffeehouse example summary
Step 1: Define the Parameters
Let \(\mu_{1}, \mu_{2}, \mu_{3}, \mu_{4}, \mu_{5}\) represent the true mean customer age at coffeehouses 1, 2, 3, 4, and 5, respectively.
Step 2: State the Hypotheses
\(H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5\)
\(H_a:\) At least one \(\mu_i\) differs from the rest
Step 3-1: Check Assumptions
This step should include all the following elements:
Graphical check for any serious deviations from normality in individual samples
Graphical check for any signs of violation of the equal variance assumption
Numerical check that the sample standard deviations are within similar ranges (ratio of largest to smallest at most 2)
Refer to the last example of Chapter 12.1.
Step 3-2: Calculate the Test Statistic, Degrees of Freedom, and p-Value
Fig. 12.11 Complete ANOVA table for the coffeehouse example
Test statistic: \(f_{TS} = 22.14\)
Degrees of freedom for the null distribution: \(df_A = 4\), \(df_E=195\)
\(p\)-value \(=4.4 \times 10^{-15}\)
Step 4: Decision and Conclusion
Since p-value = \(4.4 \times 10^{-15} < 0.01 = \alpha\), we reject \(H_0\). The data gives strong support (p-value = \(4.4 \times 10^{-15}\)) to the claim that at least one of the coffeehouses around campus differs from the rest in the mean age of its customers.
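For readers who want to reproduce Step 3 in R, the sketch below assumes a hypothetical data frame coffee with one row per customer; the column names and the simulated ages are invented, so only the workflow, not the numbers, matches the study.

```r
# Hypothetical stand-in for the raw data: 40 customers at each of 5 coffeehouses
set.seed(42)
coffee <- data.frame(
  age  = rnorm(200, mean = rep(c(22, 24, 23, 30, 26), each = 40), sd = 5),
  shop = factor(rep(1:5, each = 40))
)

# Numerical check of the equal variance assumption: the ratio of the largest
# to the smallest sample standard deviation should be at most 2
sds <- tapply(coffee$age, coffee$shop, sd)
max(sds) / min(sds)

# Test statistic, degrees of freedom, and p-value in one call
summary(aov(age ~ shop, data = coffee))
```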
12.3.4. The Connection Between F-Tests and t-Tests
It is possible to view one-way ANOVA as a generalization of independent two-sample analysis under certain conditions. Specifically, one-way ANOVA with \(k=2\) is equivalent to a two-tailed independent two-sample hypothesis test with \(\Delta_0=0\) and the equal variance assumption.
We show this special relationship by demonstrating that the \(F\)-test statistic for ANOVA equals the square of the \(t\)-test statistic for the two-sample comparison. In turn, we also show that the \(p\)-values computed from these two statistics are identical.
Connection Between the Test Statistics
When \(k=2\), the ANOVA \(F\)-test statistic is:
\[F_{TS} = \frac{\text{MSA}}{\text{MSE}} = \frac{n_1(\bar{X}_{1 \cdot} - \bar{X}_{\cdot \cdot})^2 + n_2(\bar{X}_{2 \cdot} - \bar{X}_{\cdot \cdot})^2}{\text{MSE}}.\]
Through algebraic manipulation (which involves expressing the overall mean \(\bar{X}_{\cdot \cdot}\) as a weighted average of the group means), this simplifies to:
\[F_{TS} = \frac{(\bar{X}_{1 \cdot} - \bar{X}_{2 \cdot})^2}{\text{MSE}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.\]
Recall the \(t\)-test statistic for independent two-sample comparison with the pooled variance estimator:
\[T_{TS} = \frac{\bar{X}_1 - \bar{X}_2 - \Delta_0}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}.\]
Noting that MSE with \(k=2\) is exactly the pooled variance estimator \(S_p^2\), setting \(\Delta_0 = 0\) and squaring \(T_{TS}\) recovers \(F_{TS}\).
Equivalence of the \(p\)-Values
Using the connection between the two test statistics, the ANOVA \(p\)-value satisfies:
\[P(F_{1,\, n-2} \geq f_{TS}) = P(T_{n-2}^2 \geq t_{TS}^2) = P(|T_{n-2}| \geq |t_{TS}|),\]
where \(T_{n-2}\) is a random variable following a \(t\)-distribution with \(n-2\) degrees of freedom.
The final probability statement is, in fact, the \(p\)-value for the two-sided \(t\)-test. That is, the \(p\)-value computed through one-way ANOVA is identical to the \(p\)-value computed from a two-sided test for difference between two means.
It follows that the decision to reject or fail to reject the null hypothesis is also identical: essentially, the two tests are the same procedure in different forms.
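The equivalence is easy to verify numerically in R; the statistic and degrees of freedom below are arbitrary.

```r
t_ts <- 1.8   # an arbitrary two-sample t statistic
df_e <- 28    # n - 2, also arbitrary

2 * pt(abs(t_ts), df = df_e, lower.tail = FALSE)    # two-sided t-test p-value
pf(t_ts^2, df1 = 1, df2 = df_e, lower.tail = FALSE) # ANOVA p-value: identical
```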
Summary
In summary, ANOVA with \(k=2\) is equivalent to an independent two-sample analysis with the equal variance assumption and a null value of zero. Which of the two should we choose? The decision depends on whether you value the flexibility of the two-sample analysis or the generalizability of ANOVA.
Comparison of Independent Two-Sample \(t\)-Test and ANOVA

| Feature | Independent Two-Sample \(t\)-Test | One-Way ANOVA |
|---|---|---|
| Variance Assumption | Can assume equal or unequal variances among groups | Assumes equal variances |
| Hypothesis Type | Any direction can be chosen | Two-sided only |
| Null Value \(\Delta_0\) | Can be any value | Limited to \(\Delta_0 = 0\) |
| Number of Groups | Exactly 2 groups | 2 or more groups |
12.3.5. Bringing It All Together
Key Takeaways 📝
The \(F\)-test statistic \(\frac{\text{MSA}}{\text{MSE}}\) compares between-group to within-group variability, with large values providing evidence against \(H_0\).
The \(F\)-distribution is right-skewed, non-negative, and has mean approximately 1. Its shape is controlled by two degrees of freedom. Under the null hypothesis, the \(F\)-test statistic follows an \(F\)-distribution with \(df_1=k-1\) and \(df_2=n-k\).
The ANOVA table organizes all components of ANOVA, including the \(F\)-test statistic and the \(p\)-value.
The complete ANOVA hypothesis testing follows the standard four-step framework.
ANOVA \(F\)-tests with \(k=2\) are equivalent to certain two-sample \(t\)-tests.
ANOVA serves as a gatekeeper test that determines whether any group differences exist before investigating specific pairwise comparisons.
Exercises
ANOVA Table Completion: Complete the missing entries in this ANOVA table:
| Source | df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Treatment | 3 | 450 | ? | ? | 0.008 |
| Error | ? | ? | 25 | | |
| Total | 47 | ? | | | |
F-test vs t-test Connection: In a study comparing two teaching methods, 20 students are assigned to each method, and their understanding of the material is measured. The variances are assumed equal in the two groups. The two-sample t-test statistic is computed as \(-2.4\).
What would the F-statistic be for the equivalent one-way ANOVA?
What are the degrees of freedom for both tests?
How do the p-values compare between the two-sided t-test and the F-test?
Complete ANOVA Analysis: A researcher studies the effect of four different fertilizers on plant height, with the following summary data:
Fertilizer A: \(n_1 = 12, \bar{x}_1 = 18.5, s_1 = 3.2\)
Fertilizer B: \(n_2 = 15, \bar{x}_2 = 22.1, s_2 = 2.8\)
Fertilizer C: \(n_3 = 10, \bar{x}_3 = 19.8, s_3 = 3.6\)
Fertilizer D: \(n_4 = 13, \bar{x}_4 = 25.2, s_4 = 3.0\)
Check the equal variance assumption.
Set up appropriate hypotheses.
Would you expect to reject \(H_0\) based on the sample means? Explain your reasoning.
Interpretation Questions:
Explain why \(F\)-values are always non-negative but \(t\)-values can be negative.
Why do we always use lower.tail = FALSE when calculating \(p\)-values for \(F\)-tests?
What does it mean when MSA is much larger than MSE?
How would the \(F\)-test statistic change if all group sample means increased by the same amount?