12.3. One-Way ANOVA F-Test and Its Relationship to Two-Sample t-Tests
We’ve developed the theoretical foundation for ANOVA by decomposing total variability into between-group and within-group components. Now we can construct the actual hypothesis test that will tell us whether observed differences in sample means are statistically significant or could reasonably be attributed to random variation.
Road Map 🧭
Problem we will solve – How to construct a formal hypothesis test using the variance decomposition to compare multiple population means simultaneously
Tools we’ll learn – The F-test statistic, F-distribution properties, and the complete ANOVA testing procedure, plus the connection between F-tests and t-tests
How it fits – This transforms our variance decomposition into a practical decision-making tool while showing how ANOVA generalizes familiar two-sample procedures
12.3.1. The F-Test Statistic: Formalizing Our Intuition
We’ve established that when the null hypothesis is true, both MSA and MSE estimate the same quantity (\(\sigma^2\)). When the null hypothesis is false, MSE still estimates \(\sigma^2\), but MSA estimates something larger. This suggests we should compare these two quantities using a ratio.
The ANOVA Test Statistic
Our test statistic is:

\[F_{TS} = \frac{\text{MSA}}{\text{MSE}}\]

Expanding this using our formulas:

\[F_{TS} = \frac{\text{SSA}/(k-1)}{\text{SSE}/(n-k)} = \frac{\sum_{i=1}^k n_i(\bar{x}_{i \cdot} - \bar{x}_{\cdot \cdot})^2 \,/\, (k-1)}{\sum_{i=1}^k \sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{i \cdot})^2 \,/\, (n-k)}\]
This ratio captures exactly what we were looking for visually in our boxplots—it compares how spread out the group means are relative to the natural variability within groups.
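To make this concrete, here is a minimal R sketch that computes \(F_{TS}\) by hand for a small simulated dataset (the data frame scores, its column names, and the simulation settings are illustrative, not data from any example in this chapter):

```r
# A minimal sketch: computing F_TS by hand for simulated data
set.seed(1)
scores <- data.frame(
  y     = c(rnorm(10, mean = 5), rnorm(10, mean = 5), rnorm(10, mean = 7)),
  group = rep(c("A", "B", "C"), each = 10)
)

grand_mean  <- mean(scores$y)                          # overall mean
group_means <- tapply(scores$y, scores$group, mean)    # group means
n_i         <- tapply(scores$y, scores$group, length)  # group sizes
k <- length(group_means)
n <- nrow(scores)

SSA  <- sum(n_i * (group_means - grand_mean)^2)        # between-group SS
SSE  <- sum((scores$y - group_means[scores$group])^2)  # within-group SS
MSA  <- SSA / (k - 1)
MSE  <- SSE / (n - k)
F_TS <- MSA / MSE
F_TS
```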
Interpreting the F-Statistic
When \(H_0\) is true (all population means equal):
Both MSA and MSE estimate \(\sigma^2\)
The ratio \(F_{TS}\) should be close to 1
Values much larger than 1 would be unusual
When \(H_0\) is false (at least one mean differs):
MSE still estimates \(\sigma^2\)
MSA estimates \(\sigma^2\) plus additional variation from group differences
The ratio \(F_{TS}\) should be substantially larger than 1
Large values of \(F_{TS}\) provide evidence against the null hypothesis. But how large is “large enough”? We need to understand the sampling distribution of this statistic.
12.3.2. The F-Distribution
Under the null hypothesis and when all assumptions are satisfied, the test statistic \(F_{TS}\) follows what’s called an F-distribution. This distribution has some unique characteristics that make it perfect for ANOVA.
Properties of the F-Distribution
The F-distribution \(F(df_A, df_E)\) has two parameters—the numerator degrees of freedom (\(df_A = k-1\)) and denominator degrees of freedom (\(df_E = n-k\)).
Key properties:
Always positive: Since we’re taking a ratio of squared terms, \(F_{TS} \geq 0\)
Right-skewed: The distribution is not symmetric like the normal or t-distributions
Mean approximately 1: When \(H_0\) is true, the expected value is close to 1
Shape controlled by degrees of freedom: Both \(df_A\) and \(df_E\) affect the distribution’s shape
The shape becomes less skewed as the degrees of freedom increase, but it remains right-skewed unlike the symmetric distributions we’ve used before.
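A quick simulation illustrates these properties. The sketch below (all names and settings are illustrative) generates many F-statistics under a true null hypothesis with \(k = 3\) groups of 10 observations, so \(df_A = 2\) and \(df_E = 27\):

```r
# Simulating the null distribution of F_TS: k = 3 groups of 10, all means equal
set.seed(1)
f_sim <- replicate(10000, {
  y <- rnorm(30)                 # H_0 is true: every group has the same mean
  g <- factor(rep(1:3, each = 10))
  summary(aov(y ~ g))[[1]]$`F value`[1]
})

mean(f_sim)   # close to 1 (theoretical mean is df_E / (df_E - 2) = 27/25)
hist(f_sim, breaks = 50, freq = FALSE, main = "Simulated null F-statistics")
curve(df(x, df1 = 2, df2 = 27), add = TRUE)   # theoretical F(2, 27) density
```

Every simulated value is non-negative, the histogram is right-skewed, and the simulated mean is close to 1, exactly as described above.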
Using the F-Distribution in R
Just like other distributions, we can work with the F-distribution using R functions:
pf(): Calculates probabilities (p-values)
qf(): Finds critical values
df(): Density function (rarely used in practice)
For ANOVA, we always use lower.tail = FALSE in the pf() function because we’re interested in the probability of observing an F-statistic at least as large as the one we calculated.
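For example, with illustrative numbers (an observed \(F_{TS} = 3.1\) with \(df_A = 2\) and \(df_E = 27\)):

```r
# Upper-tail p-value: P(F >= 3.1) under F(2, 27)
pf(3.1, df1 = 2, df2 = 27, lower.tail = FALSE)

# Critical value that leaves 5% in the upper tail of F(2, 27)
qf(0.05, df1 = 2, df2 = 27, lower.tail = FALSE)
```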
12.3.3. The Complete ANOVA Hypothesis Testing Procedure
Now we can formalize the complete four-step hypothesis testing procedure for one-way ANOVA.
Step 1: Define Parameters and Hypotheses
Parameters of interest: \(\mu_1, \mu_2, \ldots, \mu_k\) representing the population means for the \(k\) different groups.
Hypotheses:
\(H_0: \mu_1 = \mu_2 = \cdots = \mu_k\) (all population means are equal)
\(H_a:\) At least one \(\mu_i\) differs from the others
The alternative hypothesis can be written in several equivalent ways:
At least one \(\mu_i\) is different from the rest
Not all population means are equal
\(\mu_i \neq \mu_j\) for some \(i \neq j\)
Step 2: Check Assumptions
Before proceeding, verify that the ANOVA assumptions are reasonable:
Independence: Observations within and between groups are independent
Normality: Each population is normally distributed (or sample sizes are large enough for CLT)
Equal variances: All populations have the same variance \(\sigma^2\)
For the equal variance assumption, use the rule of thumb: the assumption is reasonable when the ratio of the largest to the smallest sample standard deviation is less than 2, that is, \(\frac{\max_i s_i}{\min_i s_i} < 2\).
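In R, this check is a one-liner; a small sketch, reusing the illustrative scores data frame from earlier in this section:

```r
# Rule-of-thumb check: ratio of largest to smallest group standard deviation
group_sds <- tapply(scores$y, scores$group, sd)
max(group_sds) / min(group_sds)   # assumption reasonable if this is < 2
```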
Step 3: Calculate Test Statistic and P-Value
Test statistic:

\[F_{TS} = \frac{\text{MSA}}{\text{MSE}}\]

Degrees of freedom:

- Numerator: \(df_A = k - 1\)
- Denominator: \(df_E = n - k\)
P-value:
```r
pf(F_TS, df1 = df_A, df2 = df_E, lower.tail = FALSE)
```
In practice, you’ll typically use R’s built-in ANOVA function:
```r
fit <- aov(response_variable ~ factor_variable, data = dataframe)
summary(fit)
```
Step 4: Make Decision and State Conclusion
Decision rule:
If p-value ≤ \(\alpha\), reject \(H_0\)
If p-value > \(\alpha\), fail to reject \(H_0\)
Conclusion template:
“The data [does/does not] give [weak/moderate/strong] support (p-value = [value]) to the claim that [statement of \(H_a\) in context].”
12.3.4. The ANOVA Table
The standard way to present ANOVA results is through an ANOVA table that summarizes all the variance decomposition components:
| Source | df | Sum of Squares | Mean Square | F-value | Pr(>F) |
|---|---|---|---|---|---|
| Factor A | \(k-1\) | \(\text{SSA} = \sum_{i=1}^k n_i(\bar{x}_{i \cdot} - \bar{x}_{\cdot \cdot})^2\) | \(\text{MSA} = \frac{\text{SSA}}{k-1}\) | \(\frac{\text{MSA}}{\text{MSE}}\) | p-value |
| Error | \(n-k\) | \(\text{SSE} = \sum_{i=1}^k \sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{i \cdot})^2\) | \(\text{MSE} = \frac{\text{SSE}}{n-k}\) | | |
| Total | \(n-1\) | \(\text{SST} = \sum_{i=1}^k \sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{\cdot \cdot})^2\) | \(\text{MST} = \frac{\text{SST}}{n-1}\) | | |
The total row is often omitted from R output but can be calculated since SST = SSA + SSE and the degrees of freedom also add up.
Estimating the common standard deviation: \(s = \sqrt{\text{MSE}}\)
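Given a fitted model, these pieces can be pulled out directly. A hedged sketch, assuming fit is an object returned by aov() as in Step 3:

```r
# Extract the ANOVA table and reconstruct the omitted Total row
tab      <- summary(fit)[[1]]       # data frame with Df, Sum Sq, Mean Sq, ...
SST      <- sum(tab$`Sum Sq`)       # SSA + SSE
df_total <- sum(tab$Df)             # (k - 1) + (n - k) = n - 1
s        <- sqrt(tab$`Mean Sq`[2])  # sqrt(MSE) estimates the common sigma
```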
12.3.5. Coffeehouse Example: Complete Analysis
Let’s work through our coffeehouse example using the complete ANOVA procedure to see how everything fits together.
Setting Up the Problem
Research question: Do the five coffeehouses around campus attract customers of different average ages?
Study design: A reporter surveys approximately 40 random customers at each coffeehouse, asking for their age.
Data summary (from previous analysis):
Total sample size: \(n = 200\)
Number of groups: \(k = 5\)
Group sample sizes: \(n_1 = 39, n_2 = 38, n_3 = 42, n_4 = 38, n_5 = 43\)
Sample means: \(\bar{x}_1 = 39.13, \bar{x}_2 = 46.66, \bar{x}_3 = 40.50, \bar{x}_4 = 26.42, \bar{x}_5 = 34.07\)
Sample standard deviations: \(s_1 = 7.90, s_2 = 12.97, s_3 = 10.94, s_4 = 6.99, s_5 = 9.92\)
Step 1: Parameters and Hypotheses
Parameters: \(\mu_{\text{Age}_1}, \mu_{\text{Age}_2}, \mu_{\text{Age}_3}, \mu_{\text{Age}_4}, \mu_{\text{Age}_5}\) representing the mean customer age at coffeehouses 1, 2, 3, 4, and 5 respectively.
Hypotheses:
\(H_0: \mu_{\text{Age}_1} = \mu_{\text{Age}_2} = \mu_{\text{Age}_3} = \mu_{\text{Age}_4} = \mu_{\text{Age}_5}\)
\(H_a: \mu_{\text{Age}_i} \neq \mu_{\text{Age}_j}\) for some \(i \neq j\)
Step 2: Check Assumptions
Equal variance check: the ratio of the largest to the smallest sample standard deviation is \(\frac{12.97}{6.99} \approx 1.86\).
Since 1.86 < 2, the equal variance assumption appears reasonable.
Normality: Visual inspection of histograms for each group shows approximate normality with no extreme deviations. With sample sizes around 40 in each group, the Central Limit Theorem helps ensure the sampling distribution of means is approximately normal.
Step 3: Calculate Test Statistic and P-Value
Degrees of freedom:

- \(df_A = k - 1 = 5 - 1 = 4\)
- \(df_E = n - k = 200 - 5 = 195\)
Using R’s ANOVA function on the full dataset:
```r
fit <- aov(Age ~ Coffeehouse, data = coffeehouse_df)
summary(fit)
```
ANOVA Table Results:
| Source | df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Coffeehouse | 4 | 8834 | 2208.4 | 22.14 | 4.4e-15 |
| Residuals | 195 | 19451 | 99.8 | | |
Test statistic: \(F_{TS} = 22.14\)
P-value: \(4.4 \times 10^{-15}\) (extremely small)
Step 4: Decision and Conclusion
Using \(\alpha = 0.01\):
Decision: Since p-value = \(4.4 \times 10^{-15} < 0.01 = \alpha\), we reject \(H_0\).
Conclusion: The data gives strong support (p-value = \(4.4 \times 10^{-15}\)) to the claim that at least one of the coffee shops around campus differs in the mean age of customers from the rest.
The F-statistic of 22.14 is much larger than 1, indicating that the between-group variability is more than 22 times the within-group variability—strong evidence that the coffeehouses attract customers of systematically different ages.
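As a sanity check, the reported p-value can be reproduced directly from the F-distribution:

```r
# Upper-tail probability of F(4, 195) beyond the observed statistic
pf(22.14, df1 = 4, df2 = 195, lower.tail = FALSE)   # approximately 4.4e-15
```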
12.3.6. The Connection Between F-Tests and t-Tests
An important theoretical connection exists between ANOVA and the two-sample t-test. When we have exactly two groups (\(k = 2\)), the one-way ANOVA F-test is mathematically equivalent to the two-sample t-test with equal variance assumption.
Relationship When k = 2
For two groups, our F-statistic becomes:

\[F_{TS} = \frac{\text{MSA}}{\text{MSE}} = \frac{n_1(\bar{x}_{1 \cdot} - \bar{x}_{\cdot \cdot})^2 + n_2(\bar{x}_{2 \cdot} - \bar{x}_{\cdot \cdot})^2}{\text{MSE}}\]

since \(df_A = k - 1 = 1\) makes MSA equal to SSA. Through algebraic manipulation (which involves expressing the overall mean \(\bar{X}_{\cdot \cdot}\) as a weighted average of the group means), this simplifies to:

\[F_{TS} = t_{TS}^2\]

where \(t_{TS}\) is the two-sample t-statistic with pooled variance:

\[t_{TS} = \frac{\bar{x}_{1 \cdot} - \bar{x}_{2 \cdot}}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \text{MSE}\]
This means F-statistic = (t-statistic)² when comparing two groups.
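A quick numerical demonstration of this identity (simulated data; all names are illustrative):

```r
# Demonstrating F_TS = t_TS^2 on two simulated groups
set.seed(2)
d <- data.frame(
  y = c(rnorm(20, mean = 10), rnorm(20, mean = 11)),
  g = rep(c("group1", "group2"), each = 20)
)

t_ts <- t.test(y ~ g, data = d, var.equal = TRUE)$statistic   # pooled-variance t
f_ts <- summary(aov(y ~ g, data = d))[[1]]$`F value`[1]

unname(t_ts)^2   # matches f_ts up to floating-point rounding
f_ts
```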
Comparing F-Tests and t-Tests
| Feature | Two-Sample t-Test | One-Way ANOVA |
|---|---|---|
| Variance Assumption | Can assume equal or unequal variances | Assumes equal variances |
| Alternative Hypothesis | Directional (>, <) or non-directional (≠) | Non-directional only |
| Distribution | t-distribution (symmetric) | F-distribution (right-skewed) |
| Null Value | Can test \(\mu_1 - \mu_2 = \Delta_0\) | Tests only \(\mu_1 = \mu_2\) (no null value) |
| Number of Groups | Exactly 2 groups | 2 or more groups |
When to Use Each Test
Use two-sample t-test when:

- You have exactly 2 groups
- You want to test directional alternatives (one-sided tests)
- You prefer not to assume equal variances (Welch’s t-test)
- You want to test a specific difference (\(\mu_1 - \mu_2 = \Delta_0\))
Use one-way ANOVA when:

- You have 3 or more groups
- You want to test for any differences among groups
- You’re willing to assume equal variances
- You want an overall test before looking at specific comparisons
The beauty of this connection is that it shows ANOVA as a natural generalization of the two-sample procedures we’ve already mastered.
12.3.7. What Happens After Rejecting H₀?
When ANOVA indicates that “at least one mean differs from the others,” this naturally leads to the question: “Which specific groups are different?” ANOVA doesn’t tell us which means differ—only that they’re not all equal.
This limitation leads us to multiple comparison procedures, which we’ll explore in the next section. These methods allow us to make specific pairwise comparisons while controlling the overall error rate.
For now, it’s important to understand that ANOVA serves as a “gatekeeper” test. If we fail to reject the null hypothesis in ANOVA, we typically stop there and conclude there’s no evidence for group differences. If we do reject the null hypothesis, then we proceed to investigate which specific groups differ.
12.3.8. Bringing It All Together
Key Takeaways 📝
The F-test statistic \(\frac{\text{MSA}}{\text{MSE}}\) compares between-group to within-group variability, with large values providing evidence against \(H_0\).
The F-distribution is right-skewed, always positive, with mean approximately 1 when \(H_0\) is true, and shape controlled by two degrees of freedom parameters.
The complete ANOVA procedure follows the standard four-step hypothesis testing framework, with R’s aov() function providing convenient computation.
The ANOVA table organizes all variance decomposition components and provides the F-statistic and p-value for decision making.
F-tests and t-tests are equivalent when comparing exactly two groups: F-statistic = (t-statistic)² under equal variance assumptions.
ANOVA serves as a gatekeeper test that determines whether any group differences exist before investigating specific pairwise comparisons.
Practical significance depends not just on statistical significance but also on the magnitude of differences relative to within-group variability.
Exercises
F-Distribution Properties: An ANOVA comparing 4 groups with 60 total observations yields an F-statistic of 2.8.
What are the degrees of freedom for this F-statistic?
Would you expect this F-value to be statistically significant at \(\alpha = 0.05\)? Why?
How would the shape of this F-distribution compare to F(1,56)?
ANOVA Table Completion: Complete the missing entries in this ANOVA table:
| Source | df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Treatment | 3 | 450 | ? | ? | 0.008 |
| Error | ? | ? | 25 | | |
| Total | 47 | ? | ? | | |
F-test vs t-test Connection: In a study comparing two teaching methods with 20 students each, the two-sample t-statistic (equal variance) is -2.4.
What would the F-statistic be for the equivalent one-way ANOVA?
What are the degrees of freedom for both tests?
How do the p-values compare between the two-sided t-test and the F-test?
Complete ANOVA Analysis: A researcher studies the effect of four different fertilizers on plant height, with the following summary data:
Fertilizer A: \(n_1 = 12, \bar{x}_1 = 18.5, s_1 = 3.2\)
Fertilizer B: \(n_2 = 15, \bar{x}_2 = 22.1, s_2 = 2.8\)
Fertilizer C: \(n_3 = 10, \bar{x}_3 = 19.8, s_3 = 3.6\)
Fertilizer D: \(n_4 = 13, \bar{x}_4 = 25.2, s_4 = 3.0\)
Check the equal variance assumption
Set up appropriate hypotheses
Calculate the overall sample mean
Would you expect to reject \(H_0\) based on the sample means? Explain your reasoning.
Interpretation Questions:
Explain why F-values are always non-negative but t-values can be negative
Why do we always use lower.tail = FALSE when calculating p-values for F-tests?
What does it mean when MSA is much larger than MSE?
How would the F-statistic change if all group means increased by the same amount?