12.3. One-Way ANOVA F-Test and Its Relationship to Two-Sample t-Tests

We’ve developed the theoretical foundation for ANOVA by decomposing total variability into between-group and within-group components. Now we can construct the actual hypothesis test that will tell us whether observed differences in sample means are statistically significant or could reasonably be attributed to random variation.

Road Map 🧭

  • Problem we will solve – How to construct a formal hypothesis test using the variance decomposition to compare multiple population means simultaneously

  • Tools we’ll learn – The F-test statistic, F-distribution properties, and the complete ANOVA testing procedure, plus the connection between F-tests and t-tests

  • How it fits – This transforms our variance decomposition into a practical decision-making tool while showing how ANOVA generalizes familiar two-sample procedures

12.3.1. The F-Test Statistic: Formalizing Our Intuition

We’ve established that when the null hypothesis is true, both MSA and MSE estimate the same quantity (\(\sigma^2\)). When the null hypothesis is false, MSE still estimates \(\sigma^2\), but MSA estimates something larger. This suggests we should compare these two quantities using a ratio.

The ANOVA Test Statistic

Our test statistic is:

\[F_{TS} = \frac{\text{MSA}}{\text{MSE}} = \frac{\text{Between-group variability}}{\text{Within-group variability}}\]

Expanding this using our formulas:

\[F_{TS} = \frac{\frac{1}{k-1}\sum_{i=1}^k n_i(\bar{X}_{i \cdot} - \bar{X}_{\cdot \cdot})^2}{\frac{1}{n-k}\sum_{i=1}^k (n_i - 1)s_i^2}\]

This ratio captures exactly what we were looking for visually in our boxplots—it compares how spread out the group means are relative to the natural variability within groups.
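To make the formula concrete, here is a minimal R sketch that computes \(F_{TS}\) directly from group summary statistics; the sample sizes, means, and standard deviations below are hypothetical illustrative values.

n    <- c(10, 12, 11)          # group sizes n_i (hypothetical)
xbar <- c(5.1, 6.3, 5.8)       # group sample means
s    <- c(1.2, 1.0, 1.3)       # group sample standard deviations

k <- length(n)                 # number of groups
N <- sum(n)                    # total sample size
grand_mean <- sum(n * xbar) / N

MSA <- sum(n * (xbar - grand_mean)^2) / (k - 1)   # between-group mean square
MSE <- sum((n - 1) * s^2) / (N - k)               # within-group mean square
F_TS <- MSA / MSE
F_TS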

Interpreting the F-Statistic

When \(H_0\) is true (all population means equal):

  • Both MSA and MSE estimate \(\sigma^2\)

  • The ratio \(F_{TS}\) should be close to 1

  • Values much larger than 1 would be unusual

When \(H_0\) is false (at least one mean differs):

  • MSE still estimates \(\sigma^2\)

  • MSA estimates \(\sigma^2\) plus additional variation from group differences

  • The ratio \(F_{TS}\) should be substantially larger than 1

Large values of \(F_{TS}\) provide evidence against the null hypothesis. But how large is “large enough”? We need to understand the sampling distribution of this statistic.

12.3.2. The F-Distribution

Under the null hypothesis, and when all assumptions are satisfied, the test statistic \(F_{TS}\) follows what’s called an F-distribution. This distribution has several distinctive characteristics that make it well suited to ANOVA.

Properties of the F-Distribution

The F-distribution \(F(df_A, df_E)\) has two parameters—the numerator degrees of freedom (\(df_A = k-1\)) and denominator degrees of freedom (\(df_E = n-k\)).

Key properties:

  1. Always positive: Since we’re taking a ratio of squared terms, \(F_{TS} \geq 0\)

  2. Right-skewed: The distribution is not symmetric like the normal or t-distributions

  3. Mean approximately 1: When \(H_0\) is true, the expected value is \(\frac{df_E}{df_E - 2}\) (for \(df_E > 2\)), which is close to 1 unless \(df_E\) is small

  4. Shape controlled by degrees of freedom: Both \(df_A\) and \(df_E\) affect the distribution’s shape

The shape becomes less skewed as the degrees of freedom increase, but it remains right-skewed unlike the symmetric distributions we’ve used before.
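One way to see these properties is to simulate the null distribution. The sketch below assumes a hypothetical setup with five groups of 40 observations drawn from a single normal population, so \(H_0\) holds by construction.

# Simulate F statistics under H0 (hypothetical setup: k = 5, n_i = 40)
set.seed(1)                              # assumed seed, for reproducibility
f_sim <- replicate(10000, {
  x <- rnorm(200)                        # every observation shares one mean
  g <- factor(rep(1:5, each = 40))       # five equal-sized groups
  summary(aov(x ~ g))[[1]][["F value"]][1]
})
mean(f_sim)    # near df_E/(df_E - 2) = 195/193, i.e. just above 1
hist(f_sim)    # right-skewed, with most mass near 1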

Using the F-Distribution in R

Just like other distributions, we can work with the F-distribution using R functions:

  • pf(): Calculates probabilities (p-values)

  • qf(): Finds critical values

  • df(): Density function (rarely used in practice)

For ANOVA, we always use lower.tail = FALSE in the pf() function because we’re interested in the probability of observing an F-statistic as large or larger than what we calculated.
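For example, with hypothetical degrees of freedom \(df_A = 4\) and \(df_E = 195\):

# Upper-tail probability P(F >= 3.2): an ANOVA p-value
pf(3.2, df1 = 4, df2 = 195, lower.tail = FALSE)

# Critical value with 5% of the distribution above it
qf(0.05, df1 = 4, df2 = 195, lower.tail = FALSE)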

12.3.3. The Complete ANOVA Hypothesis Testing Procedure

Now we can formalize the complete four-step hypothesis testing procedure for one-way ANOVA.

Step 1: Define Parameters and Hypotheses

Parameters of interest: \(\mu_1, \mu_2, \ldots, \mu_k\) representing the population means for the \(k\) different groups.

Hypotheses:

\(H_0: \mu_1 = \mu_2 = \cdots = \mu_k\) (all population means are equal)

\(H_a:\) At least one \(\mu_i\) differs from the others

The alternative hypothesis can be written in several equivalent ways:

  • At least one \(\mu_i\) is different from the rest

  • Not all population means are equal

  • \(\mu_i \neq \mu_j\) for some \(i \neq j\)

Step 2: Check Assumptions

Before proceeding, verify that the ANOVA assumptions are reasonable:

  1. Independence: Observations within and between groups are independent

  2. Normality: Each population is normally distributed (or sample sizes are large enough for CLT)

  3. Equal variances: All populations have the same variance \(\sigma^2\)

For the equal variance assumption, use the rule of thumb:

\[\frac{\max(s_1, s_2, \ldots, s_k)}{\min(s_1, s_2, \ldots, s_k)} \leq 2\]
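In R, this check is a one-liner; the standard deviations below are hypothetical.

s <- c(1.2, 1.0, 1.3)    # hypothetical group standard deviations
max(s) / min(s)          # 1.3, at most 2, so the assumption looks reasonable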

Step 3: Calculate Test Statistic and P-Value

Test statistic:

\[F_{TS} = \frac{\text{MSA}}{\text{MSE}}\]

Degrees of freedom:

  • Numerator: \(df_A = k - 1\)

  • Denominator: \(df_E = n - k\)

P-value:

# Upper-tail probability: P(F >= F_TS) under H0
pf(F_TS, df1 = df_A, df2 = df_E, lower.tail = FALSE)

In practice, you’ll typically use R’s built-in ANOVA function:

# Fit the one-way ANOVA model, then display the ANOVA table
fit <- aov(response_variable ~ factor_variable, data = dataframe)
summary(fit)

Step 4: Make Decision and State Conclusion

Decision rule:

  • If p-value ≤ \(\alpha\), reject \(H_0\)

  • If p-value > \(\alpha\), fail to reject \(H_0\)

Conclusion template:

“The data [does/does not] give [weak/moderate/strong] support (p-value = [value]) to the claim that [statement of \(H_a\) in context].”

12.3.4. The ANOVA Table

The standard way to present ANOVA results is through an ANOVA table that summarizes all the variance decomposition components:

Table 12.1 ANOVA Table Format

Source   | df      | Sum of Squares                                                                   | Mean Square                              | F-value                           | Pr(>F)
---------|---------|----------------------------------------------------------------------------------|------------------------------------------|-----------------------------------|--------
Factor A | \(k-1\) | \(\text{SSA} = \sum_{i=1}^k n_i(\bar{x}_{i \cdot} - \bar{x}_{\cdot \cdot})^2\)    | \(\text{MSA} = \frac{\text{SSA}}{k-1}\)  | \(\frac{\text{MSA}}{\text{MSE}}\) | p-value
Error    | \(n-k\) | \(\text{SSE} = \sum_{i=1}^k \sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{i \cdot})^2\)      | \(\text{MSE} = \frac{\text{SSE}}{n-k}\)  |                                   |
Total    | \(n-1\) | \(\text{SST} = \sum_{i=1}^k \sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{\cdot \cdot})^2\)  | \(\text{MST} = \frac{\text{SST}}{n-1}\)  |                                   |

The total row is often omitted from R output but can be calculated since SST = SSA + SSE and the degrees of freedom also add up.

Estimating the common standard deviation: \(s = \sqrt{\text{MSE}}\)
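As a sketch (assuming `fit` is the aov object from the code above), the MSE and the estimate of \(s\) can be pulled out of the fitted model:

tab <- summary(fit)[[1]]       # the ANOVA table as a data frame
mse <- tab[["Mean Sq"]][2]     # second row is the Error (Residuals) row
s_hat <- sqrt(mse)             # estimate of the common standard deviation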

12.3.5. Coffeehouse Example: Complete Analysis

Let’s work through our coffeehouse example using the complete ANOVA procedure to see how everything fits together.

Setting Up the Problem

Research question: Do the five coffeehouses around campus attract customers of different average ages?

Study design: A reporter surveys approximately 40 random customers at each coffeehouse, asking for their age.

Data summary (from previous analysis):

  • Total sample size: \(n = 200\)

  • Number of groups: \(k = 5\)

  • Group sample sizes: \(n_1 = 39, n_2 = 38, n_3 = 42, n_4 = 38, n_5 = 43\)

  • Sample means: \(\bar{x}_1 = 39.13, \bar{x}_2 = 46.66, \bar{x}_3 = 40.50, \bar{x}_4 = 26.42, \bar{x}_5 = 34.07\)

  • Sample standard deviations: \(s_1 = 7.90, s_2 = 12.97, s_3 = 10.94, s_4 = 6.99, s_5 = 9.92\)

Step 1: Parameters and Hypotheses

Parameters: \(\mu_{\text{Age}_1}, \mu_{\text{Age}_2}, \mu_{\text{Age}_3}, \mu_{\text{Age}_4}, \mu_{\text{Age}_5}\) representing the mean customer age at coffeehouses 1, 2, 3, 4, and 5 respectively.

Hypotheses:

\(H_0: \mu_{\text{Age}_1} = \mu_{\text{Age}_2} = \mu_{\text{Age}_3} = \mu_{\text{Age}_4} = \mu_{\text{Age}_5}\)

\(H_a: \mu_{\text{Age}_i} \neq \mu_{\text{Age}_j}\) for some \(i \neq j\)

Step 2: Check Assumptions

Equal variance check:

\[\frac{\max(s_i)}{\min(s_i)} = \frac{12.97}{6.99} = 1.86 \leq 2\]

Since 1.86 ≤ 2, the equal variance assumption appears reasonable.
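A quick R check of the rule of thumb using the five reported standard deviations:

s <- c(7.90, 12.97, 10.94, 6.99, 9.92)   # reported group SDs
max(s) / min(s)                          # 1.86, below the cutoff of 2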

Normality: Visual inspection of histograms for each group shows approximate normality with no extreme deviations. With sample sizes around 40 in each group, the Central Limit Theorem helps ensure the sampling distribution of means is approximately normal.

Step 3: Calculate Test Statistic and P-Value

Degrees of freedom:

  • \(df_A = k - 1 = 5 - 1 = 4\)

  • \(df_E = n - k = 200 - 5 = 195\)

Using R’s ANOVA function on the full dataset:

# One-way ANOVA of customer age by coffeehouse
fit <- aov(Age ~ Coffeehouse, data = coffeehouse_df)
summary(fit)

ANOVA Table Results:

Table 12.2 Coffeehouse ANOVA Results

Source      | df  | Sum Sq | Mean Sq | F value | Pr(>F)
------------|-----|--------|---------|---------|--------
Coffeehouse | 4   | 8834   | 2208.4  | 22.14   | 4.4e-15
Residuals   | 195 | 19451  | 99.8    |         |

Test statistic: \(F_{TS} = 22.14\)

P-value: \(4.4 \times 10^{-15}\) (extremely small)
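The reported p-value can be reproduced directly from the F statistic and its degrees of freedom:

# Upper tail of the F(4, 195) distribution beyond the observed statistic
pf(22.14, df1 = 4, df2 = 195, lower.tail = FALSE)   # roughly 4.4e-15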

Step 4: Decision and Conclusion

Using \(\alpha = 0.01\):

Decision: Since p-value = \(4.4 \times 10^{-15} < 0.01 = \alpha\), we reject \(H_0\).

Conclusion: The data gives strong support (p-value = \(4.4 \times 10^{-15}\)) to the claim that at least one of the coffee shops around campus differs in the mean age of customers from the rest.

The F-statistic of 22.14 is much larger than 1, indicating that the between-group variability is more than 22 times the within-group variability—strong evidence that the coffeehouses attract customers of systematically different ages.

12.3.6. The Connection Between F-Tests and t-Tests

An important theoretical connection exists between ANOVA and the two-sample t-test. When we have exactly two groups (\(k = 2\)), the one-way ANOVA F-test is mathematically equivalent to the two-sample t-test with equal variance assumption.

Relationship When k = 2

For two groups, our F-statistic becomes:

\[F_{TS} = \frac{\text{MSA}}{\text{MSE}} = \frac{n_1(\bar{X}_1 - \bar{X}_{\cdot \cdot})^2 + n_2(\bar{X}_2 - \bar{X}_{\cdot \cdot})^2}{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}\]

Through algebraic manipulation, this simplifies to the square of the two-sample t-statistic. The key step is writing the overall mean as a weighted average of the group means, \(\bar{X}_{\cdot \cdot} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}\), which gives

\[\bar{X}_1 - \bar{X}_{\cdot \cdot} = \frac{n_2(\bar{X}_1 - \bar{X}_2)}{n_1 + n_2}, \qquad \bar{X}_2 - \bar{X}_{\cdot \cdot} = -\frac{n_1(\bar{X}_1 - \bar{X}_2)}{n_1 + n_2}.\]

Substituting these into the numerator and using \(\frac{n_1 n_2}{n_1 + n_2} = \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{-1}\) yields:

\[F_{TS} = \frac{(\bar{X}_1 - \bar{X}_2)^2}{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = t_{TS}^2\]

where \(t_{TS}\) is the two-sample t-statistic with pooled variance:

\[t_{TS} = \frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\]

This means F-statistic = (t-statistic)² when comparing two groups.
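A short simulation sketch (data invented purely for illustration) confirms the identity numerically:

# With k = 2 groups, ANOVA's F equals the squared pooled-variance t
set.seed(42)                               # assumed seed
y1 <- rnorm(20, mean = 70, sd = 8)
y2 <- rnorm(20, mean = 75, sd = 8)

t_stat <- t.test(y1, y2, var.equal = TRUE)$statistic
dat <- data.frame(y = c(y1, y2),
                  group = factor(rep(c("A", "B"), each = 20)))
f_stat <- summary(aov(y ~ group, data = dat))[[1]][["F value"]][1]

unname(t_stat)^2   # matches f_stat up to floating-point rounding
f_stat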

Comparing F-Tests and t-Tests

Table 12.3 F-test vs t-test Comparison

Feature                | Two-Sample t-Test                          | One-Way ANOVA
-----------------------|--------------------------------------------|----------------------------------------------
Variance Assumption    | Can assume equal or unequal variances      | Assumes equal variances
Alternative Hypothesis | Directional (>, <) or non-directional (≠)  | Non-directional only
Distribution           | t-distribution (symmetric)                 | F-distribution (right-skewed)
Null Value             | Can test \(\mu_1 - \mu_2 = \Delta_0\)      | Tests only \(\mu_1 = \mu_2\) (no null value)
Number of Groups       | Exactly 2 groups                           | 2 or more groups

When to Use Each Test

Use the two-sample t-test when:

  • You have exactly 2 groups

  • You want to test directional alternatives (one-sided tests)

  • You prefer not to assume equal variances (Welch’s t-test)

  • You want to test a specific difference (\(\mu_1 - \mu_2 = \Delta_0\))

Use one-way ANOVA when:

  • You have 3 or more groups

  • You want to test for any differences among groups

  • You’re willing to assume equal variances

  • You want an overall test before looking at specific comparisons

The beauty of this connection is that it shows ANOVA as a natural generalization of the two-sample procedures we’ve already mastered.

12.3.7. What Happens After Rejecting H₀?

When ANOVA indicates that “at least one mean differs from the others,” this naturally leads to the question: “Which specific groups are different?” ANOVA doesn’t tell us which means differ—only that they’re not all equal.

This limitation leads us to multiple comparison procedures, which we’ll explore in the next section. These methods allow us to make specific pairwise comparisons while controlling the overall error rate.

For now, it’s important to understand that ANOVA serves as a “gatekeeper” test. If we fail to reject the null hypothesis in ANOVA, we typically stop there and conclude there’s no evidence for group differences. If we do reject the null hypothesis, then we proceed to investigate which specific groups differ.

12.3.8. Bringing It All Together

Key Takeaways 📝

  1. The F-test statistic \(\frac{\text{MSA}}{\text{MSE}}\) compares between-group to within-group variability, with large values providing evidence against \(H_0\).

  2. The F-distribution is right-skewed, always positive, with mean approximately 1 when \(H_0\) is true, and shape controlled by two degrees of freedom parameters.

  3. The complete ANOVA procedure follows the standard four-step hypothesis testing framework, with R’s aov() function providing convenient computation.

  4. The ANOVA table organizes all variance decomposition components and provides the F-statistic and p-value for decision making.

  5. F-tests and t-tests are equivalent when comparing exactly two groups: F-statistic = (t-statistic)² under equal variance assumptions.

  6. ANOVA serves as a gatekeeper test that determines whether any group differences exist before investigating specific pairwise comparisons.

  7. Practical significance depends not just on statistical significance but also on the magnitude of differences relative to within-group variability.

Exercises

  1. F-Distribution Properties: An ANOVA comparing 4 groups with 60 total observations yields an F-statistic of 2.8.

    1. What are the degrees of freedom for this F-statistic?

    2. Would you expect this F-value to be statistically significant at \(\alpha = 0.05\)? Why?

    3. How would the shape of this F-distribution compare to F(1,56)?

  2. ANOVA Table Completion: Complete the missing entries in this ANOVA table:

    Source    | df | Sum Sq | Mean Sq | F value | Pr(>F)
    ----------|----|--------|---------|---------|-------
    Treatment | 3  | 450    | ?       | ?       | 0.008
    Error     | ?  | ?      | 25      |         |
    Total     | 47 | ?      | ?       |         |

  3. F-test vs t-test Connection: In a study comparing two teaching methods with 20 students each, the two-sample t-statistic (equal variance) is -2.4.

    1. What would the F-statistic be for the equivalent one-way ANOVA?

    2. What are the degrees of freedom for both tests?

    3. How do the p-values compare between the two-sided t-test and the F-test?

  4. Complete ANOVA Analysis: A researcher studies the effect of four different fertilizers on plant height, with the following summary data:

    • Fertilizer A: \(n_1 = 12, \bar{x}_1 = 18.5, s_1 = 3.2\)

    • Fertilizer B: \(n_2 = 15, \bar{x}_2 = 22.1, s_2 = 2.8\)

    • Fertilizer C: \(n_3 = 10, \bar{x}_3 = 19.8, s_3 = 3.6\)

    • Fertilizer D: \(n_4 = 13, \bar{x}_4 = 25.2, s_4 = 3.0\)

    1. Check the equal variance assumption

    2. Set up appropriate hypotheses

    3. Calculate the overall sample mean

    4. Would you expect to reject \(H_0\) based on the sample means? Explain your reasoning.

  5. Interpretation Questions:

    1. Explain why F-values are always non-negative but t-values can be negative

    2. Why do we always use lower.tail = FALSE when calculating p-values for F-tests?

    3. What does it mean when MSA is much larger than MSE?

    4. How would the F-statistic change if all group means increased by the same amount?