12.3. One-Way ANOVA F-Test and Its Relationship to Two-Sample t-Tests
We’ve developed the theoretical foundation for ANOVA by decomposing total variability into between-group and within-group components. Now we can construct the actual hypothesis test that will tell us whether observed differences in sample means are statistically significant or could reasonably be attributed to random variation.
Road Map 🧭
Problem we will solve – How to construct a formal hypothesis test using the variance decomposition to compare multiple population means simultaneously
Tools we’ll learn – The F-test statistic, F-distribution properties, and the complete ANOVA testing procedure, plus the connection between F-tests and t-tests
How it fits – This transforms our variance decomposition into a practical decision-making tool while showing how ANOVA generalizes familiar two-sample procedures
12.3.1. The F-Test Statistic: Formalizing Our Intuition
We’ve established that when the null hypothesis is true, both MSA and MSE estimate the same quantity (\(\sigma^2\)). When the null hypothesis is false, MSE still estimates \(\sigma^2\), but MSA estimates something larger. This suggests we should compare these two quantities using a ratio.
The ANOVA Test Statistic
Our test statistic is:

\[F_{TS} = \frac{\text{MSA}}{\text{MSE}}\]

Expanding this using our formulas:

\[F_{TS} = \frac{\text{SSA}/(k-1)}{\text{SSE}/(n-k)} = \frac{\sum_{i=1}^k n_i(\bar{x}_{i \cdot} - \bar{x}_{\cdot \cdot})^2 \,/\, (k-1)}{\sum_{i=1}^k \sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{i \cdot})^2 \,/\, (n-k)}\]
This ratio captures exactly what we were looking for visually in our boxplots—it compares how spread out the group means are relative to the natural variability within groups.
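To make this concrete, here is a minimal R sketch that computes \(F_{TS}\) by hand for a small simulated dataset (the data frame scores, its column names, and the simulation settings are illustrative, not data from any example in this chapter):

```r
# A minimal sketch: computing F_TS by hand for simulated data
set.seed(1)
scores <- data.frame(
  y     = c(rnorm(10, mean = 5), rnorm(10, mean = 5), rnorm(10, mean = 7)),
  group = rep(c("A", "B", "C"), each = 10)
)

grand_mean  <- mean(scores$y)                          # overall mean
group_means <- tapply(scores$y, scores$group, mean)    # group means
n_i         <- tapply(scores$y, scores$group, length)  # group sizes
k <- length(group_means)
n <- nrow(scores)

SSA  <- sum(n_i * (group_means - grand_mean)^2)        # between-group SS
SSE  <- sum((scores$y - group_means[scores$group])^2)  # within-group SS
MSA  <- SSA / (k - 1)
MSE  <- SSE / (n - k)
F_TS <- MSA / MSE
F_TS
```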
Interpreting the F-Statistic
When \(H_0\) is true (all population means equal):
Both MSA and MSE estimate \(\sigma^2\)
The ratio \(F_{TS}\) should be close to 1
Values much larger than 1 would be unusual
When \(H_0\) is false (at least one mean differs):
MSE still estimates \(\sigma^2\)
MSA estimates \(\sigma^2\) plus additional variation from group differences
The ratio \(F_{TS}\) should be substantially larger than 1
Large values of \(F_{TS}\) provide evidence against the null hypothesis. But how large is “large enough”? We need to understand the sampling distribution of this statistic.
12.3.2. The F-Distribution
Under the null hypothesis and when all assumptions are satisfied, the test statistic \(F_{TS}\) follows what’s called an F-distribution. This distribution has some unique characteristics that make it perfect for ANOVA.
Properties of the F-Distribution
The F-distribution \(F(df_A, df_E)\) has two parameters—the numerator degrees of freedom (\(df_A = k-1\)) and denominator degrees of freedom (\(df_E = n-k\)).
Key properties:
Always positive: Since we’re taking a ratio of squared terms, \(F_{TS} \geq 0\)
Right-skewed: The distribution is not symmetric like the normal or t-distributions
Mean approximately 1: When \(H_0\) is true, the expected value is close to 1
Shape controlled by degrees of freedom: Both \(df_A\) and \(df_E\) affect the distribution’s shape
The shape becomes less skewed as the degrees of freedom increase, but it remains right-skewed unlike the symmetric distributions we’ve used before.
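A quick simulation illustrates these properties. The sketch below (all names and settings are illustrative) generates many F-statistics under a true null hypothesis with \(k = 3\) groups of 10 observations, so \(df_A = 2\) and \(df_E = 27\):

```r
# Simulating the null distribution of F_TS: k = 3 groups of 10, all means equal
set.seed(1)
f_sim <- replicate(10000, {
  y <- rnorm(30)                 # H_0 is true: every group has the same mean
  g <- factor(rep(1:3, each = 10))
  summary(aov(y ~ g))[[1]]$`F value`[1]
})

mean(f_sim)   # close to 1 (theoretical mean is df_E / (df_E - 2) = 27/25)
hist(f_sim, breaks = 50, freq = FALSE, main = "Simulated null F-statistics")
curve(df(x, df1 = 2, df2 = 27), add = TRUE)   # theoretical F(2, 27) density
```

Every simulated value is non-negative, the histogram is right-skewed, and the simulated mean is close to 1, exactly as described above.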
Using the F-Distribution in R
Just like other distributions, we can work with the F-distribution using R functions:
pf(): Calculates probabilities (p-values)
qf(): Finds critical values
df(): Density function (rarely used in practice)
For ANOVA, we always use lower.tail = FALSE in the pf() function because we’re interested in the probability of observing an F-statistic at least as large as the one we calculated.
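For example, with illustrative numbers (an observed \(F_{TS} = 3.1\) with \(df_A = 2\) and \(df_E = 27\)):

```r
# Upper-tail p-value: P(F >= 3.1) under F(2, 27)
pf(3.1, df1 = 2, df2 = 27, lower.tail = FALSE)

# Critical value that leaves 5% in the upper tail of F(2, 27)
qf(0.05, df1 = 2, df2 = 27, lower.tail = FALSE)
```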
12.3.3. The Complete ANOVA Hypothesis Testing Procedure
Now we can formalize the complete four-step hypothesis testing procedure for one-way ANOVA.
Step 1: Define Parameters and Hypotheses
Parameters of interest: \(\mu_1, \mu_2, \ldots, \mu_k\) representing the population means for the \(k\) different groups.
Hypotheses:
\(H_0: \mu_1 = \mu_2 = \cdots = \mu_k\) (all population means are equal)
\(H_a:\) At least one \(\mu_i\) differs from the others
The alternative hypothesis can be written in several equivalent ways:
At least one \(\mu_i\) is different from the rest
Not all population means are equal
\(\mu_i \neq \mu_j\) for some \(i \neq j\)
Step 2: Check Assumptions
Before proceeding, verify that the ANOVA assumptions are reasonable:
Independence: Observations within and between groups are independent
Normality: Each population is normally distributed (or sample sizes are large enough for CLT)
Equal variances: All populations have the same variance \(\sigma^2\)
For the equal variance assumption, use the rule of thumb: the assumption is reasonable when the ratio of the largest to the smallest sample standard deviation is less than 2, that is, \(\frac{\max_i s_i}{\min_i s_i} < 2\).
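In R, this check is a one-liner; a small sketch, reusing the illustrative scores data frame from earlier in this section:

```r
# Rule-of-thumb check: ratio of largest to smallest group standard deviation
group_sds <- tapply(scores$y, scores$group, sd)
max(group_sds) / min(group_sds)   # assumption reasonable if this is < 2
```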
Step 3: Calculate Test Statistic and P-Value
Test statistic:

\[F_{TS} = \frac{\text{MSA}}{\text{MSE}}\]

Degrees of freedom:

- Numerator: \(df_A = k - 1\)
- Denominator: \(df_E = n - k\)
P-value:
```r
pf(F_TS, df1 = df_A, df2 = df_E, lower.tail = FALSE)
```
In practice, you’ll typically use R’s built-in ANOVA function:
```r
fit <- aov(response_variable ~ factor_variable, data = dataframe)
summary(fit)
```
Step 4: Make Decision and State Conclusion
Decision rule:
If p-value ≤ \(\alpha\), reject \(H_0\)
If p-value > \(\alpha\), fail to reject \(H_0\)
Conclusion template:
“The data [does/does not] give [weak/moderate/strong] support (p-value = [value]) to the claim that [statement of \(H_a\) in context].”
12.3.4. The ANOVA Table
The standard way to present ANOVA results is through an ANOVA table that summarizes all the variance decomposition components:
| Source | df | Sum of Squares | Mean Square | F-value | Pr(>F) |
|---|---|---|---|---|---|
| Factor A | \(k-1\) | \(\text{SSA} = \sum_{i=1}^k n_i(\bar{x}_{i \cdot} - \bar{x}_{\cdot \cdot})^2\) | \(\text{MSA} = \frac{\text{SSA}}{k-1}\) | \(\frac{\text{MSA}}{\text{MSE}}\) | p-value |
| Error | \(n-k\) | \(\text{SSE} = \sum_{i=1}^k \sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{i \cdot})^2\) | \(\text{MSE} = \frac{\text{SSE}}{n-k}\) | | |
| Total | \(n-1\) | \(\text{SST} = \sum_{i=1}^k \sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{\cdot \cdot})^2\) | \(\text{MST} = \frac{\text{SST}}{n-1}\) | | |
The total row is often omitted from R output but can be calculated since SST = SSA + SSE and the degrees of freedom also add up.
Estimating the common standard deviation: \(s = \sqrt{\text{MSE}}\)
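Given a fitted model, these pieces can be pulled out directly. A hedged sketch, assuming fit is an object returned by aov() as in Step 3:

```r
# Extract the ANOVA table and reconstruct the omitted Total row
tab      <- summary(fit)[[1]]       # data frame with Df, Sum Sq, Mean Sq, ...
SST      <- sum(tab$`Sum Sq`)       # SSA + SSE
df_total <- sum(tab$Df)             # (k - 1) + (n - k) = n - 1
s        <- sqrt(tab$`Mean Sq`[2])  # sqrt(MSE) estimates the common sigma
```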
12.3.5. Coffeehouse Example: Complete Analysis
Let’s work through our coffeehouse example using the complete ANOVA procedure to see how everything fits together.
Setting Up the Problem
Research question: Do the five coffeehouses around campus attract customers of different average ages?
Study design: A reporter surveys approximately 40 random customers at each coffeehouse, asking for their age.
Data summary (from previous analysis):
Total sample size: \(n = 200\)
Number of groups: \(k = 5\)
Group sample sizes: \(n_1 = 39, n_2 = 38, n_3 = 42, n_4 = 38, n_5 = 43\)
Sample means: \(\bar{x}_1 = 39.13, \bar{x}_2 = 46.66, \bar{x}_3 = 40.50, \bar{x}_4 = 26.42, \bar{x}_5 = 34.07\)
Sample standard deviations: \(s_1 = 7.90, s_2 = 12.97, s_3 = 10.94, s_4 = 6.99, s_5 = 9.92\)
Step 1: Parameters and Hypotheses
Parameters: \(\mu_{\text{Age}_1}, \mu_{\text{Age}_2}, \mu_{\text{Age}_3}, \mu_{\text{Age}_4}, \mu_{\text{Age}_5}\) representing the mean customer age at coffeehouses 1, 2, 3, 4, and 5 respectively.
Hypotheses:
\(H_0: \mu_{\text{Age}_1} = \mu_{\text{Age}_2} = \mu_{\text{Age}_3} = \mu_{\text{Age}_4} = \mu_{\text{Age}_5}\)
\(H_a: \mu_{\text{Age}_i} \neq \mu_{\text{Age}_j}\) for some \(i \neq j\)
Step 2: Check Assumptions
Equal variance check: the ratio of the largest to the smallest sample standard deviation is \(\frac{12.97}{6.99} \approx 1.86\).
Since 1.86 < 2, the equal variance assumption appears reasonable.
Normality: Visual inspection of histograms for each group shows approximate normality with no extreme deviations. With sample sizes around 40 in each group, the Central Limit Theorem helps ensure the sampling distribution of means is approximately normal.
Step 3: Calculate Test Statistic and P-Value
Degrees of freedom:

- \(df_A = k - 1 = 5 - 1 = 4\)
- \(df_E = n - k = 200 - 5 = 195\)
Using R’s ANOVA function on the full dataset:
```r
fit <- aov(Age ~ Coffeehouse, data = coffeehouse_df)
summary(fit)
```
ANOVA Table Results:
| Source | df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Coffeehouse | 4 | 8834 | 2208.4 | 22.14 | 4.4e-15 |
| Residuals | 195 | 19451 | 99.8 | | |
Test statistic: \(F_{TS} = 22.14\)
P-value: \(4.4 \times 10^{-15}\) (extremely small)
Step 4: Decision and Conclusion
Using \(\alpha = 0.01\):
Decision: Since p-value = \(4.4 \times 10^{-15} < 0.01 = \alpha\), we reject \(H_0\).
Conclusion: The data gives strong support (p-value = \(4.4 \times 10^{-15}\)) to the claim that at least one of the coffee shops around campus differs in the mean age of customers from the rest.
The F-statistic of 22.14 is much larger than 1, indicating that the between-group variability is more than 22 times the within-group variability—strong evidence that the coffeehouses attract customers of systematically different ages.
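As a sanity check, the reported p-value can be reproduced directly from the F-distribution:

```r
# Upper-tail probability of F(4, 195) beyond the observed statistic
pf(22.14, df1 = 4, df2 = 195, lower.tail = FALSE)   # approximately 4.4e-15
```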
12.3.6. The Connection Between F-Tests and t-Tests
An important theoretical connection exists between ANOVA and the two-sample t-test. When we have exactly two groups (\(k = 2\)), the one-way ANOVA F-test is mathematically equivalent to the two-sample t-test with equal variance assumption.
Relationship When k = 2
For two groups, our F-statistic becomes:

\[F_{TS} = \frac{\text{MSA}}{\text{MSE}} = \frac{n_1(\bar{x}_{1 \cdot} - \bar{x}_{\cdot \cdot})^2 + n_2(\bar{x}_{2 \cdot} - \bar{x}_{\cdot \cdot})^2}{\text{MSE}}\]

since \(df_A = k - 1 = 1\) makes MSA equal to SSA. Through algebraic manipulation (which involves expressing the overall mean \(\bar{X}_{\cdot \cdot}\) as a weighted average of the group means), this simplifies to:

\[F_{TS} = t_{TS}^2\]

where \(t_{TS}\) is the two-sample t-statistic with pooled variance:

\[t_{TS} = \frac{\bar{x}_{1 \cdot} - \bar{x}_{2 \cdot}}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \text{MSE}\]
This means F-statistic = (t-statistic)² when comparing two groups.
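A quick numerical demonstration of this identity (simulated data; all names are illustrative):

```r
# Demonstrating F_TS = t_TS^2 on two simulated groups
set.seed(2)
d <- data.frame(
  y = c(rnorm(20, mean = 10), rnorm(20, mean = 11)),
  g = rep(c("group1", "group2"), each = 20)
)

t_ts <- t.test(y ~ g, data = d, var.equal = TRUE)$statistic   # pooled-variance t
f_ts <- summary(aov(y ~ g, data = d))[[1]]$`F value`[1]

unname(t_ts)^2   # matches f_ts up to floating-point rounding
f_ts
```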
Comparing F-Tests and t-Tests
| Feature | Two-Sample t-Test | One-Way ANOVA |
|---|---|---|
| Variance Assumption | Can assume equal or unequal variances | Assumes equal variances |
| Alternative Hypothesis | Directional (>, <) or non-directional (≠) | Non-directional only |
| Distribution | t-distribution (symmetric) | F-distribution (right-skewed) |
| Null Value | Can test \(\mu_1 - \mu_2 = \Delta_0\) | Tests only \(\mu_1 = \mu_2\) (no null value) |
| Number of Groups | Exactly 2 groups | 2 or more groups |
When to Use Each Test
Use two-sample t-test when:

- You have exactly 2 groups
- You want to test directional alternatives (one-sided tests)
- You prefer not to assume equal variances (Welch’s t-test)
- You want to test a specific difference (\(\mu_1 - \mu_2 = \Delta_0\))
Use one-way ANOVA when:

- You have 3 or more groups
- You want to test for any differences among groups
- You’re willing to assume equal variances
- You want an overall test before looking at specific comparisons
The beauty of this connection is that it shows ANOVA as a natural generalization of the two-sample procedures we’ve already mastered.
12.3.7. What Happens After Rejecting H₀?
When ANOVA indicates that “at least one mean differs from the others,” this naturally leads to the question: “Which specific groups are different?” ANOVA doesn’t tell us which means differ—only that they’re not all equal.
This limitation leads us to multiple comparison procedures, which we’ll explore in the next section. These methods allow us to make specific pairwise comparisons while controlling the overall error rate.
For now, it’s important to understand that ANOVA serves as a “gatekeeper” test. If we fail to reject the null hypothesis in ANOVA, we typically stop there and conclude there’s no evidence for group differences. If we do reject the null hypothesis, then we proceed to investigate which specific groups differ.
12.3.8. Bringing It All Together
Key Takeaways 📝
The F-test statistic \(\frac{\text{MSA}}{\text{MSE}}\) compares between-group to within-group variability, with large values providing evidence against \(H_0\).
The F-distribution is right-skewed, always positive, with mean approximately 1 when \(H_0\) is true, and shape controlled by two degrees of freedom parameters.
The complete ANOVA procedure follows the standard four-step hypothesis testing framework, with R’s aov() function providing convenient computation.
The ANOVA table organizes all variance decomposition components and provides the F-statistic and p-value for decision making.
F-tests and t-tests are equivalent when comparing exactly two groups: F-statistic = (t-statistic)² under equal variance assumptions.
ANOVA serves as a gatekeeper test that determines whether any group differences exist before investigating specific pairwise comparisons.
Practical significance depends not just on statistical significance but also on the magnitude of differences relative to within-group variability.
Exercises
F-Distribution Properties: An ANOVA comparing 4 groups with 60 total observations yields an F-statistic of 2.8.
What are the degrees of freedom for this F-statistic?
Would you expect this F-value to be statistically significant at \(\alpha = 0.05\)? Why?
How would the shape of this F-distribution compare to F(1,56)?
ANOVA Table Completion: Complete the missing entries in this ANOVA table:
| Source | df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Treatment | 3 | 450 | ? | ? | 0.008 |
| Error | ? | ? | 25 | | |
| Total | 47 | ? | ? | | |
F-test vs t-test Connection: In a study comparing two teaching methods with 20 students each, the two-sample t-statistic (equal variance) is -2.4.
What would the F-statistic be for the equivalent one-way ANOVA?
What are the degrees of freedom for both tests?
How do the p-values compare between the two-sided t-test and the F-test?
Complete ANOVA Analysis: A researcher studies the effect of four different fertilizers on plant height, with the following summary data:
Fertilizer A: \(n_1 = 12, \bar{x}_1 = 18.5, s_1 = 3.2\)
Fertilizer B: \(n_2 = 15, \bar{x}_2 = 22.1, s_2 = 2.8\)
Fertilizer C: \(n_3 = 10, \bar{x}_3 = 19.8, s_3 = 3.6\)
Fertilizer D: \(n_4 = 13, \bar{x}_4 = 25.2, s_4 = 3.0\)
Check the equal variance assumption
Set up appropriate hypotheses
Calculate the overall sample mean
Would you expect to reject \(H_0\) based on the sample means? Explain your reasoning.
Interpretation Questions:
Explain why F-values are always non-negative but t-values can be negative
Why do we always use lower.tail = FALSE when calculating p-values for F-tests?
What does it mean when MSA is much larger than MSE?
How would the F-statistic change if all group means increased by the same amount?