12.1. Introduction to One-Way ANOVA
Many important research questions involve comparing two or more populations simultaneously. We now need a new approach that can handle the complexity of simultaneous comparisons while controlling the overall error rates.
Road Map 🧭
Identify the experimental conditions that call for the use of ANOVA.
Explain why a comparison of multiple means is called an Analysis of “Variance.”
Define the notations and formulate the hypotheses for ANOVA.
List the assumptions required for validity of ANOVA.
12.1.1. The Fundamental ANOVA Question
Many controlled experiments involve dividing subjects into two or more groups, applying a distinct treatment to each group, and then analyzing their quantitative responses to see if any systematic difference exists among the groups. See the examples below for concrete scenarios.
Examples 💡: Experiments That Require Multiple Comparisons
Example 1: Bacterial Growth Study
A research group studies bacteria growth rates in different sugar solutions.
Factor variable: Type of sugar solution
Levels: Glucose, Sucrose, Fructose, Lactose (4 groups)
Quantitative response variable: Bacterial growth rate
Example 2: Gasoline Brand Efficiency
Researchers want to determine if five different gasoline brands affect automobile fuel efficiency.
Factor variable: Gasoline brand
Levels: Brand A, Brand B, Brand C, Brand D, Brand E (5 groups)
Quantitative response variable: Miles per gallon
The Research Question and Our Strategy
In all the examples above, the researchers aim to determine whether any difference exists among the true means of the response groups.
A naive approach would be to perform two-sample comparisons for all possible pairs of levels. However, this approach has two critical drawbacks.
Let \(k\) be the number of levels. The total number of pairwise comparisons is \(k \choose 2\), which grows quickly with \(k\)—for example, \(10\) for \(k=5\) and \(45\) for \(k=10\).
When many inferences are performed simultaneously at significance level \(\alpha\), the probability of making at least one Type I error becomes substantially larger than \(\alpha\).
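A quick R sketch makes both drawbacks concrete; the error-rate lines treat the pairwise tests as independent, each run at \(\alpha = 0.05\).
k <- 5
choose(k, 2)                     # 10 pairwise comparisons for k = 5
choose(10, 2)                    # 45 pairwise comparisons for k = 10
1 - (1 - 0.05)^choose(5, 2)      # about 0.40: chance of at least one Type I error
1 - (1 - 0.05)^choose(10, 2)     # about 0.90 for k = 10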
To avoid unnecessary effort and potential errors, we would like to first perform a single screening hypothesis test that determines whether any difference exists among the population means. Only when we reject the null hypothesis that all means are equal do we proceed to pairwise comparisons, taking special care to control the overall Type I error rate.
The preliminary screening test is called the Analysis of Variance, or ANOVA.
Why Analysis of “Variance”?
At first glance, it seems strange that a procedure designed to compare means is called an analysis of “variance.” The reason is that the perceived difference among means depends on the relative sizes of the variation within and between groups. See Fig. 12.1 for a graphical illustration.
Fig. 12.1 Left: Distributions with large within-group variances; Right: Distributions with small within-group variances
Suppose we obtain samples from Case (a), the left panel of Fig. 12.1. Most likely, the data points will be spread over a wide range within each group. The within-group variability is wide enough to obscure the distinction created by the difference in central values.
On the other hand, samples from Case (b), the right panel, will bunch closely around their respective means, providing stronger evidence that the true means are indeed distinct.
Note that the set of true means is identical in both cases—their absolute locations played no significant role in our visual analysis. Instead, the key difference arose from the size of the within-group spread relative to the spread of the true means.
If the within-group variation is comparable to or larger than the between-group variation, we do not have enough evidence to reject the baseline belief that all means are equal.
If the within-group variation is smaller than the spread of the group means, there is strong evidence that at least one true mean is different from the rest.
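The following small R simulation sketches the same comparison; the true means, within-group standard deviations, and sample sizes are arbitrary illustrative choices (they are not taken from Fig. 12.1).
set.seed(1)
mu <- c(10, 15, 20)                          # the same true means in both cases
g  <- factor(rep(1:3, each = 25))            # group labels

y_wide  <- rnorm(75, mean = mu[g], sd = 8)   # large within-group spread (Case a)
y_tight <- rnorm(75, mean = mu[g], sd = 2)   # small within-group spread (Case b)

# Compare the within-group spreads with the spread of the group sample means
tapply(y_wide, g, sd);  sd(tapply(y_wide, g, mean))
tapply(y_tight, g, sd); sd(tapply(y_tight, g, mean))
In the first case the within-group standard deviations are comparable to or larger than the spread of the group means, so the samples overlap heavily; in the second case they are much smaller, so the group means stand out.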
Why “One-Way”?
We use the term one-way ANOVA to indicate an experimental design in which groups are formed based on different levels of a single factor.
When analyzing the joint impact of two factors on a response, we use a two-way ANOVA. While the foundational ideas are the same as in one-way ANOVA, an additional feature must be considered—the interaction effect between the two factors, which may amplify or offset their respective main effects. ANOVA models involving more than two factors also exist.
In this course, we focus exclusively on one-way ANOVA.
12.1.2. Formalizing ANOVA
Notation
Suppose we have \(k\) different groups, where \(k \geq 2\).
| ANOVA Notations for Group \(i\) | |
|---|---|
| Group Index | \(i \in \{1, 2, \ldots, k\}\) |
| Observation Index | \(j \in \{1, 2, \ldots, n_i\}\) |
| Population Mean and Variance | \(\mu_i \text{ and } \sigma^2_i\) |
| Group Sample Size | \(n_i\) |
| Group Sample | \(X_{i1}, X_{i2}, \cdots, X_{in_i}\) |
| Group Sample Mean | \(\bar{X}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}\) |
| Group Sample Variance | \(S_{i\cdot}^2 = \frac{1}{n_{i}-1}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_{i\cdot})^2\) |

| ANOVA Notation for Overall Summary | |
|---|---|
| Overall Sample Size | \(n = n_1 + n_2 + \cdots + n_k\) |
| Overall Sample Mean | \(\bar{X}_{\cdot \cdot} = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} X_{ij} = \frac{1}{n} \sum_{i=1}^{k} n_i \bar{X}_{i \cdot}\) |
Each random variable in a sample is now indexed with double subscripts.
The first subscript \(i\) specifies the group to which the data point belongs.
The second subscript \(j\) indicates the observation index within the group.
In addition, we use \(\cdot\) in the place of an index to indicate that a summary statistic is computed over all values of the corresponding index.
For example, the notation \(\bar{X}_{i\cdot}\) indicates that the statistic is computed using data points for all values of the second index, while keeping the group index fixed at \(i\).
Likewise, \(\bar{X}_{\cdot \cdot}\) means that data points for all indices are used to compute the summary.
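The double-subscript bookkeeping is easy to mirror in R. The toy numbers below are made up purely to illustrate the notation; tapply() computes one summary per group, and the weighted average of the group means reproduces the overall mean.
x     <- c(4.1, 3.8, 5.0, 6.2, 5.9, 7.4, 7.0, 6.8, 7.7)   # all observations x_ij
group <- factor(rep(1:3, times = c(3, 2, 4)))              # group index i for each x_ij

n_i    <- tapply(x, group, length)   # group sample sizes n_i
xbar_i <- tapply(x, group, mean)     # group sample means x-bar_{i.}
s2_i   <- tapply(x, group, var)      # group sample variances s^2_{i.}

n <- sum(n_i)                        # overall sample size
sum(n_i * xbar_i) / n                # overall mean x-bar_{..} as a weighted average
mean(x)                              # the same value, computed directly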
Hypothesis Formulation
There is only one pair of hypotheses for one-way ANOVA. The null hypothesis states that all true means are equal:
\(H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k.\)
The alternative hypothesis states that at least one population mean is different from the others. This can be expressed in several equivalent ways:
\(H_a:\) At least one \(\mu_i\) is different from the others.
\(H_a:\) Not all population means are equal.
\(H_a: \mu_i \neq \mu_j \text{ for some } i \neq j\)
Important Note About the Alternative
❌ It is incorrect to write the alternative hypothesis as \(H_a: \mu_1 \neq \mu_2 \neq \mu_3 \neq \cdots \neq \mu_k.\)
The alternative hypothesis does not state that all means are different from each other. It only requires that at least one mean differs from the others. This could mean:
Only \(\mu_1\) differs from \(\mu_2 = \mu_3 = \mu_4\)
Two groups differ: \(\mu_1 = \mu_2 \neq \mu_3 = \mu_4\)
All groups differ: \(\mu_1 \neq \mu_2 \neq \mu_3 \neq \mu_4\)
Example 💡: Coffeehouse Demographics Study ☕️
A student reporter wants to study the demographics of coffeehouses around campus. Specifically, she’s interested in whether different coffeehouses attract customers of different ages. The reporter randomly selects 50 customers at each of five coffeehouses using a systematic sampling approach. Due to non-response, the final sample sizes vary slightly across coffeehouses but remain close to 50 per location.
Component Identification
Factor variable: Coffeehouse location (5 levels)
Response variable: Customer age (quantitative, measured in years)
Research question: Are there statistically significant differences between the average ages of customers at the different coffeehouses?
ANOVA Setup
Let \(\mu_i\) represent the true mean age of customers at Coffeehouse \(i\), for each \(i = 1, 2, \ldots, 5\). The hypotheses are \(H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5\) versus \(H_a:\) at least one \(\mu_i\) is different from the others.
Assumptions
Like all statistical procedures, ANOVA relies on certain assumptions for validity. These assumptions extend the familiar requirements from two-sample procedures to the multi-group setting.
Assumption 1: Independent Simple Random Samples
The observations in each of the \(k\) groups must form a simple random sample. That is, within each group \(i\), the observations \(X_{i1}, X_{i2}, \ldots, X_{in_i}\) must be independent and identically distributed.
Assumption 2: Independence Between Groups
Samples from different populations must be independent of each other.
Assumption 3: Normality of the Sample Means
Each population must either be normally distributed or have a large enough sample size for the CLT to hold, so that the sample mean is approximately normally distributed.
Assumption 4: Equal Variances
All populations must have equal variances: \(\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2\). We denote the common value by \(\sigma^2\).
This assumption allows us to pool information across groups when estimating the common variance, leading to more efficient procedures.
We discuss how to verify Assumption 4 from observed data in the following section. When the equal variance assumption fails, alternative approaches (beyond the scope of this course) must be used.
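As a preview of how the pooling in Assumption 4 is used, the common variance \(\sigma^2\) can be estimated by combining the group sample variances with weights equal to their degrees of freedom, \(s_p^2 = \frac{\sum_{i=1}^{k}(n_i-1)s_i^2}{n-k}\). The sketch below uses made-up group sizes and standard deviations purely for illustration.
n_i <- c(8, 10, 7)                              # illustrative group sample sizes
s_i <- c(4.2, 3.8, 5.1)                         # illustrative group standard deviations

k <- length(n_i)
n <- sum(n_i)
s2_pooled <- sum((n_i - 1) * s_i^2) / (n - k)   # pooled estimate of the common variance
s2_pooled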
12.1.3. Preliminary Visual Analysis
Before conducting formal ANOVA procedures, it is standard practice to gain insights about the populations and the samples through graphical analysis. We pay special attention to two aspects:
Signs of significant differences among population means
Any violation of the assumptions
Signs of Differences Among True Means
Note that the sample means alone do not give us any sense of whether the true means are different—the sample means will almost always differ from one another due to the randomness in the population distributions. Instead, we pay attention to the spread of the samples through their side-by-side boxplots. Fig. 12.2 shows the side-by-side boxplots for the coffeehouse example.
Fig. 12.2 Side-by-side boxplots of the five samples in the coffeehouse example
We assess the strength of visual evidence by how much the boxplots overlap in their spans. If all the boxplots span similar regions, there is little visual evidence of distinct means. If at least one boxplot overlaps only partially (or not at all) with the rest, there is a higher chance of eventually rejecting the null hypothesis in ANOVA.
Note that no formal conclusion should be drawn from visual evidence alone. Boxplots serve only as a tool to gain insight into the dataset.
Identifying Violation of Assumptions
This stage involves examining all available visual resources, including the boxplots, the histograms, and the normal probability plot.
(a) Boxplots
To confirm the equal variance assumption visually, ensure that the range and IQR of the samples are similar on the side-by-side boxplots. The assumption must also be checked numerically. As a rule of thumb, we say that the sign of violation is not strong if the ratio of the largest to the smallest sample standard deviation is at most 2, that is, \(\max_i s_i / \min_i s_i \leq 2\) (see the short R sketch below).
Boxplots are also used to check if any potential outliers exist.
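A minimal sketch of the numerical check, assuming a data frame df with a quantitative column response and a grouping factor group (hypothetical names):
s <- tapply(df$response, df$group, sd)   # group-wise sample standard deviations
max(s) / min(s)                          # rule of thumb: ratio should be at most 2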
(b) Histograms and Normal Probability Plot
To check whether the samples show any signs of non-normality, use group-wise histograms.
Fig. 12.3 Group-wise histograms for the coffeehouse example
A combined normal probability plot of \(x_{ij}-\bar{x}_{i\cdot}\) (data points re-centered to have zero mean; also called the residuals) can be used alongside the histograms; a short sketch of this computation follows Fig. 12.4.
Fig. 12.4 Normal probability plot for the coffeehouse example
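A sketch of how the residuals and the combined normal probability plot can be produced, again assuming a hypothetical data frame df with columns response and group:
library(ggplot2)

# Re-center each observation by its group sample mean: x_ij - x-bar_{i.}
df$residual <- df$response - ave(df$response, df$group, FUN = mean)

ggplot(df, aes(sample = residual)) +
  stat_qq() +                      # sample quantiles vs. theoretical normal quantiles
  stat_qq_line(color = "red") +    # reference line for perfect normality
  ggtitle("Normal Probability Plot of Residuals") +
  theme_minimal()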
Example 💡: Coffeehouse Demographics Study ☕️
Use Fig. 12.2, Fig. 12.3, Fig. 12.4, and the numerical summary below to perform a preliminary assessment of the coffeehouse dataset.
📊 Download the coffeehouse dataset (CSV)
| Sample (Levels of Factor Variable) | Sample Size | Mean | Variance |
|---|---|---|---|
| Population 1 | \(n_1 = 39\) | \(\bar{x}_{1.} = 39.13\) | \(s_1^2 = 62.43\) |
| Population 2 | \(n_2 = 38\) | \(\bar{x}_{2.} = 46.66\) | \(s_2^2 = 168.34\) |
| Population 3 | \(n_3 = 42\) | \(\bar{x}_{3.} = 40.50\) | \(s_3^2 = 119.62\) |
| Population 4 | \(n_4 = 38\) | \(\bar{x}_{4.} = 26.42\) | \(s_4^2 = 48.90\) |
| Population 5 | \(n_5 = 43\) | \(\bar{x}_{5.} = 34.07\) | \(s_5^2 = 98.50\) |
| Combined | \(n = 200\) | \(\bar{x}_{..} = 37.35\) | \(s^2 = 142.14\) |
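The summary table above can be reproduced from the downloaded CSV along the following lines; the column names Age and Coffeehouse are assumptions, so adjust them to match the actual file.
coffee <- read.csv("coffeehouse.csv")   # assumed file name after download

n    <- tapply(coffee$Age, coffee$Coffeehouse, length)
xbar <- tapply(coffee$Age, coffee$Coffeehouse, mean)
s2   <- tapply(coffee$Age, coffee$Coffeehouse, var)
data.frame(n = n, Mean = round(xbar, 2), Variance = round(s2, 2))

# Combined row
c(n = length(coffee$Age), Mean = mean(coffee$Age), Variance = var(coffee$Age))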
1. Do the graphs show signs of distinct means?
From the boxplots, we observe that the customer age sample from Coffeehouse 4 spans a notably lower region than the others. Its difference from Coffeehouse 2 is most evident—the mean age at Coffeehouse 2 appears higher than even the maximum observed at Coffeehouse 4. Other coffeehouses show larger overlaps.
2. Any violation of assumptions?
Equal variance
From the boxplots, the variabilities of Sample 2 and Sample 4 appear to be different. We need to make sure that the largest ratio between two sample standard deviations is less than or equal to 2:
\[\frac{\max_i s_i}{\min_i s_i} = \frac{\sqrt{168.34}}{\sqrt{48.90}} = 1.855 < 2 \checkmark\]
So we consider the sample variances to be similar enough to support the equal population variance assumption.
Normality
The histogram of Coffeehouse 4 and the overall QQ plot show signs of skewness. Since the sample sizes are reasonably large, this moderate departure from normality is not a serious concern.
3. Summary
Since the assumptions are reasonably met and the boxplots show signs of difference in means, we predict that ANOVA will yield a statistically significant result. We will perform the formal test in the upcoming sections and find out whether our prediction is correct.
Key Takeaways 📝
ANOVA is a statistical analysis that simultaneously compares two or more population means.
Double subscript notation \(X_{ij}\) systematically handles multiple groups, with dot notation indicating which indices are averaged over.
Four key assumptions must be satisfied for validity of ANOVA: independent simple random samples, independence between groups, normality, and equal variances.
Perform preliminary graphical and numerical assessments to check the assumptions and spot signs of statistical significance.
12.1.4. Exercises
Notation and Rounding Policy
Notation: In ANOVA, we use \(x_{i,j}\) where \(i\) indexes the group (1 to k) and \(j\) indexes the observation within that group. Always use a comma between subscripts to avoid ambiguity.
Rounding: Carry at least 3–4 decimal places in intermediate calculations; round final answers to 2 decimal places (or 3 significant figures for very small values like p-values).
Exercise 1: Identifying ANOVA Scenarios
For each scenario, determine whether one-way ANOVA is the appropriate analysis method. If not, suggest an alternative.
A software company compares bug detection rates of 4 different code review tools, testing each tool on 15 independent projects.
A researcher measures blood pressure before and after a meditation intervention for 30 participants.
An engineer compares the tensile strength of steel samples from 5 different suppliers, with 20 samples from each supplier.
A data scientist wants to determine if there’s a relationship between hours of sleep and exam scores.
A quality control team compares the precision of 3 measurement instruments by having each instrument measure the same 25 reference objects.
A marketing team tests whether click-through rates differ among 6 different ad designs, randomly showing each design to 500 users.
Solution
Part (a): One-way ANOVA is appropriate ✓
Factor: Code review tool (4 levels)
Response: Bug detection rate (quantitative)
Independent samples from each tool
Compares k = 4 population means
Part (b): One-way ANOVA is NOT appropriate
This is a paired design (before/after on same individuals). Use a paired t-test instead. The measurements are not independent—each participant provides both measurements.
Part (c): One-way ANOVA is appropriate ✓
Factor: Supplier (5 levels)
Response: Tensile strength (quantitative)
Independent samples from each supplier
Compares k = 5 population means
Part (d): One-way ANOVA is NOT appropriate
This describes a relationship between two quantitative variables. Use simple linear regression or correlation analysis instead. ANOVA requires a categorical factor variable.
Part (e): One-way ANOVA is NOT appropriate
The same objects are measured by all instruments, creating a repeated measures or blocked design. The measurements are not independent across instruments. This would require repeated measures ANOVA or treating objects as blocks (beyond STAT 350).
Part (f): One-way ANOVA is appropriate ✓
Factor: Ad design (6 levels)
Response: Click-through rate (quantitative)
Independent random samples for each design
Compares k = 6 population means
Exercise 2: Notation Practice
A materials engineer tests the hardness of ceramic samples produced using three different sintering temperatures. The following data are collected:
| Temperature | Sample Size | Mean Hardness | Std Dev |
|---|---|---|---|
| 1200°C (Group 1) | 8 | 72.5 | 4.2 |
| 1350°C (Group 2) | 10 | 78.3 | 3.8 |
| 1500°C (Group 3) | 7 | 85.1 | 5.1 |
Notation convention: In ANOVA, we use double subscripts \(x_{i,j}\) where \(i\) indexes the group (1 to k) and \(j\) indexes the observation within that group (1 to \(n_i\)).
Using proper ANOVA notation, identify: \(k\), \(n_1, n_2, n_3\), \(n\), \(\bar{x}_{1.}, \bar{x}_{2.}, \bar{x}_{3.}\), \(s_1, s_2, s_3\).
Compute the overall (grand) mean \(\bar{x}_{..}\) using the formula:
\[\bar{x}_{..} = \frac{1}{n}\sum_{i=1}^{k} n_i \bar{x}_{i.}\]
What does the notation \(x_{2,5}\) represent in this context?
Write an expression for “the sum of all observations in Group 1” in two equivalent forms.
Solution
Part (a): Notation identification
\(k = 3\) (number of groups/temperature levels)
\(n_1 = 8, n_2 = 10, n_3 = 7\) (sample sizes per group)
\(n = n_1 + n_2 + n_3 = 8 + 10 + 7 = 25\) (total sample size)
\(\bar{x}_{1.} = 72.5, \bar{x}_{2.} = 78.3, \bar{x}_{3.} = 85.1\) (group means)
\(s_1 = 4.2, s_2 = 3.8, s_3 = 5.1\) (group standard deviations)
Part (b): Overall mean
\[\bar{x}_{..} = \frac{1}{n}\sum_{i=1}^{k} n_i \bar{x}_{i.} = \frac{8(72.5) + 10(78.3) + 7(85.1)}{25} = \frac{1958.7}{25} \approx 78.35\]
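A quick R check of this weighted-average calculation:
n_i    <- c(8, 10, 7)
xbar_i <- c(72.5, 78.3, 85.1)
sum(n_i * xbar_i) / sum(n_i)   # 78.348, which rounds to 78.35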
Part (c): Interpretation of \(x_{2,5}\)
The notation \(x_{2,5}\) represents the 5th observation in Group 2 (the 1350°C temperature group).
The first subscript (2) identifies the group (\(i = 2\))
The second subscript (5) identifies the observation within that group (\(j = 5\))
Important: Always use a comma between subscripts (e.g., \(x_{2,5}\) not \(x_{25}\)) to avoid ambiguity when group or observation indices exceed 9.
Part (d): Sum of observations in Group 1
Two equivalent forms: \(\sum_{j=1}^{n_1} x_{1,j}\) and \(n_1 \bar{x}_{1.}\).
The second form, \(n_1 \bar{x}_{1.}\), is a useful shortcut that follows directly from the definition of the sample mean: \(\bar{x}_{1.} = \frac{1}{n_1}\sum_{j=1}^{n_1} x_{1,j}\), which rearranges to \(\sum_{j=1}^{n_1} x_{1,j} = n_1 \bar{x}_{1.}\).
Exercise 3: Hypothesis Formulation
For each research scenario, write the null and alternative hypotheses using proper notation. Define the parameters clearly.
A network engineer tests whether mean packet latency differs among 4 different router configurations.
A pharmaceutical company compares the mean efficacy scores of 5 different drug dosages.
An agricultural researcher investigates whether mean crop yield differs among 3 fertilizer types.
Solution
Part (a): Router configurations
Parameter definitions:
\(\mu_1\) = true mean packet latency (ms) for Configuration 1
\(\mu_2\) = true mean packet latency (ms) for Configuration 2
\(\mu_3\) = true mean packet latency (ms) for Configuration 3
\(\mu_4\) = true mean packet latency (ms) for Configuration 4
Hypotheses: \(H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4\) versus \(H_a:\) at least one \(\mu_i\) is different from the others.
Part (b): Drug dosages
Parameter definitions:
\(\mu_i\) = true mean efficacy score for dosage level \(i\), for \(i = 1, 2, 3, 4, 5\)
Hypotheses: \(H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5\) versus \(H_a:\) at least one \(\mu_i\) is different from the others.
Part (c): Fertilizer types
Parameter definitions:
\(\mu_A\) = true mean crop yield (bushels/acre) with Fertilizer A
\(\mu_B\) = true mean crop yield (bushels/acre) with Fertilizer B
\(\mu_C\) = true mean crop yield (bushels/acre) with Fertilizer C
Hypotheses: \(H_0: \mu_A = \mu_B = \mu_C\) versus \(H_a:\) at least one fertilizer's true mean yield differs from the others.
Note: The ANOVA alternative hypothesis is a composite alternative: it states that at least one mean differs from the others, but does not specify which means differ or in what direction. This is different from a simple “two-sided” test—we are testing one null against many possible alternatives.
Exercise 4: Checking the Equal Variance Assumption
The equal variance assumption is critical for valid ANOVA results. Use the rule of thumb: the assumption is reasonable if the ratio of the largest to smallest sample standard deviation is at most 2.
Fig. 12.5 Figure: Side-by-side boxplots illustrating equal variances (left), borderline case (center), and unequal variances (right).
For each scenario, determine whether the equal variance assumption is satisfied.
Four treatment groups with sample standard deviations: \(s_1 = 12.4\), \(s_2 = 15.8\), \(s_3 = 11.2\), \(s_4 = 18.6\)
Three groups with sample variances: \(s_1^2 = 25\), \(s_2^2 = 81\), \(s_3^2 = 36\)
Five groups with the following summary:
| Group | n | Mean | Variance |
|---|---|---|---|
| 1 | 15 | 45.2 | 28.5 |
| 2 | 18 | 52.1 | 31.2 |
| 3 | 12 | 48.7 | 42.8 |
| 4 | 20 | 39.8 | 25.1 |
| 5 | 16 | 55.3 | 38.4 |
Solution
Part (a): Four treatment groups
Standard deviations: 12.4, 15.8, 11.2, 18.6
The ratio is \(\max(s_i)/\min(s_i) = 18.6/11.2 \approx 1.66\). Since 1.66 ≤ 2, the equal variance assumption is satisfied ✓
Part (b): Three groups (given variances)
First convert variances to standard deviations:
\(s_1 = \sqrt{25} = 5\)
\(s_2 = \sqrt{81} = 9\)
\(s_3 = \sqrt{36} = 6\)
The ratio is \(\max(s_i)/\min(s_i) = 9/5 = 1.8\). Since 1.8 ≤ 2, the equal variance assumption is satisfied ✓
Part (c): Five groups
Convert variances to standard deviations:
\(s_1 = \sqrt{28.5} = 5.34\)
\(s_2 = \sqrt{31.2} = 5.59\)
\(s_3 = \sqrt{42.8} = 6.54\)
\(s_4 = \sqrt{25.1} = 5.01\)
\(s_5 = \sqrt{38.4} = 6.20\)
The ratio is \(\max(s_i)/\min(s_i) = 6.54/5.01 \approx 1.31\). Since 1.31 ≤ 2, the equal variance assumption is satisfied ✓
R verification:
# Part (a)
s_a <- c(12.4, 15.8, 11.2, 18.6)
max(s_a) / min(s_a) # 1.661
# Part (b)
s2_b <- c(25, 81, 36)
s_b <- sqrt(s2_b)
max(s_b) / min(s_b) # 1.8
# Part (c)
s2_c <- c(28.5, 31.2, 42.8, 25.1, 38.4)
s_c <- sqrt(s2_c)
max(s_c) / min(s_c) # 1.306
Warning
Do not apply the “≤ 2” rule to variances! The rule of thumb uses the ratio of standard deviations, not variances. If you’re given variances, you must first take square roots to get standard deviations before computing the ratio. For example, in Part (b), the variance ratio is 81/25 = 3.24, but the SD ratio is 9/5 = 1.8.
Exercise 5: Complete Assumption Checking with Plots
Before performing ANOVA inference, we must verify three key assumptions:
Independence (usually justified by study design)
Normality within each group (histograms, QQ-plots)
Equal variances across groups (boxplots, SD ratio)
A quality engineer collects tensile strength measurements (in MPa) from steel samples produced by three different manufacturing processes. The data is stored in a data frame called steel with variables Strength and Process (levels: A, B, C).
Write R code to create side-by-side boxplots with mean points to visually compare the three processes.
Write R code to create an effects plot showing the group means connected by lines.
Write R code to compute the sample size, mean, and standard deviation for each process using tapply().
Write R code to create faceted histograms by process with kernel density curves (red) and normal density overlays (blue).
Write R code to create faceted QQ-plots by process to assess normality within each group.
Check the equal variance assumption using the SD ratio rule of thumb.
Based on typical output from these plots, describe what you would look for to verify each assumption.
Example Output (for reference):
Fig. 12.6 Effects plot showing group means connected by lines
Fig. 12.7 Faceted histograms with kernel density (red) and normal overlay (blue)
Fig. 12.8 Faceted QQ-plots for assessing normality within each group
Solution
Part (a): Side-by-side boxplots with mean points
library(ggplot2)
# Side-by-side boxplots
ggplot(steel, aes(x = Process, y = Strength)) +
stat_boxplot(geom = "errorbar", width = 0.3) +
geom_boxplot(fill = "lightblue") +
stat_summary(fun = mean, geom = "point",
color = "black", size = 3) +
ggtitle("Tensile Strength by Manufacturing Process") +
xlab("Process") +
ylab("Tensile Strength (MPa)") +
theme_minimal()
Part (b): Effects plot
# Effects plot - shows group means connected by lines
ggplot(steel, aes(x = Process, y = Strength)) +
stat_summary(fun = mean, geom = "point", size = 3) +
stat_summary(fun = mean, geom = "line", aes(group = 1)) +
ggtitle("Effects Plot of Tensile Strength by Process") +
xlab("Process") +
ylab("Mean Tensile Strength (MPa)") +
theme_minimal()
Part (c): Summary statistics using tapply()
# Sample sizes
n <- tapply(steel$Strength, steel$Process, length)
# Sample means
xbar <- tapply(steel$Strength, steel$Process, mean)
# Sample standard deviations
s <- tapply(steel$Strength, steel$Process, sd)
# Create summary table
summary_table <- data.frame(
Process = names(n),
n = as.numeric(n),
Mean = round(xbar, 2),
SD = round(s, 2)
)
print(summary_table)
Part (d): Faceted histograms with density overlays
# Calculate group statistics for normal density overlay
xbar <- tapply(steel$Strength, steel$Process, mean)
s <- tapply(steel$Strength, steel$Process, sd)
# Add normal density column to data frame
steel$normal.density <- mapply(function(value, group) {
dnorm(value, mean = xbar[group], sd = s[group])
}, steel$Strength, steel$Process)
# Determine number of bins
n_bins <- max(round(sqrt(nrow(steel))) + 2, 5)
# Faceted histograms
ggplot(steel, aes(x = Strength)) +
geom_histogram(aes(y = after_stat(density)),
bins = n_bins, fill = "grey", col = "black") +
geom_density(col = "red", linewidth = 1) +
geom_line(aes(y = normal.density), col = "blue", linewidth = 1) +
facet_wrap(~ Process) +
ggtitle("Histograms of Tensile Strength by Process") +
xlab("Tensile Strength (MPa)") +
ylab("Density") +
theme_minimal()
Part (e): Faceted QQ-plots
# Add intercept and slope for QQ line (group-specific)
steel$intercept <- xbar[steel$Process]
steel$slope <- s[steel$Process]
# Faceted QQ-plots
ggplot(steel, aes(sample = Strength)) +
stat_qq() +
geom_abline(aes(intercept = intercept, slope = slope),
color = "red", linewidth = 1) +
facet_wrap(~ Process) +
ggtitle("QQ Plots of Tensile Strength by Process") +
xlab("Theoretical Quantiles") +
ylab("Sample Quantiles") +
theme_minimal()
Part (f): Equal variance check
# SD ratio rule of thumb
sd_ratio <- max(s) / min(s)
cat("SD ratio:", round(sd_ratio, 4), "\n")
cat("Equal variance assumption satisfied?",
ifelse(sd_ratio <= 2, "Yes", "No"), "\n")
Part (g): What to look for in each plot
Side-by-side boxplots:
Equal variances: Boxes should have similar heights (IQRs) and whisker lengths
Separation of means: Look for overlap/separation between groups to predict ANOVA outcome
Outliers: Identify potential outliers (points beyond whiskers)
Effects plot:
Shows the pattern of group means clearly
Useful for predicting whether ANOVA will be significant
Large differences in height suggest group means differ
Faceted histograms:
Normality: Each histogram should be approximately bell-shaped
Kernel density (red) should roughly follow the normal density (blue)
Look for severe skewness or multimodality within groups
Faceted QQ-plots:
Normality: Points should fall approximately along the diagonal reference line
Deviations at tails: Suggest heavy or light tails
S-shaped pattern: Suggests skewness
Systematic curvature: Suggests non-normality
SD ratio:
If max(s)/min(s) ≤ 2, the equal variance assumption is reasonable
If ratio > 2, consider alternative methods (Welch’s ANOVA, beyond STAT 350)
Exercise 6: Interpreting Side-by-Side Boxplots
Fig. 12.9 Figure 1: Side-by-side boxplots from three different studies comparing k = 4 groups.
For each scenario (A, B, C) shown in Figure 1:
Would you expect to reject \(H_0\) in the ANOVA F-test? Explain your reasoning based on the overlap of boxplots.
Are there any concerns about the equal variance assumption?
Are there any potential outliers that might affect the analysis?
Solution
Scenario A:
Likely to reject H₀. The boxplots show minimal overlap—Group 4 is clearly separated from Groups 1-3, and Group 1 appears lower than Groups 2-3. The between-group variability appears large relative to within-group variability.
Equal variance appears reasonable. All four boxplots have similar heights (IQR) and whisker lengths, suggesting comparable spread across groups.
One potential outlier visible in Group 2 (marked with a point above the upper whisker). Should investigate this observation.
Scenario B:
Unlikely to reject H₀. The boxplots show substantial overlap—all four groups span similar ranges with medians close together. The between-group variability appears small relative to within-group variability.
Equal variance appears reasonable. The spreads are similar across all groups.
No obvious outliers visible.
Scenario C:
May or may not reject H₀. Some separation exists (Groups 1 and 4 appear different), but there’s also substantial overlap among some groups. The result could go either way.
Potential concern about equal variance. Group 3 appears to have much larger spread (taller box, longer whiskers) than the other groups. Should check the ratio of standard deviations.
Potential outlier in Group 1 (below lower whisker).
Key insight: Visual assessment from boxplots helps predict ANOVA results:
Large separation between boxes with small within-group spread → likely significant
Substantial overlap with similar spreads → likely not significant
Always check assumptions before drawing formal conclusions
Exercise 7: Why “Analysis of Variance”?
A common source of confusion is why a test for comparing means is called Analysis of Variance.
Explain in your own words why comparing variabilities (between-group vs. within-group) helps us determine whether population means differ.
Consider two scenarios with the same group sample means: \(\bar{x}_1 = 10\), \(\bar{x}_2 = 15\), \(\bar{x}_3 = 20\). In Scenario I, each group has \(s = 2\). In Scenario II, each group has \(s = 8\). Which scenario provides stronger evidence that the population means differ? Explain.
If \(H_0: \mu_1 = \mu_2 = ... = \mu_k\) is true, what would you expect the ratio MSA/MSE to be approximately equal to? Why?
Solution
Part (a): Conceptual explanation
When comparing group means, the observed differences \(\bar{x}_i - \bar{x}_j\) could arise from:
True differences in population means (what we want to detect)
Random sampling variability (noise)
To determine if observed differences are “real,” we must compare them to what we’d expect from random variation alone. This is exactly what ANOVA does:
Between-group variability (MSA) captures how much the group means vary from the overall mean
Within-group variability (MSE) captures the baseline “noise” level within each group
If the between-group variability is much larger than expected from noise alone, we conclude the means truly differ.
Part (b): Comparing scenarios
Scenario I (s = 2) provides much stronger evidence that population means differ.
In Scenario I, the spread within each group is small (s = 2), so the observed difference of 5 units between consecutive means is large relative to the noise.
In Scenario II, the spread within each group is large (s = 8), so the same 5-unit differences could easily be due to random sampling variability.
Think of it as a signal-to-noise ratio: same “signal” (mean differences) but different “noise” levels leads to different conclusions.
Part (c): Expected F-ratio under H₀
Under \(H_0\), we expect \(F = \frac{MSA}{MSE} \approx 1\).
This is because:
When all population means are equal, MSA estimates σ² (with some sampling variability)
MSE always estimates σ² (regardless of whether H₀ is true)
The ratio of two quantities both estimating σ² should be approximately 1
Large values of F (substantially greater than 1) suggest MSA is estimating something larger than σ², which happens when population means differ.
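A small simulation sketch illustrates this behavior: when data are generated with all group means equal, the F statistics from many simulated experiments cluster around 1. The number of groups, group size, common mean, and \(\sigma\) below are arbitrary choices; oneway.test() with var.equal = TRUE carries out the classic one-way ANOVA F test.
set.seed(350)
k <- 4; n_per <- 15; sigma <- 3
g <- factor(rep(1:k, each = n_per))             # group labels

f_stats <- replicate(2000, {
  y <- rnorm(k * n_per, mean = 10, sd = sigma)  # H0 true: same mean in every group
  oneway.test(y ~ g, var.equal = TRUE)$statistic
})

mean(f_stats)   # typically close to 1 when H0 holds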
Exercise 8: The Multiple Testing Problem
A researcher wants to compare mean response times across k = 5 different algorithm implementations.
Fig. 12.10 Figure: Family-wise error rate (FWER) increases rapidly with the number of groups when using multiple t-tests at α = 0.05.
How many pairwise comparisons would be needed to compare all pairs of algorithms?
If the researcher performs each pairwise comparison as a separate two-sample t-test at α = 0.05, what is the probability of making at least one Type I error (assuming all null hypotheses are true and tests are independent)?
Why does ANOVA provide a better approach than multiple t-tests?
For k = 10 groups, how many pairwise comparisons are needed? What would the family-wise error rate be if using individual α = 0.05 tests?
Solution
Part (a): Number of comparisons for k = 5
\[\binom{5}{2} = \frac{5 \times 4}{2} = 10 \text{ pairwise comparisons}\]
Part (b): Family-wise error rate
Assuming independence, the probability of at least one Type I error is:
\[1 - (1 - 0.05)^{10} = 1 - 0.95^{10} \approx 0.40\]
Even though each individual test has only a 5% false positive rate, the overall probability of making at least one false positive is about 40%!
Part (c): Advantages of ANOVA
Controls Type I error: ANOVA tests all means simultaneously with a single test at the specified α level
Efficiency: One test instead of many
Proper framework: If ANOVA is significant, we can then use multiple comparison procedures (Tukey, Bonferroni) that properly control the family-wise error rate
Uses pooled variance: More efficient estimation of σ² by combining information from all groups
Part (d): For k = 10 groups
Number of comparisons: \(\binom{10}{2} = \frac{10 \times 9}{2} = 45\)
Family-wise error rate: \(1 - (1 - 0.05)^{45} = 1 - 0.95^{45} \approx 0.90\)
With 45 separate tests, there’s about a 90% chance of at least one false positive—making the approach essentially meaningless for controlling errors.
R verification:
# Part (a) and (b)
k <- 5
n_pairs <- choose(k, 2)                   # 10
alpha_overall <- 1 - (1 - 0.05)^n_pairs   # 0.4013
# Part (d)
k <- 10
n_pairs <- choose(k, 2)                   # 45
alpha_overall <- 1 - (1 - 0.05)^n_pairs   # 0.9006
Exercise 9: True/False Conceptual Questions
Determine whether each statement is True or False. Provide a brief justification.
In ANOVA, if we reject \(H_0\), we can conclude that all population means are different from each other.
The ANOVA F-test requires that all group sample sizes be equal.
If the boxplots for all groups show similar medians but very different spreads, the equal variance assumption may be violated.
The alternative hypothesis in one-way ANOVA can be directional (e.g., \(\mu_1 < \mu_2 < \mu_3\)).
If sample sizes are large (n > 40 per group), ANOVA is robust to moderate departures from normality.
ANOVA with k = 2 groups is equivalent to a two-sample t-test under certain conditions.
Solution
1. False
Rejecting \(H_0\) only tells us that at least one population mean differs from the others. It could be that only one mean is different, or some are different, or all are different. Multiple comparison procedures are needed to identify which specific means differ.
2. False
ANOVA does not require equal sample sizes (balanced design). The formulas accommodate unequal \(n_i\) values. However, equal sample sizes provide some advantages: simpler calculations, more robust to assumption violations, and optimal power.
3. True
The equal variance assumption requires \(\sigma_1^2 = \sigma_2^2 = ... = \sigma_k^2\). Different spreads in boxplots suggest this assumption may be violated. We check formally using the ratio rule: \(\max(s_i)/\min(s_i) \leq 2\).
4. False
The ANOVA F-test is inherently non-directional (two-sided). The alternative hypothesis is always “at least one mean differs”—we cannot specify the direction or pattern of differences in the standard ANOVA framework.
5. True
Like the t-test, ANOVA is robust to moderate violations of the normality assumption when sample sizes are large, due to the Central Limit Theorem. The F-test statistic’s distribution is less affected by non-normality when n is large.
6. True
When k = 2, ANOVA is equivalent to a two-sided pooled two-sample t-test with \(\Delta_0 = 0\). Specifically, \(F_{TS} = t_{TS}^2\) and the p-values are identical. This equivalence holds only for the equal variance (pooled) case and two-sided alternative.
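A brief simulation sketch of this equivalence (the data-generating settings are arbitrary): run a pooled two-sample t-test and the one-way ANOVA F test, via oneway.test() with var.equal = TRUE, on the same two-group data and compare.
set.seed(42)
y <- c(rnorm(12, mean = 5, sd = 2), rnorm(15, mean = 6, sd = 2))
g <- factor(rep(c("A", "B"), times = c(12, 15)))

t_out <- t.test(y ~ g, var.equal = TRUE)     # pooled two-sample t-test
f_out <- oneway.test(y ~ g, var.equal = TRUE)

unname(t_out$statistic)^2                    # t^2 ...
unname(f_out$statistic)                      # ... equals the ANOVA F statistic
c(t_out$p.value, f_out$p.value)              # identical p-values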
Exercise 10: Designing an ANOVA Study
A biomedical engineering team wants to compare the effectiveness of 4 different stent designs in maintaining arterial blood flow. They plan to conduct an experiment using a laboratory flow simulation system.
Identify the factor variable and its levels.
Identify an appropriate quantitative response variable.
Write the null and alternative hypotheses.
What sample size considerations should the team address?
How could the team ensure the independence assumption is satisfied?
What potential issues might arise with the equal variance assumption, and how could they check it?
Solution
Part (a): Factor variable and levels
Factor: Stent design
Levels: Design A, Design B, Design C, Design D (k = 4 levels)
This is a categorical variable with 4 categories
Part (b): Response variable
Several quantitative response variables could be appropriate:
Mean blood flow rate (mL/min)
Flow resistance (pressure drop per unit flow)
Flow uniformity index
Maximum flow velocity achieved
The choice depends on which aspect of “effectiveness” is most clinically relevant.
Part (c): Hypotheses
Let \(\mu_i\) = true mean blood flow rate for Stent Design \(i\), for \(i\) = A, B, C, D. The hypotheses are \(H_0: \mu_A = \mu_B = \mu_C = \mu_D\) versus \(H_a:\) at least one design's true mean blood flow rate differs from the others.
Part (d): Sample size considerations
Power: Larger samples increase power to detect true differences
Effect size: If differences between stents are expected to be small, larger samples are needed
Variability: Higher within-group variability requires larger samples
Practical constraints: Cost, time, and availability of simulation resources
Balance: Equal sample sizes per group are ideal but not required
Rule of thumb: At least 10-20 observations per group for adequate power
Part (e): Ensuring independence
Use a new simulation setup for each trial (don’t reuse configurations)
Randomize the order in which stent designs are tested to avoid systematic effects
Ensure different arterial models or simulation runs for each measurement
Avoid having the same researcher consistently test the same stent design (potential bias)
Check for time-dependent effects (e.g., equipment drift) and randomize to mitigate
Part (f): Equal variance concerns
Potential issues:
Different stent designs might produce inherently different variability in flow
Manufacturing precision might vary by design complexity
Some designs might perform consistently while others show variable results
How to check:
Calculate sample standard deviation for each group
Apply the rule of thumb: \(\max(s_i)/\min(s_i) \leq 2\)
Examine side-by-side boxplots for similar spreads
If violated, consider data transformation or Welch’s ANOVA (beyond STAT 350)
12.1.5. Additional Practice Problems
True/False Questions (1 point each)
The ANOVA F-test can determine which specific group means are different from each other.
Ⓣ or Ⓕ
If all sample means are identical (\(\bar{x}_1 = \bar{x}_2 = ... = \bar{x}_k\)), the F-test statistic will equal exactly 0.
Ⓣ or Ⓕ
The notation \(\bar{x}_{3.}\) represents the mean of the third observation across all groups.
Ⓣ or Ⓕ
ANOVA assumes that each group is sampled independently from its respective population.
Ⓣ or Ⓕ
If an ANOVA study has k = 6 groups with 10 observations each, the total sample size is n = 60.
Ⓣ or Ⓕ
The equal variance assumption can be checked by computing the ratio of the largest to smallest sample standard deviations.
Ⓣ or Ⓕ
Multiple Choice Questions (2 points each)
In ANOVA notation, \(x_{2,7}\) represents:
Ⓐ The 2nd observation in the 7th group
Ⓑ The 7th observation in the 2nd group
Ⓒ The product of 2 and 7
Ⓓ The sum of observations in groups 2 and 7
For a study comparing k = 4 groups, the alternative hypothesis is:
Ⓐ \(H_a: \mu_1 \neq \mu_2 \neq \mu_3 \neq \mu_4\)
Ⓑ \(H_a: \mu_1 = \mu_2 = \mu_3 = \mu_4\)
Ⓒ \(H_a:\) At least one \(\mu_i\) is different from the others
Ⓓ \(H_a: \mu_1 > \mu_2 > \mu_3 > \mu_4\)
Which is NOT an assumption of one-way ANOVA?
Ⓐ Independent random samples from each population
Ⓑ Normal distributions (or large samples)
Ⓒ Equal population variances
Ⓓ Equal sample sizes from each population
The rule of thumb for checking equal variances requires:
Ⓐ \(\max(s_i^2)/\min(s_i^2) \leq 2\)
Ⓑ \(\max(s_i)/\min(s_i) \leq 2\)
Ⓒ \(\max(\bar{x}_i)/\min(\bar{x}_i) \leq 2\)
Ⓓ All sample variances within 2 units of each other
How many pairwise comparisons are possible with k = 7 groups?
Ⓐ 7
Ⓑ 14
Ⓒ 21
Ⓓ 49
The overall sample mean \(\bar{x}_{..}\) is calculated as:
Ⓐ The simple average of group means
Ⓑ A weighted average of group means, weighted by sample sizes
Ⓒ The median of all observations
Ⓓ The sum of all group means
Answers to Practice Problems
True/False Answers:
False — The F-test only determines if at least one mean differs; multiple comparison procedures identify which specific means differ.
True — If all \(\bar{x}_{i.} = \bar{x}_{..}\), then SSA = 0, so MSA = 0, and F = 0/MSE = 0.
False — \(\bar{x}_{3.}\) represents the mean of Group 3 (the dot replaces the j subscript, indicating averaging over all observations in that group).
True — Independence within and between samples is a fundamental ANOVA assumption.
True — Total sample size n = 6 × 10 = 60.
True — We check max(s)/min(s) ≤ 2 using sample standard deviations. If the ratio is at most 2, the equal variance assumption is considered reasonable.
Multiple Choice Answers:
Ⓑ — First subscript is group, second is observation within group.
Ⓒ — The alternative states at least one mean differs; we don’t specify which or how.
Ⓓ — Equal sample sizes are NOT required for ANOVA; unbalanced designs are allowed.
Ⓑ — The rule uses standard deviations, not variances or means.
Ⓒ — \(\binom{7}{2} = \frac{7 \times 6}{2} = 21\)
Ⓑ — \(\bar{x}_{..} = \frac{\sum n_i \bar{x}_{i.}}{n}\), a weighted average with weights proportional to sample sizes.