12.2. Different Sources of Variability in an ANOVA Model

When we want to test for differences between means in several populations, we can no longer use our familiar test statistic format of “point estimator minus null value, divided by standard error.” The challenge is that we’re dealing with multiple populations simultaneously, so we need a completely different approach to construct a test statistic that can compare all groups at once.

Road Map 🧭

  • Problem we’ll solve – How to construct a test statistic for comparing multiple population means by breaking down total variability into meaningful pieces

  • Tools we’ll learn – The ANOVA model framework and sum of squares decomposition that forms the mathematical foundation for F-tests

  • How it fits – This provides the theoretical groundwork for ANOVA hypothesis testing and introduces variance analysis concepts used throughout advanced statistics

12.2.1. From Visual Intuition to Mathematical Framework

In our previous exploration using side-by-side boxplots, we intuitively looked for group differences by considering both the separation between group centers and the spread within each group. We weren’t just comparing means in isolation—we were also taking into account how much variability existed within each group. This visual approach gives us the conceptual foundation for the mathematical framework we’ll now develop.

The key insight is that we need to decompose the total variability in our data into different sources. If groups truly come from populations with different means, then the variability between group sample means should be large relative to the natural variability within groups. If all groups come from the same population, then the between-group variability should be similar to the within-group variability.

12.2.2. The One-Way ANOVA Model

We can think of one-way ANOVA as assuming our data comes from a specific statistical model. This model-based thinking will reappear when we study regression analysis, but for now let’s understand what it means in the ANOVA context.

Our data consists of observations \(X_{ij}\) where \(i\) represents the group index (\(i = 1, 2, \ldots, k\)) and \(j\) represents the observation number within group \(i\) (\(j = 1, 2, \ldots, n_i\)).

The ANOVA Model Equation

Each observation can be broken down into two components:

\[X_{ij} = \mu_i + \varepsilon_{ij}\]

Here \(\mu_i\) is the true population mean for group \(i\), and \(\varepsilon_{ij}\) is the error term that captures everything not explained by the group mean.

The population means \(\mu_i\) represent the average tendencies for each group—these capture the systematic differences between populations that we’re trying to detect. The error terms \(\varepsilon_{ij}\) represent the random variation within each population that cannot be explained by the group mean alone.

We assume the error terms follow a normal distribution:

\[\varepsilon_{ij} \sim N(0, \sigma^2) \text{ independently}\]

This assumption states that errors are normally distributed with mean zero and the same variance \(\sigma^2\) across all groups—this is our equal variance assumption made explicit in the model.
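
To make the model concrete, here is a minimal simulation sketch in Python. The group means, common standard deviation, and sample sizes below are hypothetical values chosen only for illustration; each simulated observation is simply its group’s mean plus an independent normal error.

```python
# Minimal sketch: simulate data from the one-way ANOVA model X_ij = mu_i + eps_ij.
# All numbers below (group means, sigma, sample sizes) are hypothetical illustrations.
import numpy as np

rng = np.random.default_rng(12345)

mu = [25.0, 30.0, 28.0]        # hypothetical population means mu_i for k = 3 groups
sigma = 4.0                    # common error standard deviation (equal-variance assumption)
sizes = [8, 10, 7]             # sample sizes n_i

# Each observation is its group's mean plus an independent N(0, sigma^2) error.
groups = [m + rng.normal(0.0, sigma, size=n) for m, n in zip(mu, sizes)]

for i, g in enumerate(groups, start=1):
    print(f"Group {i}: n = {len(g)}, sample mean = {g.mean():.2f}, sample SD = {g.std(ddof=1):.2f}")
```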

Understanding What We’re Testing

This model framework helps clarify what we’re actually testing. Under the null hypothesis \(H_0: \mu_1 = \mu_2 = \cdots = \mu_k\), all observations come from populations with the same mean, so the only differences between observations are due to random error. Under the alternative hypothesis, systematic differences between population means create additional sources of variation beyond just random error.

12.2.3. Three Types of Sum of Squares

The genius of ANOVA lies in decomposing the total variability in our data into meaningful components. We’ll develop three different “sums of squares,” each measuring a different source of variation.

Before diving in, let’s establish the pattern we’ll see repeatedly. Any sample variance can be written in the form:

\[S^2 = \frac{\text{Sum of Squares}}{\text{Degrees of Freedom}}\]

We’ll have different types of sum of squares for different sources of variation, and we’ll convert each to a “mean square” by dividing by the appropriate degrees of freedom.
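
For instance, the ordinary sample variance already has this form. A quick sketch with made-up numbers:

```python
# The familiar sample variance is "sum of squares / degrees of freedom".
import numpy as np

x = np.array([4.1, 5.0, 3.7, 4.6, 5.2])   # illustrative data
ss = np.sum((x - x.mean()) ** 2)           # sum of squared deviations from the mean
df = len(x) - 1                            # degrees of freedom
print(ss / df, x.var(ddof=1))              # the two computations agree
```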

Sum of Squares Between Groups (SSA)

The first source of variability we consider is how much the group sample means deviate from the overall sample mean. This measures the between-group variability.

In our coffeehouse example, imagine we have sample means for each of the five coffeehouses. If these sample means are spread far apart from the overall mean, this suggests the groups come from populations with different means.

\[\text{SSA} = \sum_{i=1}^k n_i (\bar{X}_{i \cdot} - \bar{X}_{\cdot \cdot})^2\]

Here \(\bar{X}_{i \cdot}\) is the sample mean for group \(i\), \(\bar{X}_{\cdot \cdot}\) is the overall sample mean, and \(n_i\) is the sample size for group \(i\).

Large SSA indicates that group means are spread far apart, suggesting different population means. Small SSA indicates that group means are close to the overall mean, suggesting similar population means.

Since we’re comparing \(k\) group means and estimating one overall mean, we have \(k - 1\) degrees of freedom for SSA.
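
A short sketch of the SSA calculation, using made-up group sizes and sample means (not the coffeehouse data), might look like this:

```python
# Sketch: compute SSA from hypothetical group sizes and group sample means.
import numpy as np

n_i = np.array([6, 5, 7])                  # sample sizes n_i (illustrative)
xbar_i = np.array([22.0, 25.5, 24.1])      # group sample means (illustrative)

# The overall mean is the sample-size-weighted average of the group means.
xbar_all = np.sum(n_i * xbar_i) / np.sum(n_i)

# SSA = sum over groups of n_i * (group mean - overall mean)^2
ssa = np.sum(n_i * (xbar_i - xbar_all) ** 2)
df_a = len(n_i) - 1                        # k - 1 degrees of freedom

print(f"overall mean = {xbar_all:.3f}, SSA = {ssa:.3f}, df = {df_a}")
```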

Sum of Squares Within Groups (SSE)

The second source of variability measures how observations within each group deviate from their respective group means. This captures the within-group or error variability.

\[\text{SSE} = \sum_{i=1}^k \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i \cdot})^2\]

This can also be written using sample variances:

\[\text{SSE} = \sum_{i=1}^k (n_i - 1) S^2_i\]

This shows that SSE is essentially a weighted sum of the sample variances from each group—it’s like the pooled variance concept from two-sample procedures, but extended to multiple groups.

SSE measures the natural variability within populations and provides an estimate of the common variance \(\sigma^2\) under our equal variance assumption. Importantly, SSE doesn’t depend on whether the null hypothesis is true or false—it only depends on the within-group variability.

We have \(n = n_1 + n_2 + \cdots + n_k\) total observations and estimate \(k\) group means, giving us \(n - k\) degrees of freedom for SSE.
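
The two forms of SSE are easy to check numerically. Here is a sketch on a small made-up dataset with three groups of raw observations:

```python
# Sketch: compute SSE two equivalent ways on small illustrative data.
import numpy as np

groups = [
    np.array([21.0, 23.5, 22.1, 24.0]),
    np.array([26.2, 25.1, 27.8, 26.5, 24.9]),
    np.array([23.3, 22.7, 25.0]),
]

# Definition: squared deviations of each observation from its own group mean.
sse_direct = sum(np.sum((g - g.mean()) ** 2) for g in groups)

# Equivalent form: weighted sum of the group sample variances, (n_i - 1) * s_i^2.
sse_from_vars = sum((len(g) - 1) * g.var(ddof=1) for g in groups)

n = sum(len(g) for g in groups)
k = len(groups)
print(f"SSE (direct)         = {sse_direct:.3f}")
print(f"SSE (from variances) = {sse_from_vars:.3f}")
print(f"degrees of freedom: n - k = {n - k}")
```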

Sum of Squares Total (SST)

The third component represents the total variability in the data, as if we ignored group structure entirely.

\[\text{SST} = \sum_{i=1}^k \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{\cdot \cdot})^2\]

SST measures how much observations deviate from the overall mean—this is the baseline variability if we treated all data as coming from one population. We have \(n\) total observations and estimate one overall mean, giving us \(n - 1\) degrees of freedom.

12.2.4. The Fundamental ANOVA Identity

The remarkable mathematical result that makes ANOVA possible is that these three sum of squares components are related by:

\[\text{SST} = \text{SSA} + \text{SSE}\]

In words: Total variability equals between-group variability plus within-group variability.

The degrees of freedom also decompose in the same way:

\[(n - 1) = (k - 1) + (n - k)\]
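
This identity is easy to confirm numerically. Reusing the small illustrative dataset from the SSE sketch above:

```python
# Sketch: verify SST = SSA + SSE and the matching degrees-of-freedom identity.
import numpy as np

groups = [
    np.array([21.0, 23.5, 22.1, 24.0]),
    np.array([26.2, 25.1, 27.8, 26.5, 24.9]),
    np.array([23.3, 22.7, 25.0]),
]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()

sst = np.sum((all_obs - grand_mean) ** 2)
ssa = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
sse = sum(np.sum((g - g.mean()) ** 2) for g in groups)

n, k = len(all_obs), len(groups)
print(f"SST = {sst:.3f}, SSA + SSE = {ssa + sse:.3f}")                    # equal up to rounding
print(f"df: n - 1 = {n - 1}, (k - 1) + (n - k) = {(k - 1) + (n - k)}")    # 11 = 2 + 9
```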

Why This Decomposition Works

We can demonstrate this decomposition with a clever algebraic trick. Starting with SST, we add and subtract the group mean \(\bar{X}_{i \cdot}\) inside each squared term (which changes nothing, since we are adding zero):

\[\text{SST} = \sum_{i=1}^k \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{\cdot \cdot})^2\]
\[= \sum_{i=1}^k \sum_{j=1}^{n_i} [(X_{ij} - \bar{X}_{i \cdot}) + (\bar{X}_{i \cdot} - \bar{X}_{\cdot \cdot})]^2\]

When we expand this squared expression, we get three terms: the SSE term, the SSA term, and a cross-product term. The remarkable result is that the cross-product term equals zero, leaving us with exactly SST = SSA + SSE.

The cross-product term equals zero because \((\bar{X}_{i \cdot} - \bar{X}_{\cdot \cdot})\) doesn’t depend on \(j\), so we can factor it out, leaving \(\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i \cdot})\), which always equals zero since deviations from a sample mean sum to zero.
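
Written out term by term (with the inner sum over \(j\) in the middle term collapsing to a factor of \(n_i\), since \(\bar{X}_{i \cdot} - \bar{X}_{\cdot \cdot}\) does not depend on \(j\)), the expansion is:

\[\text{SST} = \underbrace{\sum_{i=1}^k \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i \cdot})^2}_{\text{SSE}} + \underbrace{\sum_{i=1}^k n_i (\bar{X}_{i \cdot} - \bar{X}_{\cdot \cdot})^2}_{\text{SSA}} + 2 \sum_{i=1}^k (\bar{X}_{i \cdot} - \bar{X}_{\cdot \cdot}) \underbrace{\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i \cdot})}_{=\,0}\]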

12.2.5. From Sum of Squares to Mean Squares

Sums of squares by themselves aren’t directly useful for hypothesis testing because their size grows with the sample sizes. We convert them to mean squares by dividing each by its degrees of freedom, which produces unbiased estimators of variance.

Mean Square Between Groups (MSA)

\[\text{MSA} = \frac{\text{SSA}}{k - 1}\]

MSA has a special property: if \(H_0\) is true (all \(\mu_i\) are equal) and the equal variance assumption holds, then MSA is an unbiased estimator of \(\sigma^2\). However, if \(H_0\) is false (some \(\mu_i\) differ), then MSA estimates \(\sigma^2\) plus additional variation due to differences in population means.

This means MSA gets larger when group means are more spread apart, making it sensitive to violations of the null hypothesis.

Mean Square Error (MSE)

\[\text{MSE} = \frac{\text{SSE}}{n - k}\]

MSE is an unbiased estimator of \(\sigma^2\) regardless of whether \(H_0\) is true or false. It only requires the equal variance assumption to hold. MSE is essentially the ANOVA version of pooled variance from two-sample procedures—it represents our best estimate of the natural variability within populations.
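
These unbiasedness claims can be checked by simulation. The sketch below uses hypothetical means and \(\sigma = 4\) (so \(\sigma^2 = 16\)), generates many datasets from the ANOVA model, and averages MSA and MSE, once with all population means equal and once with unequal means:

```python
# Sketch: simulation check that MSE estimates sigma^2 whether or not H0 holds,
# while MSA estimates sigma^2 only when the population means are all equal.
import numpy as np

rng = np.random.default_rng(0)
sigma = 4.0                 # common SD, so sigma^2 = 16
sizes = [8, 10, 7]          # hypothetical sample sizes n_i

def average_mean_squares(mu, n_reps=5000):
    """Average MSA and MSE over many simulated datasets with true group means mu."""
    k, n = len(mu), sum(sizes)
    msa_vals, mse_vals = [], []
    for _ in range(n_reps):
        groups = [m + rng.normal(0.0, sigma, size=ni) for m, ni in zip(mu, sizes)]
        all_obs = np.concatenate(groups)
        grand = all_obs.mean()
        ssa = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
        sse = sum(np.sum((g - g.mean()) ** 2) for g in groups)
        msa_vals.append(ssa / (k - 1))
        mse_vals.append(sse / (n - k))
    return np.mean(msa_vals), np.mean(mse_vals)

print("H0 true,  means (25, 25, 25):", average_mean_squares([25.0, 25.0, 25.0]))
print("H0 false, means (25, 30, 28):", average_mean_squares([25.0, 30.0, 28.0]))
# Expect roughly (16, 16) in the first case and (well above 16, about 16) in the second.
```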

12.2.6. Building the Test Statistic Logic

The key insight for constructing an ANOVA test statistic comes from comparing these mean squares. We can think of the ratio:

\[\frac{\text{MSA}}{\text{MSE}} = \frac{\text{Between-group variability}}{\text{Within-group variability}}\]

When \(H_0\) is true, both MSA and MSE estimate \(\sigma^2\), so their ratio should be close to 1. When \(H_0\) is false, MSE still estimates \(\sigma^2\), but MSA estimates something larger, so their ratio should be substantially greater than 1.

This suggests that large values of MSA/MSE provide evidence against \(H_0\). The larger this ratio, the stronger the evidence for group differences.
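
As a quick sanity check, the ratio built from the sums of squares above agrees with the F statistic reported by scipy.stats.f_oneway (same illustrative data as before; the formal F test itself is the subject of the next section):

```python
# Sketch: compute MSA/MSE by hand and compare with scipy.stats.f_oneway.
import numpy as np
from scipy import stats

groups = [
    np.array([21.0, 23.5, 22.1, 24.0]),
    np.array([26.2, 25.1, 27.8, 26.5, 24.9]),
    np.array([23.3, 22.7, 25.0]),
]

all_obs = np.concatenate(groups)
grand = all_obs.mean()
n, k = len(all_obs), len(groups)

msa = sum(len(g) * (g.mean() - grand) ** 2 for g in groups) / (k - 1)
mse = sum(np.sum((g - g.mean()) ** 2) for g in groups) / (n - k)

print(f"MSA / MSE  = {msa / mse:.3f}")
print(f"f_oneway F = {stats.f_oneway(*groups).statistic:.3f}")
```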

12.2.7. Understanding Through the Coffeehouse Example

In our coffeehouse study, the variance decomposition tells this story:

SST asks: How much do all customer ages vary from the overall average age across all coffeehouses?

SSA asks: How much do the average ages at different coffeehouses vary from the overall average? If coffeehouses attract similar demographics, this should be small. If they attract different age groups, this should be large.

SSE asks: How much do individual customer ages vary within each coffeehouse? This represents the natural variation in customer ages that exists regardless of systematic differences between coffeehouses.

If SSA is large relative to SSE, it suggests coffeehouses do attract systematically different age demographics.

12.2.8. What Makes the Ratios Large or Small?

MSA becomes large when:

  • Group means are spread far apart

  • Sample sizes are large (each squared deviation \((\bar{X}_{i \cdot} - \bar{X}_{\cdot \cdot})^2\) is weighted by \(n_i\), so real differences among means contribute more)

MSE becomes large when:

  • There’s high natural variability within populations

  • There’s measurement error

  • Other sources of variation aren’t controlled in the study design

The ratio MSA/MSE becomes large when group means are well separated relative to the natural variability within groups, which is exactly what we were looking for visually in our boxplots. Small within-group variability does not inflate MSA itself; rather, it shrinks MSE, making genuine differences among means easier to detect.

12.2.9. Bringing It All Together

We now have the building blocks for ANOVA hypothesis testing. In the next section, we’ll see how the ratio MSA/MSE follows an F-distribution under the null hypothesis, allowing us to compute p-values and make formal decisions about group differences. We’ll also see how this approach controls our overall Type I error rate while comparing multiple groups simultaneously.

Key Takeaways 📝

  1. The ANOVA model \(X_{ij} = \mu_i + \varepsilon_{ij}\) decomposes each observation into a group effect plus random error, providing the theoretical foundation for variance analysis.

  2. Total variability decomposes exactly into between-group and within-group components: SST = SSA + SSE, with degrees of freedom that also add up: \((n - 1) = (k - 1) + (n - k)\).

  3. Mean squares create unbiased estimators: MSE always estimates \(\sigma^2\), while MSA estimates \(\sigma^2\) when \(H_0\) is true but estimates more when \(H_0\) is false.

  4. The ratio MSA/MSE compares between-group to within-group variability, forming the foundation for ANOVA test statistics.

  5. This framework formalizes our visual intuition from boxplots—groups that are well-separated relative to their internal spread suggest different population means.

Exercises

  1. Model Understanding: Consider the ANOVA model \(X_{ij} = \mu_i + \varepsilon_{ij}\) for a study comparing four different exercise programs with 12 participants per program.

    1. How many \(\mu_i\) parameters are there and what do they represent?

    2. How many \(\varepsilon_{ij}\) terms are there and what assumptions do we make about them?

    3. If \(\mu_1 = 8.2\), \(\mu_2 = 7.5\), \(\mu_3 = 8.8\), and \(\mu_4 = 7.9\), what would it mean for \(H_0\) to be true?

  2. Sum of Squares Calculation: Given summary data for three groups:

    • Group 1: \(n_1 = 8\), \(\bar{x}_1 = 15.2\), \(s_1 = 2.3\)

    • Group 2: \(n_2 = 10\), \(\bar{x}_2 = 12.8\), \(s_2 = 1.9\)

    • Group 3: \(n_3 = 7\), \(\bar{x}_3 = 18.1\), \(s_3 = 2.7\)

    1. Calculate the overall sample mean \(\bar{x}_{\cdot \cdot}\)

    2. Calculate SSA

    3. Calculate SSE

    4. Verify that the degrees of freedom add up correctly

  3. Conceptual Questions:

    1. Explain why SSA measures “between-group variability”

    2. Explain why SSE measures “within-group variability”

    3. Why do we divide sum of squares by degrees of freedom to get mean squares?

  4. Interpretation: A researcher studying plant growth obtains MSA = 34.7 and MSE = 8.2.

    1. Calculate the ratio MSA/MSE and explain what this suggests

    2. Would you expect this ratio to be larger or smaller if the treatments had no effect?