12.2. Different Sources of Variability in ANOVA
In our previous exploration using side-by-side boxplots, we learned that comparing means in isolation was not sufficient—we had to consider how much variability existed within each group in comparison with the spread of the group means. In this lesson, we formalize this idea mathematically.
Road Map 🧭
Identify the three sources of variability in ANOVA.
Quantify the variability from each source using formal notation.
Recognize the roles of sum of squares, degrees of freedom and mean of squares in constructing a sample variance.
Learn the special relationship between the three sums of squares and their degrees of freedom.
Organize the components into an ANOVA table.
12.2.1. The One-Way ANOVA Model
What is a Statistical Model?
As we progress to statistical inference methods of higher complexity, it becomes essential to define a corresponding statistical model to concisely express the core ideas. A statistical model provides a structural decomposition of the data that must hold under the assumptions of the analysis method.
The One-Way ANOVA Model
One-way ANOVA assumes that an observation \(X_{ij}\), the \(j\)-th observation in Group \(i\), takes the following form:
\[X_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, \ldots, k, \quad j = 1, \ldots, n_i.\]
Above, \(\mu_{i}\) is the unknown true mean of Group \(i\), and \(\varepsilon_{ij}\) is the random error that captures everything not explained by the group mean.
According to the ANOVA assumptions, the random errors are mutually independent and have an equal variance of \(\sigma^2\). Since we have extracted the group means out of the random term, \(\varepsilon_{ij}\) would also have an expected value of zero.
In the ideal case where all populations are normally distributed, therefore, we can write:
\[\varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2), \quad\text{or equivalently,}\quad X_{ij} \sim N(\mu_i, \sigma^2) \text{ independently across } i \text{ and } j.\]
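To make the model concrete, here is a minimal simulation sketch; the group means, group sizes, and \(\sigma\) below are hypothetical values chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

mu = [30.0, 35.0, 32.0]   # hypothetical true group means mu_i
n_i = [10, 12, 8]         # hypothetical group sizes n_i
sigma = 4.0               # hypothetical common error standard deviation

# X_ij = mu_i + eps_ij, with eps_ij ~ N(0, sigma^2) independent across i and j
samples = [m + rng.normal(0.0, sigma, size=n) for m, n in zip(mu, n_i)]
```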
Why Is the Model Helpful?
The ANOVA model allows us to view each data point as the outcome of two components, each contributing distinctly. If all population means are truly equal, the only source of randomness among observations would be the \(\varepsilon_{ij}\) terms, whose variance is \(\sigma^2\). If the observed variance in the data is significantly larger than \(\sigma^2\), therefore, we must consider the possibility that differences in group means also contribute.
To formalize this, we first construct the three key measures of variation in ANOVA: variation between groups, variation within groups, and the total variation.
12.2.2. Three Types of Variability
As with any sample variance we have seen before, the three measures of variation will take the following common form:
\[\text{sample variance} = \frac{\text{sum of squares (SS)}}{\text{degrees of freedom (df)}},\]
where the degrees of freedom are chosen to make each statistic an unbiased estimator of its target. The degrees of freedom follow the pattern we have previously seen: they equal the number of data points minus the number of estimated means used in constructing the statistic.
Since each sample variance of this form is an “average” of squares, we also call it a mean of squares (MS).
1. SSA and MSA: Between-Group Variation
We first consider the sum of squares for between-group variation, or SSA:
\[SSA = \sum_{i=1}^{k} n_i (\bar{X}_{i \cdot} - \bar{X}_{\cdot \cdot})^2.\]
Note that SSA is the sum of squared deviations of group means from the overall mean, each weighted proportionally to the group size. The degrees of freedom appropriate for SSA is \(df_A = k-1\) since there are \(k\) group means deviating from a single overall mean.
It follows that the mean of squares for the variation between groups is:
\[MSA = \frac{SSA}{df_A} = \frac{SSA}{k-1}.\]
MSA has a special property:
If \(H_0\) is true and the equal variance assumption holds, then the MSA is an unbiased estimator of \(\sigma^2\).
However, if \(H_0\) is false, then the MSA estimates \(\sigma^2\) plus additional variation due to differences in population means.
Therefore, an MSA significantly larger than an estimate of \(\sigma^2\) indicates the existence of additional variance due to distinct group means, whereas an MSA comparable to an estimated \(\sigma^2\) indicates an absence of strong evidence against the null hypothesis.
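As a sketch of the computation (the three groups below are hypothetical data):

```python
import numpy as np

# hypothetical group samples
groups = [np.array([28.0, 31.5, 30.2, 29.8]),
          np.array([35.1, 33.9, 36.2]),
          np.array([31.0, 32.4, 30.8, 33.1, 31.7])]

k = len(groups)
n_i = np.array([len(g) for g in groups])        # group sizes
xbar_i = np.array([g.mean() for g in groups])   # group means
xbar_all = np.concatenate(groups).mean()        # overall mean

SSA = float(np.sum(n_i * (xbar_i - xbar_all) ** 2))  # between-group sum of squares
MSA = SSA / (k - 1)                                  # mean of squares, df_A = k - 1
```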
Why Do We Call It SS“A”?
The name SSA stands for “Sum of Squares for Factor A”. It originates from the multi-way ANOVA context involving multiple factors (Factor A, Factor B, etc.). By convention, the sum of squares for the “first” factor is labeled SSA, even when there is only one factor in the analysis.
Example 💡: Coffeehouse SSA & MSA ☕️
In the coffeehouse example, SSA and MSA measure how much the store-wise average ages vary from the overall mean. If all the coffeehouses attract similar demographics, SSA and MSA should be small. If they attract different age groups, they should be large.
2. SSE and MSE: Within-Group Variation
The sum of squares of errors, or SSE, is an unscaled measure of how observations within each group deviate from their respective group means due to the random error, \(\varepsilon_{ij}\):
\[SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i \cdot})^2 = \sum_{i=1}^{k} (n_i - 1) S^2_{i \cdot}.\]
Confirm the second equality by replacing \(S^2_{i\cdot}\) with its explicit formula. The SSE consists of the squared distances of \(n\) observations from one of \(k\) group means, giving us the degrees of freedom \(df_E=n-k\).
Bringing SSE to the correct scale with the degrees of freedom, we obtain:
\[MSE = \frac{SSE}{df_E} = \frac{SSE}{n-k}.\]
MSE is an unbiased estimator of the variance within populations, \(\sigma^2\), regardless of whether \(H_0\) is true. As a result, we always estimate the within-population variance with \(S^2 = MSE\). The estimator for population-wise standard deviation is \(S= \sqrt{MSE}\).
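A matching sketch for the within-group quantities, using the same kind of hypothetical data as above:

```python
import numpy as np

# hypothetical group samples
groups = [np.array([28.0, 31.5, 30.2, 29.8]),
          np.array([35.1, 33.9, 36.2]),
          np.array([31.0, 32.4, 30.8, 33.1, 31.7])]

n = sum(len(g) for g in groups)   # total number of observations
k = len(groups)

# squared deviations of observations from their own group means
SSE = sum(float(((g - g.mean()) ** 2).sum()) for g in groups)
MSE = SSE / (n - k)               # df_E = n - k
s = MSE ** 0.5                    # estimate of sigma, s = sqrt(MSE)
```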
Connecting ANOVA and Independent Two-Sample Analysis
MSE is a multi-way extension of the pooled variance estimator for independent two-sample analysis. Confirm this by plugging in \(k=2\) to recover \(S^2_p\).
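Writing the substitution out explicitly:
\[MSE = \frac{\sum_{i=1}^{2}(n_i - 1)S^2_{i \cdot}}{n_1 + n_2 - 2} = \frac{(n_1 - 1)S^2_{1 \cdot} + (n_2 - 1)S^2_{2 \cdot}}{n_1 + n_2 - 2} = S^2_p,\]
which is exactly the pooled variance estimator from the independent two-sample setting.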
Example 💡: Coffeehouse SSE & MSE ☕️
In the coffeehouse example, the SSE and MSE measure how much individual customer ages vary within each coffeehouse. They represent the natural variation in customer ages that exists regardless of any systematic differences between coffeehouses.
3. Total Sum of Squares (SST)
Finally, we also define a measure for the overall variability in the data:
\[SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{\cdot \cdot})^2.\]
Note that this would be the numerator for the sample variance if the entire dataset was treated as a single sample. The distances of \(n\) total observations are measured against one overall mean estimator, giving us the degrees of freedom \(df_T = n - 1\).
We do not define a mean of squares for the total variation, as it does not hold significance in the ANOVA framework.
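A short sketch of the computation, pooling the same hypothetical observations used in the earlier sketches:

```python
import numpy as np

# all observations pooled into a single array (hypothetical data)
all_obs = np.array([28.0, 31.5, 30.2, 29.8,
                    35.1, 33.9, 36.2,
                    31.0, 32.4, 30.8, 33.1, 31.7])

# total sum of squares about the overall mean, df_T = n - 1
SST = float(((all_obs - all_obs.mean()) ** 2).sum())
```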
Example 💡: Coffeehouse SST ☕️
In the coffeehouse example, SST measures the degree of variation of all customer ages around the single overall sample mean.
12.2.3. The Fundamental ANOVA Identity
The remarkable mathematical result that makes ANOVA possible is that the sums of squares are related by:
\[SST = SSA + SSE.\]
Moreover, the degrees of freedom decompose in the same way:
\[df_T = df_A + df_E, \quad\text{that is,}\quad n - 1 = (k - 1) + (n - k).\]
Why This Decomposition Works
We use the trick of adding and subtracting the same terms inside each pair of parentheses in SST:
\[\begin{aligned} SST &= \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left[(X_{ij} - \bar{X}_{i \cdot}) + (\bar{X}_{i \cdot} - \bar{X}_{\cdot \cdot})\right]^2 \\ &= \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_{i \cdot})^2}_{SSE} + \underbrace{\sum_{i=1}^{k} n_i(\bar{X}_{i \cdot} - \bar{X}_{\cdot \cdot})^2}_{SSA} + \sum_{i=1}^{k}\sum_{j=1}^{n_i} 2(\bar{X}_{i \cdot} - \bar{X}_{\cdot \cdot})(X_{ij} - \bar{X}_{i \cdot}). \end{aligned}\]
The cross-product term can be shown to equal zero by taking the following steps:
Since \(2(\bar{X}_{i \cdot} - \bar{X}_{\cdot \cdot})\) does not depend on \(j\), we can factor it out of the inner sum.
The inner sum is then
\[\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i \cdot}) = \sum_{j=1}^{n_i}X_{ij} - n_{i}\cdot\frac{\sum_{j=1}^{n_i}X_{ij}}{n_i} = 0.\]
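The identity is also easy to verify numerically; here is a sketch with randomly generated hypothetical groups:

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical groups defined by (true mean, size) pairs
groups = [rng.normal(mu, 2.0, size=n) for mu, n in [(30, 10), (33, 12), (31, 8)]]

all_obs = np.concatenate(groups)
xbar_all = all_obs.mean()

SSA = sum(len(g) * (g.mean() - xbar_all) ** 2 for g in groups)
SSE = sum(((g - g.mean()) ** 2).sum() for g in groups)
SST = ((all_obs - xbar_all) ** 2).sum()

assert np.isclose(SST, SSA + SSE)  # the fundamental ANOVA identity
```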
12.2.4. The ANOVA Table
The components of ANOVA are often organized into a table, with rows representing the three different sources of variability and columns representing various characteristics of each source.
| Source | df | SS | MS | F | \(p\)-value |
|---|---|---|---|---|---|
| Factor A | \(k-1\) | \(\sum_{i=1}^k n_i(\bar{x}_{i \cdot} - \bar{x}_{\cdot \cdot})^2\) | \(\frac{\text{SSA}}{k-1}\) | ? | ? |
| Error | \(n-k\) | \(\sum_{i=1}^k \sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{i \cdot})^2\) | \(\frac{\text{SSE}}{n-k}\) | | |
| Total | \(n-1\) | \(\sum_{i=1}^k \sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{\cdot \cdot})^2\) | | | |
The total row is often omitted for conciseness, as its entries can be computed by adding up the other rows' degrees of freedom and sums of squares. The entries corresponding to \(F\) and the \(p\)-value will be discussed in the upcoming lesson.
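As a sketch of how the df, SS, and MS columns fit together in code (hypothetical data; the \(F\) and \(p\)-value columns are deferred to the next lesson):

```python
import numpy as np
import pandas as pd

# hypothetical group samples
groups = [np.array([28.0, 31.5, 30.2, 29.8]),
          np.array([35.1, 33.9, 36.2]),
          np.array([31.0, 32.4, 30.8, 33.1, 31.7])]

all_obs = np.concatenate(groups)
n, k = len(all_obs), len(groups)

SSA = sum(len(g) * (g.mean() - all_obs.mean()) ** 2 for g in groups)
SSE = sum(((g - g.mean()) ** 2).sum() for g in groups)

anova_table = pd.DataFrame(
    {"df": [k - 1, n - k, n - 1],
     "SS": [SSA, SSE, SSA + SSE],
     "MS": [SSA / (k - 1), SSE / (n - k), np.nan]},  # no MS for the total row
    index=["Factor A", "Error", "Total"],
)
print(anova_table)
```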
Example 💡: Coffeehouse ANOVA Table ☕️
Using the data summary (Fig. 12.5) and the partial ANOVA table (Fig. 12.6) of the coffeehouse example,
Fill in the blank entries of the ANOVA table.
Provide an estimate of the population standard deviation, \(\sigma\).
Data Summary
Fig. 12.5 Coffeehouse example summary
Partially Complete ANOVA Table
Fig. 12.6 Partial ANOVA table for the coffeehouse example
Let us begin with the degrees of freedom since they can be obtained directly from the data summary. We have \(n=200\) and \(k=5\). Therefore,
(1) \(df_A = k - 1 = 4\)
(2) \(df_E = n - k = 200 - 5 = 195\)
(3) \(df_T = n - 1 = 199\)
We can use \(df_T = df_A + df_E\) as a second check.
(5) \(MSE = 99.8 = \frac{SSE}{195}\), so
\[SSE = 99.8 \times 195 = 19461.\]
(4) Then, using the fact that SSA and SSE add up to SST,
\[SSA = SST - SSE = 28285 - 19461 = 8824.\]
(6) Finally,
\[MSA = \frac{SSA}{df_A} = \frac{8824}{4} = 2206.\]
The MSE, as a random variable, is an unbiased estimator of \(\sigma^2\). We can use the square root of its observed value as an estimate for \(\sigma\):
\[s = \sqrt{MSE} = \sqrt{99.8} \approx 9.99.\]
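A quick arithmetic check of the entries above, using only the given values of \(n\), \(k\), MSE, and SST:

```python
n, k = 200, 5
MSE, SST = 99.8, 28285.0

df_A, df_E, df_T = k - 1, n - k, n - 1
assert df_T == df_A + df_E    # degrees of freedom decompose

SSE = MSE * df_E              # ~19461
SSA = SST - SSE               # ~8824
MSA = SSA / df_A              # ~2206
sigma_hat = MSE ** 0.5        # ~9.99, the estimate of sigma
print(SSE, SSA, MSA, sigma_hat)
```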
12.2.5. Bringing It All Together
Key Takeaways 📝
The ANOVA model decomposes each observation into a group effect plus random error: \(X_{ij} = \mu_i + \varepsilon_{ij}\).
Total Sum of Squares (SST) satisfies \(SST = SSA + SSE\). Their degrees of freedom have a similar association: \(df_T = df_A + df_E\).
Since \(E(MSE) = \sigma^2\) always holds, we use the MSE as the estimator of the common variance \(\sigma^2\), and also denote its observed value as \(s^2\).
MSA is an unbiased estimator of \(\sigma^2\) only when \(H_0\) is true; its true target exceeds \(\sigma^2\) when \(H_0\) is false. Therefore, comparing MSA against MSE gives us a measure of the evidence in the data against the null hypothesis.
Exercises
Model Understanding: Consider the ANOVA model \(X_{ij} = \mu_i + \varepsilon_{ij}\) for a study comparing four different exercise programs with 12 participants per program.
How many \(\mu_i\) parameters are there and what do they represent?
How many \(\varepsilon_{ij}\) terms are there and what assumptions do we make about them?
Sum of Squares Calculation: Given summary data for three groups:
Group 1: \(n_1 = 8\), \(\bar{x}_1 = 15.2\), \(s_1 = 2.3\)
Group 2: \(n_2 = 10\), \(\bar{x}_2 = 12.8\), \(s_2 = 1.9\)
Group 3: \(n_3 = 7\), \(\bar{x}_3 = 18.1\), \(s_3 = 2.7\)
Calculate the overall sample mean \(\bar{x}_{\cdot \cdot}\).
Calculate SSA.
Calculate SSE.
Calculate SST.