12.1. Introduction to One-Way ANOVA

Many important research questions involve comparing two or more populations simultaneously. We now need a new approach that can handle the complexity of simultaneous comparisons while controlling the overall error rates.

Road Map 🧭

  • Identify the experimental conditions that call for the use of ANOVA.

  • Explain why a comparison of multiple means is called an Analysis of “Variance.”

  • Define the notations and formulate the hypotheses for ANOVA.

  • List the assumptions required for validity of ANOVA.

12.1.1. The Fundamental ANOVA Question

Many controlled experiments involve dividing subjects into two or more groups, applying a distinct treatment to each group, and then analyzing their quantitative responses to see if any systematic difference exists among the groups. See the example below for concrete scenarios.

Examples 💡: Experiments That Require Multiple Comparisons

Example 1: Bacterial Growth Study

A research group studies bacteria growth rates in different sugar solutions.

  • Factor variable: Type of sugar solution

  • Levels: Glucose, Sucrose, Fructose, Lactose (4 groups)

  • Quantitative response variable: Bacterial growth rate

Example 2: Gasoline Brand Efficiency

Researchers want to determine if five different gasoline brands affect automobile fuel efficiency.

  • Factor variable: Gasoline brand

  • Levels: Brand A, Brand B, Brand C, Brand D, Brand E (5 groups)

  • Quantitative response variable: Miles per gallon

The Research Question and Our Strategy

In all the examples above, the researchers aim to determine whether any difference exists among the true mean responses of the groups.

A naive approach would be to perform two-sample comparisons for all possible pairs of levels. However, this approach has two critical drawbacks.

  1. Let \(k\) be the number of levels. The total number of pairwise comparisons is \(\binom{k}{2}\), which grows quickly with \(k\): for example, \(10\) for \(k=5\) and \(45\) for \(k=10\).

  2. When many inferences are performed simultaneously at significance level \(\alpha\), the probability of making at least one Type I error becomes substantially larger than \(\alpha\).
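Both drawbacks are easy to quantify; here is a minimal R sketch (assuming the pairwise tests are independent and each uses \(\alpha = 0.05\)):

alpha <- 0.05
k <- c(3, 5, 10)                 # number of groups
n_pairs <- choose(k, 2)          # number of pairwise comparisons, k choose 2
fwer <- 1 - (1 - alpha)^n_pairs  # P(at least one Type I error) across all tests
data.frame(k, n_pairs, fwer)     # fwer is roughly 0.40 for k = 5 and 0.90 for k = 10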

To avoid unnecessary effort and potential errors, we would like to first perform a single screening hypothesis test that determines whether any difference exists among the population means. Only when we reject the null hypothesis that all means are equal do we proceed to pairwise comparisons, taking special care to control the overall Type I error rate.

The preliminary screening test is called the Analysis of Variance, or ANOVA.

Why Analysis of “Variance”?

At first glance, it seems strange that a procedure designed to compare means is called an analysis of “variance.” The name reflects the fact that our ability to perceive a difference among means depends on the relative sizes of the within-group and between-group variances. See Fig. 12.1 for a graphical illustration.

Fig. 12.1 Case (a), left: distributions with large within-group variances; Case (b), right: distributions with small within-group variances

Suppose we obtain samples from Case (a). Most likely, the data points will be spread over a wide range within each group. The within-group variability is wide enough to obscure the distinction created by the difference in central values.

On the other hand, samples from Case (b) will bunch closely around their respective means, providing stronger evidence that the true means are indeed distinct.

Note that the true means are identical in both cases; the absolute locations of the true means did not play a significant role in our visual analysis. Instead, the key difference arose from the size of the within-group spread relative to the spread of the true means.

  1. If the within-group variation is comparable to or larger than the between-group variation, we do not have enough evidence to reject the baseline belief that all means are equal.

  2. If the within-group variation is smaller than the spread of the group means, there is strong evidence that at least one true mean is different from the rest.
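The contrast between the two cases can be simulated directly. Below is a minimal R sketch with hypothetical values: the true means are the same in both cases, and only the within-group spread changes.

set.seed(1)
mu  <- rep(c(10, 12, 14), each = 30)          # same true group means in both cases
grp <- rep(c("G1", "G2", "G3"), each = 30)    # three groups of 30 observations

case_a <- rnorm(90, mean = mu, sd = 8)        # Case (a): large within-group spread
case_b <- rnorm(90, mean = mu, sd = 1)        # Case (b): small within-group spread

tapply(case_a, grp, sd)   # within-group spread swamps the 2-unit gaps between means
tapply(case_b, grp, sd)   # within-group spread is small, so the same gaps stand out
boxplot(case_a ~ grp, main = "Case (a)"); boxplot(case_b ~ grp, main = "Case (b)")

Samples like case_b make the same differences among means far easier to see; this comparison of spreads is exactly what ANOVA formalizes.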

Why “One-Way”?

We use the term one-way ANOVA to indicate an experimental design in which groups are formed based on different levels of a single factor.

When analyzing the joint impact of two factors on a response, we use a two-way ANOVA. While the foundational ideas are the same as in one-way ANOVA, an additional feature must be considered—the interaction effect between the two factors, which may amplify or offset their respective main effects. ANOVA models involving more than two factors also exist.

In this course, we focus exclusively on one-way ANOVA.

12.1.2. Formalizing ANOVA

Notation

Suppose we have \(k\) different groups, where \(k \geq 2\).

ANOVA Notation for Group \(i\)

  • Group index: \(i \in \{1, 2, \ldots, k\}\)

  • Observation index: \(j \in \{1, 2, \ldots, n_i\}\)

  • Population mean and variance: \(\mu_i\) and \(\sigma^2_i\)

  • Group sample size: \(n_i\)

  • Group sample: \(X_{i1}, X_{i2}, \ldots, X_{in_i}\)

  • Group sample mean: \(\bar{X}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}\)

  • Group sample variance: \(S_{i\cdot}^2 = \frac{1}{n_{i}-1}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_{i\cdot})^2\)

ANOVA Notation for Overall Summary

  • Overall sample size: \(n = n_1 + n_2 + \cdots + n_k\)

  • Overall sample mean: \(\bar{X}_{\cdot \cdot} = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} X_{ij} = \frac{1}{n} \sum_{i=1}^{k} n_i \bar{X}_{i \cdot}\)

Each random variable in a sample is now indexed with double subscripts.

  • The first subscript \(i\) specifies the group to which the data point belongs.

  • The second subscript \(j\) indicates the observation index within the group.

In addition, we use \(\cdot\) in the place of an index to indicate that a summary statistic is computed over all values of the corresponding index.

  • For example, the notation \(\bar{X}_{i\cdot}\) indicates that the statistic is computed using data points for all values of the second index, while keeping the group index fixed at \(i\).

  • Likewise, \(\bar{X}_{\cdot \cdot}\) means that data points for all indices are used to compute the summary.
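As a minimal R sketch of these summaries, consider a small hypothetical data frame dat with a response column y and a factor column group:

dat <- data.frame(
  group = rep(c("A", "B", "C"), times = c(4, 5, 3)),   # k = 3 groups, unequal sizes
  y     = c(12, 15, 11, 14, 20, 18, 22, 19, 21, 30, 28, 29)
)

n_i    <- tapply(dat$y, dat$group, length)   # group sample sizes n_i
xbar_i <- tapply(dat$y, dat$group, mean)     # group sample means, the x-bar_{i.}
s2_i   <- tapply(dat$y, dat$group, var)      # group sample variances

n    <- sum(n_i)                 # overall sample size
xbar <- sum(n_i * xbar_i) / n    # grand mean x-bar_{..}, a weighted average of group means
all.equal(xbar, mean(dat$y))     # TRUE: same as averaging every observation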

Hypothesis Formulation

There is only one form of hypothesis pair for one-way ANOVA. The null hypothesis states that all true means are equal:

  • \(H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k.\)

The alternative hypothesis states that at least one population mean is different from the others. This can be expressed in several equivalent ways:

  • \(H_a:\) At least one \(\mu_i\) is different from the others.

  • \(H_a:\) Not all population means are equal.

  • \(H_a: \mu_i \neq \mu_j \text{ for some } i \neq j\)

Important Note About the Alternative

❌ It is incorrect to write the alternative hypothesis as

\[H_a: \mu_1 \neq \mu_2 \neq \mu_3 \neq \cdots \neq \mu_k.\]

The alternative hypothesis does not state that all means are different from each other. It only requires that at least one mean differs from the others. This could mean:

  • Only \(\mu_1\) differs from \(\mu_2 = \mu_3 = \mu_4\)

  • Two groups differ: \(\mu_1 = \mu_2 \neq \mu_3 = \mu_4\)

  • All groups differ: \(\mu_1 \neq \mu_2 \neq \mu_3 \neq \mu_4\)

Example 💡: Coffeehouse Demographics Study ☕️

A student reporter wants to study the demographics of coffeehouses around campus. Specifically, she’s interested in whether different coffeehouses attract customers of different ages. The reporter randomly selects 50 customers at each of five coffeehouses using a systematic sampling approach. Due to non-response, the final sample sizes vary slightly across coffeehouses but remain close to 50 per location.

Component Identification

  • Factor variable: Coffeehouse location (5 levels)

  • Response variable: Customer age (quantitative, measured in years)

  • Research question: Are there statistically significant differences between the average ages of customers at the different coffeehouses?

ANOVA Setup

Let \(\mu_i\) represent the true mean age of customers at Coffeehouse \(i\), for each \(i = 1, 2, \ldots, 5\). The hypotheses are:

\[\begin{split}&H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 \\ &H_a: \mu_i \neq \mu_j \text{ for some } i \neq j\end{split}\]

Assumptions

Like all statistical procedures, ANOVA relies on certain assumptions for validity. These assumptions extend the familiar requirements from two-sample procedures to the multi-group setting.

Assumption 1: Independent Simple Random Samples

The observations in each of the \(k\) groups must form a simple random sample. That is, within each group \(i\), the observations \(X_{i1}, X_{i2}, \ldots, X_{in_i}\) must be independent and identically distributed.

Assumption 2: Independence Between Groups

Samples from different populations must be independent of each other.

Assumption 3: Normality of the Sample Means

Each population must be normally distributed, or the corresponding sample size must be large enough for the CLT to apply, so that each group sample mean is approximately normally distributed.

Assumption 4: Equal Variances

All populations must have equal variances:

\[\sigma^2_1 = \sigma^2_2 = \cdots = \sigma^2_k = \sigma^2\]

This assumption allows us to pool information across groups when estimating the common variance, leading to more efficient procedures.

We discuss how to check Assumption 4 using the observed data in the next subsection. When the equal variance assumption fails, alternative approaches must be used; these are beyond the scope of this course.

12.1.3. Preliminary Visual Analysis

Before conducting formal ANOVA procedures, it is standard practice to gain insights about the populations and the samples through graphical analysis. We pay special attention to two aspects:

  1. Signs of significant differences among population means

  2. Any violation of the assumptions

Signs of Differences Among True Means

Note that the sample means alone do not give us any sense of whether the true means are different; the sample means will almost always differ from one another due to the randomness in the population distributions. Instead, we pay attention to the spread of the samples through their side-by-side boxplots. Fig. 12.2 shows the side-by-side boxplots for the coffeehouse example.

Fig. 12.2 Side-by-side boxplots of the five samples in the coffeehouse example

We assess the strength of visual evidence by how much the boxplots overlap. If all the boxplots span similar regions, there is little visual evidence of distinct means. If at least one group's boxplot overlaps only partially with the rest, there is a higher chance of eventually rejecting the null hypothesis in ANOVA.

Note that no formal conclusion should be drawn from visual evidence alone. Boxplots serve only as a tool to gain insight into the dataset.

Identifying Violation of Assumptions

This stage involves examining all available visual resources, including the boxplots, the histograms, and the normal probability plot.

(a) Boxplots

To confirm the equal variance assumption visually, ensure that the range and IQR of the samples are similar on the side-by-side boxplots. The assumption must also be checked numerically. As a rule of thumb, we say there is no strong sign of violation if:

\[\frac{\max{s_i}}{\min{s_i}} \leq 2.\]
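When the raw data are available, this ratio is quick to compute; a minimal R sketch, assuming a data frame dat with response y and factor group (hypothetical names):

s_i <- tapply(dat$y, dat$group, sd)   # sample standard deviation of each group
max(s_i) / min(s_i)                   # no strong sign of violation if this is at most 2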

Boxplots are also used to check if any potential outliers exist.

(b) Histograms and Normal Probability Plot

To check whether the samples show any signs of non-normality, use group-wise histograms.

Fig. 12.3 Group-wise histograms for the coffeehouse example

A combined normal probability plot of \(x_{ij}-\bar{x}_{i\cdot}\) (the data points re-centered to have zero mean, also called the residuals) can be used alongside the histograms.

Fig. 12.4 Normal probability plot for the coffeehouse example
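A minimal R sketch of such a residual plot, assuming the data are stored in a hypothetical data frame coffee with columns Age (response) and Shop (factor):

xbar_i <- tapply(coffee$Age, coffee$Shop, mean)   # group sample means
res    <- coffee$Age - xbar_i[coffee$Shop]        # re-center each group at zero

qqnorm(res, main = "Normal probability plot of residuals")
qqline(res, col = "red")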

Example 💡: Coffeehouse Demographics Study ☕️

Use Fig. 12.2, Fig. 12.3, Fig. 12.4, and the numerical summary below to perform a preliminary assessment of the coffeehouse dataset.

📊 Download the coffeehouse dataset (CSV)

Table 12.1 Sample Statistics

| Sample (Levels of Factor Variable) | Sample Size | Mean | Variance |
|---|---|---|---|
| Population 1 | \(n_1 = 39\) | \(\bar{x}_{1.} = 39.13\) | \(s_1^2 = 62.43\) |
| Population 2 | \(n_2 = 38\) | \(\bar{x}_{2.} = 46.66\) | \(s_2^2 = 168.34\) |
| Population 3 | \(n_3 = 42\) | \(\bar{x}_{3.} = 40.50\) | \(s_3^2 = 119.62\) |
| Population 4 | \(n_4 = 38\) | \(\bar{x}_{4.} = 26.42\) | \(s_4^2 = 48.90\) |
| Population 5 | \(n_5 = 43\) | \(\bar{x}_{5.} = 34.07\) | \(s_5^2 = 98.50\) |
| Combined | \(n = 200\) | \(\bar{x}_{..} = 37.35\) | \(s^2 = 142.14\) |

1. Do the graphs show signs of distinct means?

From the boxplots, we observe that the customer age sample from Coffeehouse 4 spans a notably lower region than the others. The contrast with Coffeehouse 2 is most evident: the mean age at Coffeehouse 2 appears higher than even the maximum observed age at Coffeehouse 4. Other coffeehouses show larger overlaps.

2. Any violation of assumptions?

Equal variance

From the boxplots, the variabilities of Sample 2 and Sample 4 appear to differ. We need to make sure that the largest ratio between two sample standard deviations is at most 2:

\[\frac{\max{s_i}}{\min{s_i}} = \frac{\sqrt{168.34}}{\sqrt{48.90}} = 1.855 < 2 \checkmark\]

So we consider the sample variances similar enough to support the equal population variance assumption.
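A quick R check of this ratio using the variances reported in Table 12.1:

s2 <- c(62.43, 168.34, 119.62, 48.90, 98.50)   # sample variances from Table 12.1
s  <- sqrt(s2)                                 # convert to standard deviations first
max(s) / min(s)                                # 1.855, below the cutoff of 2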

Normality

The histogram of Coffeehouse 4 and the overall QQ plot show signs of skewness. Since the group sample sizes are sufficiently large, this moderate departure from normality is not a concern.

3. Summary

Since the assumptions are reasonably met and the boxplots show signs of difference in means, we predict that ANOVA will yield a statistically significant result. We will perform the formal test in the upcoming sections and find out whether our prediction is correct.

Key Takeaways 📝

  1. ANOVA is a statistical analysis that simultaneously compares two or more population means.

  2. Double subscript notation \(X_{ij}\) systematically handles multiple groups, with dot notation indicating which indices are averaged over.

  3. Four key assumptions must be satisfied for validity of ANOVA: independent simple random samples, independence between groups, normality, and equal variances.

  4. Perform preliminary graphical and numerical assessments to check the assumptions and spot signs of statistical significance.

12.1.4. Exercises

Notation and Rounding Policy

Notation: In ANOVA, we use \(x_{i,j}\) where \(i\) indexes the group (1 to k) and \(j\) indexes the observation within that group. Always use a comma between subscripts to avoid ambiguity.

Rounding: Carry at least 3–4 decimal places in intermediate calculations; round final answers to 2 decimal places (or 3 significant figures for very small values like p-values).


Exercise 1: Identifying ANOVA Scenarios

For each scenario, determine whether one-way ANOVA is the appropriate analysis method. If not, suggest an alternative.

  1. A software company compares bug detection rates of 4 different code review tools, testing each tool on 15 independent projects.

  2. A researcher measures blood pressure before and after a meditation intervention for 30 participants.

  3. An engineer compares the tensile strength of steel samples from 5 different suppliers, with 20 samples from each supplier.

  4. A data scientist wants to determine if there’s a relationship between hours of sleep and exam scores.

  5. A quality control team compares the precision of 3 measurement instruments by having each instrument measure the same 25 reference objects.

  6. A marketing team tests whether click-through rates differ among 6 different ad designs, randomly showing each design to 500 users.

Solution

Part (a): One-way ANOVA is appropriate

  • Factor: Code review tool (4 levels)

  • Response: Bug detection rate (quantitative)

  • Independent samples from each tool

  • Compares k = 4 population means

Part (b): One-way ANOVA is NOT appropriate

This is a paired design (before/after on same individuals). Use a paired t-test instead. The measurements are not independent—each participant provides both measurements.

Part (c): One-way ANOVA is appropriate

  • Factor: Supplier (5 levels)

  • Response: Tensile strength (quantitative)

  • Independent samples from each supplier

  • Compares k = 5 population means

Part (d): One-way ANOVA is NOT appropriate

This describes a relationship between two quantitative variables. Use simple linear regression or correlation analysis instead. ANOVA requires a categorical factor variable.

Part (e): One-way ANOVA is NOT appropriate

The same objects are measured by all instruments, creating a repeated measures or blocked design. The measurements are not independent across instruments. This would require repeated measures ANOVA or treating objects as blocks (beyond STAT 350).

Part (f): One-way ANOVA is appropriate

  • Factor: Ad design (6 levels)

  • Response: Click-through rate (quantitative)

  • Independent random samples for each design

  • Compares k = 6 population means


Exercise 2: Notation Practice

A materials engineer tests the hardness of ceramic samples produced using three different sintering temperatures. The following data are collected:

| Temperature | Sample Size | Mean Hardness | Std Dev |
|---|---|---|---|
| 1200°C (Group 1) | 8 | 72.5 | 4.2 |
| 1350°C (Group 2) | 10 | 78.3 | 3.8 |
| 1500°C (Group 3) | 7 | 85.1 | 5.1 |

Notation convention: In ANOVA, we use double subscripts \(x_{i,j}\) where \(i\) indexes the group (1 to k) and \(j\) indexes the observation within that group (1 to \(n_i\)).

  1. Using proper ANOVA notation, identify: \(k\), \(n_1, n_2, n_3\), \(n\), \(\bar{x}_{1.}, \bar{x}_{2.}, \bar{x}_{3.}\), \(s_1, s_2, s_3\).

  2. Compute the overall (grand) mean \(\bar{x}_{..}\) using the formula:

    \[\bar{x}_{..} = \frac{1}{n}\sum_{i=1}^{k} n_i \bar{x}_{i.}\]
  3. What does the notation \(x_{2,5}\) represent in this context?

  4. Write an expression for “the sum of all observations in Group 1” in two equivalent forms.

Solution

Part (a): Notation identification

  • \(k = 3\) (number of groups/temperature levels)

  • \(n_1 = 8, n_2 = 10, n_3 = 7\) (sample sizes per group)

  • \(n = n_1 + n_2 + n_3 = 8 + 10 + 7 = 25\) (total sample size)

  • \(\bar{x}_{1.} = 72.5, \bar{x}_{2.} = 78.3, \bar{x}_{3.} = 85.1\) (group means)

  • \(s_1 = 4.2, s_2 = 3.8, s_3 = 5.1\) (group standard deviations)

Part (b): Overall mean

\[\bar{x}_{..} = \frac{1}{25}(8 \times 72.5 + 10 \times 78.3 + 7 \times 85.1)\]
\[= \frac{580 + 783 + 595.7}{25} = \frac{1958.7}{25} = 78.35\]

Part (c): Interpretation of \(x_{2,5}\)

The notation \(x_{2,5}\) represents the 5th observation in Group 2 (the 1350°C temperature group).

  • The first subscript (2) identifies the group (\(i = 2\))

  • The second subscript (5) identifies the observation within that group (\(j = 5\))

Important: Always use a comma between subscripts (e.g., \(x_{2,5}\) not \(x_{25}\)) to avoid ambiguity when group or observation indices exceed 9.

Part (d): Sum of observations in Group 1

Two equivalent forms:

\[\sum_{j=1}^{n_1} x_{1,j} = \sum_{j=1}^{8} x_{1,j} \quad \text{(summation notation)}\]
\[= n_1 \bar{x}_{1.} = 8 \times 72.5 = 580 \quad \text{(using definition of sample mean)}\]

The second form, \(n_1 \bar{x}_{1.}\), is a useful shortcut that follows directly from the definition of the sample mean: \(\bar{x}_{1.} = \frac{1}{n_1}\sum_{j=1}^{n_1} x_{1,j}\), which rearranges to \(\sum_{j=1}^{n_1} x_{1,j} = n_1 \bar{x}_{1.}\).
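R verification (a quick sketch confirming parts (b) and (d)):

n_i    <- c(8, 10, 7)           # group sample sizes
xbar_i <- c(72.5, 78.3, 85.1)   # group means

sum(n_i * xbar_i) / sum(n_i)    # grand mean: 78.348
n_i[1] * xbar_i[1]              # sum of the Group 1 observations: 580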


Exercise 3: Hypothesis Formulation

For each research scenario, write the null and alternative hypotheses using proper notation. Define the parameters clearly.

  1. A network engineer tests whether mean packet latency differs among 4 different router configurations.

  2. A pharmaceutical company compares the mean efficacy scores of 5 different drug dosages.

  3. An agricultural researcher investigates whether mean crop yield differs among 3 fertilizer types.

Solution

Part (a): Router configurations

Parameter definitions:

  • \(\mu_1\) = true mean packet latency (ms) for Configuration 1

  • \(\mu_2\) = true mean packet latency (ms) for Configuration 2

  • \(\mu_3\) = true mean packet latency (ms) for Configuration 3

  • \(\mu_4\) = true mean packet latency (ms) for Configuration 4

Hypotheses:

\[\begin{split}&H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4\\ &H_a: \text{At least one } \mu_i \text{ is different from the others}\end{split}\]

Part (b): Drug dosages

Parameter definitions:

  • \(\mu_i\) = true mean efficacy score for dosage level \(i\), for \(i = 1, 2, 3, 4, 5\)

Hypotheses:

\[\begin{split}&H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5\\ &H_a: \text{At least one } \mu_i \text{ is different from the others}\end{split}\]

Part (c): Fertilizer types

Parameter definitions:

  • \(\mu_A\) = true mean crop yield (bushels/acre) with Fertilizer A

  • \(\mu_B\) = true mean crop yield (bushels/acre) with Fertilizer B

  • \(\mu_C\) = true mean crop yield (bushels/acre) with Fertilizer C

Hypotheses:

\[\begin{split}&H_0: \mu_A = \mu_B = \mu_C\\ &H_a: \text{At least one } \mu_i \text{ is different from the others}\end{split}\]

Note: The ANOVA alternative hypothesis is a composite alternative: it states that at least one mean differs from the others, but does not specify which means differ or in what direction. This is different from a simple “two-sided” test—we are testing one null against many possible alternatives.


Exercise 4: Checking the Equal Variance Assumption

The equal variance assumption is critical for valid ANOVA results. Use the rule of thumb: the assumption is reasonable if the ratio of the largest to smallest sample standard deviation is at most 2.

Equal vs unequal variance scenarios

Fig. 12.5 Figure: Side-by-side boxplots illustrating equal variances (left), borderline case (center), and unequal variances (right).

For each scenario, determine whether the equal variance assumption is satisfied.

  1. Four treatment groups with sample standard deviations: \(s_1 = 12.4\), \(s_2 = 15.8\), \(s_3 = 11.2\), \(s_4 = 18.6\)

  2. Three groups with sample variances: \(s_1^2 = 25\), \(s_2^2 = 81\), \(s_3^2 = 36\)

  3. Five groups with the following summary:

    | Group | n | Mean | Variance |
    |---|---|---|---|
    | 1 | 15 | 45.2 | 28.5 |
    | 2 | 18 | 52.1 | 31.2 |
    | 3 | 12 | 48.7 | 42.8 |
    | 4 | 20 | 39.8 | 25.1 |
    | 5 | 16 | 55.3 | 38.4 |

Solution

Part (a): Four treatment groups

Standard deviations: 12.4, 15.8, 11.2, 18.6

\[\frac{\max(s_i)}{\min(s_i)} = \frac{18.6}{11.2} = 1.66\]

Since 1.66 ≤ 2, the equal variance assumption is satisfied

Part (b): Three groups (given variances)

First convert variances to standard deviations:

  • \(s_1 = \sqrt{25} = 5\)

  • \(s_2 = \sqrt{81} = 9\)

  • \(s_3 = \sqrt{36} = 6\)

\[\frac{\max(s_i)}{\min(s_i)} = \frac{9}{5} = 1.8\]

Since 1.8 ≤ 2, the equal variance assumption is satisfied

Part (c): Five groups

Convert variances to standard deviations:

  • \(s_1 = \sqrt{28.5} = 5.34\)

  • \(s_2 = \sqrt{31.2} = 5.59\)

  • \(s_3 = \sqrt{42.8} = 6.54\)

  • \(s_4 = \sqrt{25.1} = 5.01\)

  • \(s_5 = \sqrt{38.4} = 6.20\)

\[\frac{\max(s_i)}{\min(s_i)} = \frac{6.54}{5.01} = 1.31\]

Since 1.31 ≤ 2, the equal variance assumption is satisfied

R verification:

# Part (a)
s_a <- c(12.4, 15.8, 11.2, 18.6)
max(s_a) / min(s_a)  # 1.661

# Part (b)
s2_b <- c(25, 81, 36)
s_b <- sqrt(s2_b)
max(s_b) / min(s_b)  # 1.8

# Part (c)
s2_c <- c(28.5, 31.2, 42.8, 25.1, 38.4)
s_c <- sqrt(s2_c)
max(s_c) / min(s_c)  # 1.306

Warning

Do not apply the “≤ 2” rule to variances! The rule of thumb uses the ratio of standard deviations, not variances. If you’re given variances, you must first take square roots to get standard deviations before computing the ratio. For example, in Part (b), the variance ratio is 81/25 = 3.24, but the SD ratio is 9/5 = 1.8.


Exercise 5: Complete Assumption Checking with Plots

Before performing ANOVA inference, we must verify three key assumptions:

  1. Independence (usually justified by study design)

  2. Normality within each group (histograms, QQ-plots)

  3. Equal variances across groups (boxplots, SD ratio)

A quality engineer collects tensile strength measurements (in MPa) from steel samples produced by three different manufacturing processes. The data is stored in a data frame called steel with variables Strength and Process (levels: A, B, C).

  1. Write R code to create side-by-side boxplots with mean points to visually compare the three processes.

  2. Write R code to create an effects plot showing the group means connected by lines.

  3. Write R code to compute the sample size, mean, and standard deviation for each process using tapply().

  4. Write R code to create faceted histograms by process with kernel density curves (red) and normal density overlays (blue).

  5. Write R code to create faceted QQ-plots by process to assess normality within each group.

  6. Check the equal variance assumption using the SD ratio rule of thumb.

  7. Based on typical output from these plots, describe what you would look for to verify each assumption.

Example Output (for reference):

Fig. 12.6 Effects plot showing group means connected by lines

Fig. 12.7 Faceted histograms with kernel density (red) and normal overlay (blue)

Fig. 12.8 Faceted QQ-plots for assessing normality within each group

Solution

Part (a): Side-by-side boxplots with mean points

library(ggplot2)

# Side-by-side boxplots
ggplot(steel, aes(x = Process, y = Strength)) +
  stat_boxplot(geom = "errorbar", width = 0.3) +
  geom_boxplot(fill = "lightblue") +
  stat_summary(fun = mean, geom = "point",
               color = "black", size = 3) +
  ggtitle("Tensile Strength by Manufacturing Process") +
  xlab("Process") +
  ylab("Tensile Strength (MPa)") +
  theme_minimal()

Part (b): Effects plot

# Effects plot - shows group means connected by lines
ggplot(steel, aes(x = Process, y = Strength)) +
  stat_summary(fun = mean, geom = "point", size = 3) +
  stat_summary(fun = mean, geom = "line", aes(group = 1)) +
  ggtitle("Effects Plot of Tensile Strength by Process") +
  xlab("Process") +
  ylab("Mean Tensile Strength (MPa)") +
  theme_minimal()

Part (c): Summary statistics using tapply()

# Sample sizes
n <- tapply(steel$Strength, steel$Process, length)

# Sample means
xbar <- tapply(steel$Strength, steel$Process, mean)

# Sample standard deviations
s <- tapply(steel$Strength, steel$Process, sd)

# Create summary table
summary_table <- data.frame(
  Process = names(n),
  n = as.numeric(n),
  Mean = round(xbar, 2),
  SD = round(s, 2)
)
print(summary_table)

Part (d): Faceted histograms with density overlays

# Calculate group statistics for normal density overlay
xbar <- tapply(steel$Strength, steel$Process, mean)
s <- tapply(steel$Strength, steel$Process, sd)

# Add normal density column to data frame
steel$normal.density <- mapply(function(value, group) {
  dnorm(value, mean = xbar[group], sd = s[group])
}, steel$Strength, steel$Process)

# Determine number of bins
n_bins <- max(round(sqrt(nrow(steel))) + 2, 5)

# Faceted histograms
ggplot(steel, aes(x = Strength)) +
  geom_histogram(aes(y = after_stat(density)),
                 bins = n_bins, fill = "grey", col = "black") +
  geom_density(col = "red", linewidth = 1) +
  geom_line(aes(y = normal.density), col = "blue", linewidth = 1) +
  facet_wrap(~ Process) +
  ggtitle("Histograms of Tensile Strength by Process") +
  xlab("Tensile Strength (MPa)") +
  ylab("Density") +
  theme_minimal()

Part (e): Faceted QQ-plots

# Add intercept and slope for QQ line (group-specific)
steel$intercept <- xbar[steel$Process]
steel$slope <- s[steel$Process]

# Faceted QQ-plots
ggplot(steel, aes(sample = Strength)) +
  stat_qq() +
  geom_abline(aes(intercept = intercept, slope = slope),
              color = "red", linewidth = 1) +
  facet_wrap(~ Process) +
  ggtitle("QQ Plots of Tensile Strength by Process") +
  xlab("Theoretical Quantiles") +
  ylab("Sample Quantiles") +
  theme_minimal()

Part (f): Equal variance check

# SD ratio rule of thumb
sd_ratio <- max(s) / min(s)
cat("SD ratio:", round(sd_ratio, 4), "\n")
cat("Equal variance assumption satisfied?",
    ifelse(sd_ratio <= 2, "Yes", "No"), "\n")

Part (g): What to look for in each plot

Side-by-side boxplots:

  • Equal variances: Boxes should have similar heights (IQRs) and whisker lengths

  • Separation of means: Look for overlap/separation between groups to predict ANOVA outcome

  • Outliers: Identify potential outliers (points beyond whiskers)

Effects plot:

  • Shows the pattern of group means clearly

  • Useful for predicting whether ANOVA will be significant

  • Large differences in height suggest group means differ

Faceted histograms:

  • Normality: Each histogram should be approximately bell-shaped

  • Kernel density (red) should roughly follow the normal density (blue)

  • Look for severe skewness or multimodality within groups

Faceted QQ-plots:

  • Normality: Points should fall approximately along the diagonal reference line

  • Deviations at tails: Suggest heavy or light tails

  • S-shaped pattern: Suggests skewness

  • Systematic curvature: Suggests non-normality

SD ratio:

  • If max(s)/min(s) ≤ 2, the equal variance assumption is reasonable

  • If ratio > 2, consider alternative methods (Welch’s ANOVA, beyond STAT 350)


Exercise 6: Interpreting Side-by-Side Boxplots

Fig. 12.9 Figure 1: Side-by-side boxplots from three different studies comparing k = 4 groups.

For each scenario (A, B, C) shown in Figure 1:

  1. Would you expect to reject \(H_0\) in the ANOVA F-test? Explain your reasoning based on the overlap of boxplots.

  2. Are there any concerns about the equal variance assumption?

  3. Are there any potential outliers that might affect the analysis?

Solution

Scenario A:

  1. Likely to reject H₀. The boxplots show minimal overlap—Group 4 is clearly separated from Groups 1-3, and Group 1 appears lower than Groups 2-3. The between-group variability appears large relative to within-group variability.

  2. Equal variance appears reasonable. All four boxplots have similar heights (IQR) and whisker lengths, suggesting comparable spread across groups.

  3. One potential outlier visible in Group 2 (marked with a point above the upper whisker). Should investigate this observation.

Scenario B:

  1. Unlikely to reject H₀. The boxplots show substantial overlap—all four groups span similar ranges with medians close together. The between-group variability appears small relative to within-group variability.

  2. Equal variance appears reasonable. The spreads are similar across all groups.

  3. No obvious outliers visible.

Scenario C:

  1. May or may not reject H₀. Some separation exists (Groups 1 and 4 appear different), but there’s also substantial overlap among some groups. The result could go either way.

  2. Potential concern about equal variance. Group 3 appears to have much larger spread (taller box, longer whiskers) than the other groups. Should check the ratio of standard deviations.

  3. Potential outlier in Group 1 (below lower whisker).

Key insight: Visual assessment from boxplots helps predict ANOVA results:

  • Large separation between boxes with small within-group spread → likely significant

  • Substantial overlap with similar spreads → likely not significant

  • Always check assumptions before drawing formal conclusions


Exercise 7: Why “Analysis of Variance”?

A common source of confusion is why a test for comparing means is called Analysis of Variance.

  1. Explain in your own words why comparing variabilities (between-group vs. within-group) helps us determine whether population means differ.

  2. Consider two scenarios with the same group sample means: \(\bar{x}_1 = 10\), \(\bar{x}_2 = 15\), \(\bar{x}_3 = 20\). In Scenario I, each group has \(s = 2\). In Scenario II, each group has \(s = 8\). Which scenario provides stronger evidence that the population means differ? Explain.

  3. If \(H_0: \mu_1 = \mu_2 = ... = \mu_k\) is true, what would you expect the ratio MSA/MSE to be approximately equal to? Why?

Solution

Part (a): Conceptual explanation

When comparing group means, the observed differences \(\bar{x}_i - \bar{x}_j\) could arise from:

  1. True differences in population means (what we want to detect)

  2. Random sampling variability (noise)

To determine if observed differences are “real,” we must compare them to what we’d expect from random variation alone. This is exactly what ANOVA does:

  • Between-group variability (MSA) captures how much the group means vary from the overall mean

  • Within-group variability (MSE) captures the baseline “noise” level within each group

If the between-group variability is much larger than expected from noise alone, we conclude the means truly differ.

Part (b): Comparing scenarios

Scenario I (s = 2) provides much stronger evidence that population means differ.

  • In Scenario I, the spread within each group is small (s = 2), so the observed differences of 5 units between consecutive means are large relative to the noise.

  • In Scenario II, the spread within each group is large (s = 8), so the same 5-unit differences could easily be due to random sampling variability.

Think of it as a signal-to-noise ratio: the same “signal” (mean differences) combined with different “noise” levels leads to different conclusions.

Part (c): Expected F-ratio under H₀

Under \(H_0\), we expect \(F = \frac{MSA}{MSE} \approx 1\).

This is because:

  • When all population means are equal, MSA estimates σ² (with some sampling variability)

  • MSE always estimates σ² (regardless of whether H₀ is true)

  • The ratio of two quantities both estimating σ² should be approximately 1

Large values of F (substantially greater than 1) suggest MSA is estimating something larger than σ², which happens when population means differ.
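This can be checked with a small simulation. The sketch below uses hypothetical values: k = 4 groups are all drawn from the same normal population, so \(H_0\) is true, and the classic ANOVA F statistic is computed with oneway.test() using var.equal = TRUE.

set.seed(42)
k <- 4; n_per <- 20   # 4 groups of 20 observations each

f_stats <- replicate(5000, {
  dat <- data.frame(group = factor(rep(1:k, each = n_per)),
                    y     = rnorm(k * n_per, mean = 50, sd = 10))  # identical true means
  oneway.test(y ~ group, data = dat, var.equal = TRUE)$statistic   # classic ANOVA F = MSA/MSE
})

mean(f_stats)   # close to 1, as expected when H0 is true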


Exercise 8: The Multiple Testing Problem

A researcher wants to compare mean response times across k = 5 different algorithm implementations.

Fig. 12.10 Family-wise error rate (FWER) increases rapidly with the number of groups when using multiple t-tests at α = 0.05.

  1. How many pairwise comparisons would be needed to compare all pairs of algorithms?

  2. If the researcher performs each pairwise comparison as a separate two-sample t-test at α = 0.05, what is the probability of making at least one Type I error (assuming all null hypotheses are true and tests are independent)?

  3. Why does ANOVA provide a better approach than multiple t-tests?

  4. For k = 10 groups, how many pairwise comparisons are needed? What would the family-wise error rate be if using individual α = 0.05 tests?

Solution

Part (a): Number of comparisons for k = 5

\[c = \binom{k}{2} = \binom{5}{2} = \frac{5!}{2! \cdot 3!} = \frac{5 \times 4}{2} = 10 \text{ comparisons}\]

Part (b): Family-wise error rate

Assuming independence, the probability of at least one Type I error is:

\[\alpha_{overall} = 1 - (1 - \alpha_{single})^c = 1 - (1 - 0.05)^{10} = 1 - 0.95^{10}\]
\[= 1 - 0.5987 = 0.4013 \approx 40.1\%\]

Even though each individual test has only a 5% false positive rate, the overall probability of making at least one false positive is about 40%!

Part (c): Advantages of ANOVA

  1. Controls Type I error: ANOVA tests all means simultaneously with a single test at the specified α level

  2. Efficiency: One test instead of many

  3. Proper framework: If ANOVA is significant, we can then use multiple comparison procedures (Tukey, Bonferroni) that properly control the family-wise error rate

  4. Uses pooled variance: More efficient estimation of σ² by combining information from all groups

Part (d): For k = 10 groups

Number of comparisons:

\[c = \binom{10}{2} = \frac{10 \times 9}{2} = 45 \text{ comparisons}\]

Family-wise error rate:

\[\alpha_{overall} = 1 - (0.95)^{45} = 1 - 0.0994 = 0.9006 \approx 90\%\]

With 45 separate tests, there’s about a 90% chance of at least one false positive—making the approach essentially meaningless for controlling errors.

R verification:

# Part (a) and (b)
k <- 5
n_pairs <- choose(k, 2)                  # 10
alpha_overall <- 1 - (1 - 0.05)^n_pairs  # 0.4013

# Part (d)
k <- 10
n_pairs <- choose(k, 2)                  # 45
alpha_overall <- 1 - (1 - 0.05)^n_pairs  # 0.9006

Exercise 9: True/False Conceptual Questions

Determine whether each statement is True or False. Provide a brief justification.

  1. In ANOVA, if we reject \(H_0\), we can conclude that all population means are different from each other.

  2. The ANOVA F-test requires that all group sample sizes be equal.

  3. If the boxplots for all groups show similar medians but very different spreads, the equal variance assumption may be violated.

  4. The alternative hypothesis in one-way ANOVA can be directional (e.g., \(\mu_1 < \mu_2 < \mu_3\)).

  5. If sample sizes are large (n > 40 per group), ANOVA is robust to moderate departures from normality.

  6. ANOVA with k = 2 groups is equivalent to a two-sample t-test under certain conditions.

Solution

1. False

Rejecting \(H_0\) only tells us that at least one population mean differs from the others. It could be that only one mean is different, or some are different, or all are different. Multiple comparison procedures are needed to identify which specific means differ.

2. False

ANOVA does not require equal sample sizes (balanced design). The formulas accommodate unequal \(n_i\) values. However, equal sample sizes provide some advantages: simpler calculations, more robust to assumption violations, and optimal power.

3. True

The equal variance assumption requires \(\sigma_1^2 = \sigma_2^2 = ... = \sigma_k^2\). Different spreads in boxplots suggest this assumption may be violated. We check formally using the ratio rule: \(\max(s_i)/\min(s_i) \leq 2\).

4. False

The ANOVA F-test is inherently non-directional (two-sided). The alternative hypothesis is always “at least one mean differs”—we cannot specify the direction or pattern of differences in the standard ANOVA framework.

5. True

Like the t-test, ANOVA is robust to moderate violations of the normality assumption when sample sizes are large, due to the Central Limit Theorem. The F-test statistic’s distribution is less affected by non-normality when n is large.

6. True

When k = 2, ANOVA is equivalent to a two-sided pooled two-sample t-test with \(\Delta_0 = 0\). Specifically, \(F_{TS} = t_{TS}^2\) and the p-values are identical. This equivalence holds only for the equal variance (pooled) case and two-sided alternative.
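A quick numerical check of this equivalence on simulated (hypothetical) data:

set.seed(7)
dat <- data.frame(
  y     = c(rnorm(15, mean = 10, sd = 3), rnorm(12, mean = 12, sd = 3)),
  group = factor(rep(c("A", "B"), times = c(15, 12)))
)

t_ts <- t.test(y ~ group, data = dat, var.equal = TRUE)$statistic        # pooled two-sample t
f_ts <- oneway.test(y ~ group, data = dat, var.equal = TRUE)$statistic   # one-way ANOVA F

c(F = unname(f_ts), t_squared = unname(t_ts)^2)   # the two values match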


Exercise 10: Designing an ANOVA Study

A biomedical engineering team wants to compare the effectiveness of 4 different stent designs in maintaining arterial blood flow. They plan to conduct an experiment using a laboratory flow simulation system.

  1. Identify the factor variable and its levels.

  2. Identify an appropriate quantitative response variable.

  3. Write the null and alternative hypotheses.

  4. What sample size considerations should the team address?

  5. How could the team ensure the independence assumption is satisfied?

  6. What potential issues might arise with the equal variance assumption, and how could they check it?

Solution

Part (a): Factor variable and levels

  • Factor: Stent design

  • Levels: Design A, Design B, Design C, Design D (k = 4 levels)

  • This is a categorical variable with 4 categories

Part (b): Response variable

Several quantitative response variables could be appropriate:

  • Mean blood flow rate (mL/min)

  • Flow resistance (pressure drop per unit flow)

  • Flow uniformity index

  • Maximum flow velocity achieved

The choice depends on which aspect of “effectiveness” is most clinically relevant.

Part (c): Hypotheses

Let \(\mu_i\) = true mean blood flow rate for Stent Design i, for i = A, B, C, D.

\[\begin{split}&H_0: \mu_A = \mu_B = \mu_C = \mu_D\\ &H_a: \text{At least one } \mu_i \text{ is different from the others}\end{split}\]

Part (d): Sample size considerations

  • Power: Larger samples increase power to detect true differences

  • Effect size: If differences between stents are expected to be small, larger samples are needed

  • Variability: Higher within-group variability requires larger samples

  • Practical constraints: Cost, time, and availability of simulation resources

  • Balance: Equal sample sizes per group are ideal but not required

  • Rule of thumb: At least 10-20 observations per group for adequate power

Part (e): Ensuring independence

  • Use a new simulation setup for each trial (don’t reuse configurations)

  • Randomize the order in which stent designs are tested to avoid systematic effects

  • Ensure different arterial models or simulation runs for each measurement

  • Avoid having the same researcher consistently test the same stent design (potential bias)

  • Check for time-dependent effects (e.g., equipment drift) and randomize to mitigate

Part (f): Equal variance concerns

Potential issues:

  • Different stent designs might produce inherently different variability in flow

  • Manufacturing precision might vary by design complexity

  • Some designs might perform consistently while others show variable results

How to check:

  • Calculate sample standard deviation for each group

  • Apply the rule of thumb: \(\max(s_i)/\min(s_i) \leq 2\)

  • Examine side-by-side boxplots for similar spreads

  • If violated, consider data transformation or Welch’s ANOVA (beyond STAT 350)


12.1.5. Additional Practice Problems

True/False Questions (1 point each)

  1. The ANOVA F-test can determine which specific group means are different from each other.

    Ⓣ or Ⓕ

  2. If all sample means are identical (\(\bar{x}_1 = \bar{x}_2 = ... = \bar{x}_k\)), the F-test statistic will equal exactly 0.

    Ⓣ or Ⓕ

  3. The notation \(\bar{x}_{3.}\) represents the mean of the third observation across all groups.

    Ⓣ or Ⓕ

  4. ANOVA assumes that each group is sampled independently from its respective population.

    Ⓣ or Ⓕ

  5. If an ANOVA study has k = 6 groups with 10 observations each, the total sample size is n = 60.

    Ⓣ or Ⓕ

  6. The equal variance assumption can be checked by computing the ratio of the largest to smallest sample standard deviations.

    Ⓣ or Ⓕ

Multiple Choice Questions (2 points each)

  1. In ANOVA notation, \(x_{2,7}\) represents:

    Ⓐ The 2nd observation in the 7th group

    Ⓑ The 7th observation in the 2nd group

    Ⓒ The product of 2 and 7

    Ⓓ The sum of observations in groups 2 and 7

  2. For a study comparing k = 4 groups, the alternative hypothesis is:

    Ⓐ \(H_a: \mu_1 \neq \mu_2 \neq \mu_3 \neq \mu_4\)

    Ⓑ \(H_a: \mu_1 = \mu_2 = \mu_3 = \mu_4\)

    Ⓒ \(H_a:\) At least one \(\mu_i\) is different from the others

    Ⓓ \(H_a: \mu_1 > \mu_2 > \mu_3 > \mu_4\)

  3. Which is NOT an assumption of one-way ANOVA?

    Ⓐ Independent random samples from each population

    Ⓑ Normal distributions (or large samples)

    Ⓒ Equal population variances

    Ⓓ Equal sample sizes from each population

  4. The rule of thumb for checking equal variances requires:

    Ⓐ \(\max(s_i^2)/\min(s_i^2) \leq 2\)

    Ⓑ \(\max(s_i)/\min(s_i) \leq 2\)

    Ⓒ \(\max(\bar{x}_i)/\min(\bar{x}_i) \leq 2\)

    Ⓓ All sample variances within 2 units of each other

  5. How many pairwise comparisons are possible with k = 7 groups?

    Ⓐ 7

    Ⓑ 14

    Ⓒ 21

    Ⓓ 49

  6. The overall sample mean \(\bar{x}_{..}\) is calculated as:

    Ⓐ The simple average of group means

    Ⓑ A weighted average of group means, weighted by sample sizes

    Ⓒ The median of all observations

    Ⓓ The sum of all group means

Answers to Practice Problems

True/False Answers:

  1. False — The F-test only determines if at least one mean differs; multiple comparison procedures identify which specific means differ.

  2. True — If all \(\bar{x}_{i.} = \bar{x}_{..}\), then SSA = 0, so MSA = 0, and F = 0/MSE = 0.

  3. False — \(\bar{x}_{3.}\) represents the mean of Group 3 (the dot replaces the j subscript, indicating averaging over all observations in that group).

  4. True — Independence within and between samples is a fundamental ANOVA assumption.

  5. True — Total sample size n = 6 × 10 = 60.

  6. True — We check max(s)/min(s) ≤ 2 using sample standard deviations. If the ratio is at most 2, the equal variance assumption is considered reasonable.

Multiple Choice Answers:

  1. Ⓑ — First subscript is group, second is observation within group.

  2. Ⓒ — The alternative states at least one mean differs; we don’t specify which or how.

  3. Ⓓ — Equal sample sizes are NOT required for ANOVA; unbalanced designs are allowed.

  4. Ⓑ — The rule uses standard deviations, not variances or means.

  5. Ⓒ — \(\binom{7}{2} = \frac{7 \times 6}{2} = 21\)

  6. Ⓑ — \(\bar{x}_{..} = \frac{\sum n_i \bar{x}_{i.}}{n}\), a weighted average with weights proportional to sample sizes.