8.2. Experimental Design Principles

Experimental design is the only method that allows us to establish causal relationships between variables with confidence. But not all experiments are created equal. What separates a well-designed experiment that produces reliable conclusions from a poorly designed study that wastes resources and misleads researchers is adherence to fundamental principles that have been refined through decades of scientific practice.

These principles aren’t arbitrary rules—they address specific threats to the validity of experimental conclusions. Each principle tackles a different way that experiments can go wrong, and when all three are properly implemented, they create a powerful framework for discovering causal relationships in the face of natural variability and confounding factors.

Road Map 🧭

  • Problem: How do we design experiments that isolate the effects we want to study while controlling for everything else that might influence our results?

  • Tool: Three fundamental principles—Control, Randomization, and Replication—that work together to ensure valid causal inference

  • Pipeline: These principles form the foundation that makes our Sample → Population inferences reliable and scientifically defensible

8.2.1. The Language of Experimental Design

Before exploring the principles themselves, we need to establish the vocabulary that experimental designers use to communicate precisely about study structure and implementation.

Experimental Units and Subjects

Experimental units are the objects or entities being studied in an experiment—the things to which treatments are applied and from which responses are measured. When these units happen to be human beings, we call them experimental subjects or simply subjects.

The choice of experimental unit is crucial and depends on the research question. In agricultural studies, experimental units might be individual plants, plots of land, or entire fields. In medical research, they’re typically individual patients. In educational research, they could be individual students, classrooms, or entire schools, depending on where the treatment is applied.

Factors, Levels, and Treatments

Factors are the independent variables that the experimenter can manipulate or control. These represent the potential causes we want to study. Each factor can take on different values called levels. Think of factors as categorical variables where each category represents a different setting or condition we want to test.

For example, in studying plant growth, fertilizer type might be a factor with levels “organic,” “synthetic,” and “none.” Water amount might be another factor with levels “low,” “medium,” and “high.” Temperature could be a third factor with levels “cool,” “moderate,” and “warm.”

A treatment represents a specific combination of factor levels. If we have three factors each with three levels, one treatment might be “organic fertilizer + low water + cool temperature,” while another might be “synthetic fertilizer + high water + warm temperature.” The number of possible treatments equals the product of the number of levels across all factors.

Response Variables

The response variable (also called the dependent variable) is what we measure to assess the effect of our treatments. This is the outcome we believe might be influenced by our factors. In the plant growth example, our response variable might be final plant height, biomass, or fruit production.

The response variable must be something we can measure objectively and consistently across all experimental units. It should also be relevant to the research question and sensitive enough to detect meaningful differences between treatments.

A Concrete Example: Crop Yield Study

Consider a study investigating how different agricultural practices affect crop yield. Our factors might include:

  • Fertilizer: Two types (Type 1, Type 2)

  • Water quantity: Five levels (0.2, 0.4, 0.6, 0.8, 1.0 gallons per square foot)

  • Vitamins: Three brands (Brand A, Brand B, Brand C)

  • Pesticides: Three chemical combinations (Combo 1, Combo 2, Combo 3)

This gives us \(2 \times 5 \times 3 \times 3 = 90\) possible treatments. Each treatment represents a unique combination of all four factors, such as “Type 1 fertilizer + 0.6 gallons water + Brand B vitamins + Combo 2 pesticides.”

Our experimental units would be individual crop plots, and our response variable might be yield measured in bushels per acre. The goal is to determine which combination of factors produces the highest yield.
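
To see the combinatorics concretely, here is a minimal Python sketch (the factor names and levels simply restate the list above) that enumerates every treatment:

```python
from itertools import product

# Levels of each factor in the crop yield study
fertilizer = ["Type 1", "Type 2"]
water      = [0.2, 0.4, 0.6, 0.8, 1.0]           # gallons per square foot
vitamins   = ["Brand A", "Brand B", "Brand C"]
pesticides = ["Combo 1", "Combo 2", "Combo 3"]

# Each treatment is one combination of factor levels
treatments = list(product(fertilizer, water, vitamins, pesticides))

print(len(treatments))   # 2 * 5 * 3 * 3 = 90
print(treatments[0])     # ('Type 1', 0.2, 'Brand A', 'Combo 1')
```

Each tuple in the list is one treatment that would be applied to one or more crop plots.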

8.2.2. The Three Principles of Well-Designed Experiments

For an experiment to reliably establish causal relationships, it must satisfy three fundamental principles. These principles work together—each addresses different threats to validity, and weakening any one of them compromises the entire study.

https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter8/three_principles_diagram.png

Fig. 8.1 The three principles work together to create the foundation for causal inference

Why These Principles Matter

Without proper adherence to these principles, what we observe in our results might not be due to our treatments at all. It could be due to:

  • Environmental differences between treatment groups (violating Control)

  • Systematic assignment patterns that create non-comparable groups (violating Randomization)

  • Small sample sizes that amplify the effects of unusual observations (violating Replication)

When all three principles are met, we can be confident that observed differences in our response variable are genuinely caused by our treatments rather than by these alternative explanations.

8.2.3. Principle 1: Control – The Foundation of Comparison

The principle of control addresses a fundamental question: how do we know whether a treatment effect is meaningful? Without something to compare against, even dramatic changes could be due to natural variation rather than our intervention.

The Need for Comparison

Imagine testing a new fertilizer and observing that plants grow to an average height of 24 inches. Is this good? Bad? Impossible to say without a baseline for comparison. The principle of control requires that we establish this baseline through careful design of comparison groups.

Control Groups as Baselines

A control group serves as the standard against which we measure treatment effects. This group receives either no treatment at all or a standard “status quo” treatment that represents current practice. In medical studies, this might be a placebo or the current standard of care. In agricultural studies, it might be conventional farming practices.

The control group answers the crucial question: “What would have happened if we had done nothing (or continued current practice)?” By comparing treatment outcomes to control outcomes, we can isolate the specific effect of our intervention.

Why Statistical Significance Requires Control

Statistical significance means that the difference we observe is larger than what we would expect from random chance alone. But “larger than what?” That’s where the control group becomes essential. We need a baseline to determine whether our treatment effect is:

  • Meaningful: Substantially different from the status quo

  • Statistically significant: Unlikely to be due to random variation

  • Practically important: Large enough to matter in real-world applications

Without a control group, we cannot establish any of these crucial properties.

Maintaining Comparable Conditions

For control groups to provide valid comparisons, they must be treated identically to treatment groups in every way except for the specific treatment being tested. This means:

  • Same environment: Control and treatment groups should be studied under the same conditions

  • Same procedures: Data collection, timing, and measurement protocols should be identical

  • Same attention: Subjects should receive the same level of interaction with researchers

Any systematic difference in how groups are treated (other than the treatment itself) can confound our results and make causal inference impossible.

The Placebo Effect and Blinding

In medical and behavioral research, the placebo effect presents a special challenge. This phenomenon occurs when people experience real physiological or psychological changes simply because they believe they’re receiving treatment, even when the “treatment” is inert.

Placebos as Active Controls

A placebo is a dummy treatment designed to be indistinguishable from the real treatment but lacking the active ingredient. Sugar pills that look identical to medication, saline injections that feel like real injections, or sham procedures that mimic real surgeries all serve as placebos.

Placebos serve two crucial functions:

  1. They control for the placebo effect by giving control subjects the same psychological experience as treatment subjects

  2. They enable blinding by making it impossible for subjects to know their group assignment

Single and Double Blinding

Blinding prevents knowledge of group assignments from influencing behavior or measurements. There are two types:

Single-blind experiments keep either subjects or researchers unaware of group assignments, but not both. This might be used when it’s impossible to hide the treatment from researchers (such as surgical procedures) but subjects can remain unaware of which specific treatment they received.

Double-blind experiments keep both subjects and researchers unaware of group assignments. This represents the gold standard for eliminating bias, as it prevents both subject expectations and researcher expectations from influencing results.

Double-blinding is particularly important when:

  • Outcomes are subjective or require researcher judgment

  • Researchers have strong expectations about which treatment should work

  • Subjects’ knowledge of their treatment could affect their behavior or reporting

Matching Conditions Across Groups

Even placebo groups must match treatment groups in all relevant aspects. If treatment groups receive different dosage levels, placebo groups should also receive different dosage levels of the inert substance. If treatment subjects receive extra attention or monitoring, control subjects should receive equivalent attention.

This attention to detail ensures that any observed differences truly reflect treatment effects rather than differences in the experimental experience.

Blocking: Advanced Control for Known Confounders

Sometimes we know that certain characteristics of our experimental units will strongly influence the response, even though these characteristics aren’t what we want to study. Blocking provides a method for controlling these extraneous variables.

In randomized block design (RBD), we group experimental units into blocks based on similar characteristics before randomly assigning treatments within each block. For example:

  • Medical studies: Block by age, sex, or disease severity

  • Agricultural studies: Block by soil type, field location, or previous crop

  • Educational studies: Block by prior achievement level or school district

Blocking is particularly valuable when we don’t have enough experimental units to rely on randomization alone to balance out the effects of these extraneous variables. It’s also cost-effective because it allows us to achieve the same precision with fewer total units.
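
As a rough illustration of how blocking combines with randomization, the Python sketch below (the plot IDs, block labels, and treatments are hypothetical) shuffles the plots within each soil-type block and assigns the two fertilizer types in equal numbers within every block:

```python
import random

random.seed(350)  # seed fixed only so the illustration is reproducible

# Hypothetical crop plots grouped into blocks by soil type
blocks = {
    "clay":  ["plot01", "plot02", "plot03", "plot04"],
    "loam":  ["plot05", "plot06", "plot07", "plot08"],
    "sandy": ["plot09", "plot10", "plot11", "plot12"],
}
treatments = ["Type 1 fertilizer", "Type 2 fertilizer"]

assignment = {}
for block, plots in blocks.items():
    shuffled = random.sample(plots, k=len(plots))   # randomize within the block
    for i, plot in enumerate(shuffled):
        assignment[plot] = (block, treatments[i % len(treatments)])

for plot, (block, treatment) in sorted(assignment.items()):
    print(plot, block, treatment)
```

Because the randomization happens separately inside each block, soil type cannot end up systematically associated with one fertilizer more than the other.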

Why Control is Fundamental

The principle of control is fundamental because it provides the logical foundation for causal inference. Without proper controls:

  • We cannot distinguish treatment effects from natural variation

  • We cannot establish statistical significance

  • We cannot rule out alternative explanations for our observations

  • Our conclusions lack scientific credibility

Control transforms experiments from mere descriptions of what happened to rigorous tests of what caused what to happen.

8.2.4. Principle 2: Randomization – Ensuring Fair Comparisons

While control provides the framework for comparison, randomization ensures that the groups being compared are actually comparable. This principle addresses one of the most insidious threats to experimental validity: the systematic assignment of experimental units to treatments in ways that create fundamental differences between groups.

The Problem Randomization Solves

Many variables can influence experimental outcomes—some we know about, others we don’t, and still others we can’t easily measure or control. If experimental units with certain characteristics systematically end up in certain treatment groups, we can’t tell whether observed differences are due to treatments or due to these underlying characteristics.

Consider a medical study where researchers unconsciously assign sicker patients to the treatment group (hoping to help them) and healthier patients to the control group. Any observed benefit of treatment could be due to the treatment itself, or it could be because the sicker patients had more room for improvement.

How Randomization Creates Comparable Groups

Randomization means using chance—not human judgment, convenience, or any other systematic method—to assign experimental units to treatment groups. When done properly, randomization has remarkable properties:

Equal Expected Composition: On average, across many possible randomizations, each treatment group will have the same distribution of relevant characteristics. While any single randomization might produce some imbalance, there’s no systematic bias toward any particular pattern.

Unbiased Assignment: No confounding variable is systematically associated with treatment assignment. This breaks the link between potential confounders and treatments, allowing us to attribute differences in outcomes to treatments rather than to pre-existing differences.

Probabilistic Modeling: Because we control the randomization process, we can model it mathematically. This enables us to use statistical inference methods that depend on knowing the probability model for how units ended up in different groups.

A Practical Randomization Example

Suppose we have 125 participants to randomize into one control group and three treatment groups (four groups total). A simple randomization procedure might work as follows:

  1. Create a master list of all 125 participants

  2. Work through the list in a random order (for example, by assigning each participant a random number or drawing names from a hat)

  3. Use a randomization device (like a four-sided die) to assign each participant:

    • Roll 1 = Control group

    • Roll 2 = Treatment 1

    • Roll 3 = Treatment 2

    • Roll 4 = Treatment 3

  4. Continue until all participants are assigned

This procedure gives each participant an equal probability of ending up in any group, regardless of their characteristics.
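
A minimal Python version of this procedure, with a pseudorandom draw standing in for the four-sided die (the participant IDs are placeholders), might look like this:

```python
import random
from collections import Counter

random.seed(42)  # seed fixed only so the illustration is reproducible

participants = [f"P{i:03d}" for i in range(1, 126)]   # 125 participants
groups = ["Control", "Treatment 1", "Treatment 2", "Treatment 3"]

# Simple randomization: each participant is assigned independently,
# with probability 1/4 for each group (the "four-sided die")
assignment = {person: random.choice(groups) for person in participants}

print(Counter(assignment.values()))   # group sizes will vary somewhat by chance
```

Because every participant faces the same 1-in-4 chance for each group, no characteristic of the participants can systematically influence which group they join.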

Limitations of Simple Randomization

While conceptually straightforward, simple randomization can sometimes produce unbalanced group sizes by chance. With 125 participants and four groups, we might end up with groups of sizes 25, 28, 35, and 37—not drastically different, but not optimal either.

For better balance, researchers often use restricted randomization procedures that ensure more equal group sizes while maintaining the random assignment principle (a minimal sketch follows the list below). These might involve:

  • Block randomization: Randomly assigning participants in small blocks to ensure regular balance

  • Stratified randomization: Balancing on important characteristics while maintaining randomness within strata
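
One simple restricted procedure, sketched below in Python (the group labels and seed are illustrative), builds a list of group labels with sizes as equal as possible and then shuffles it, so chance alone still decides who receives which label:

```python
import random
from collections import Counter

random.seed(7)  # seed fixed only so the illustration is reproducible

n = 125
groups = ["Control", "Treatment 1", "Treatment 2", "Treatment 3"]

# Pre-build labels with near-equal counts: 32, 31, 31, 31
labels = [groups[i % len(groups)] for i in range(n)]
random.shuffle(labels)

participants = [f"P{i:03d}" for i in range(1, n + 1)]
assignment = dict(zip(participants, labels))

print(Counter(labels))   # group sizes differ by at most one
```

The assignment is still random for each individual, but the group sizes are fixed in advance rather than left to chance.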

Why Human Judgment Fails

It might seem that an expert could do better than random assignment by carefully balancing groups on known important variables. This intuition is wrong for several reasons:

Unconscious Bias: Even well-intentioned researchers unconsciously favor certain assignments based on their expectations or desires to help particular subjects.

Unknown Variables: Experts can only balance on variables they know about and can measure. Randomization balances on all variables, including those we haven’t identified or can’t measure.

Complex Interactions: The optimal balance across multiple variables simultaneously is mathematically complex. Random assignment handles this complexity automatically.

Statistical Validity: Our statistical methods assume random assignment. Non-random assignment invalidates these methods, even if it produces apparently better balance on observed variables.

Randomization Enables Statistical Inference

Perhaps most importantly, randomization provides the foundation for statistical inference. Our probability models, hypothesis tests, and confidence intervals all depend on understanding how experimental units were assigned to groups.

With randomization, we can:

  • Calculate exact probabilities for observing various outcomes under different hypotheses

  • Control error rates in our statistical tests

  • Quantify uncertainty through confidence intervals

  • Make valid inferences about treatment effects

Without randomization, we lose this entire inferential framework. We might still be able to describe what happened in our particular study, but we can’t generalize those findings or make probability statements about their reliability.

Why Randomization is Essential

Randomization is essential because it:

  • Creates unbiased treatment groups that differ only by chance

  • Eliminates systematic confounding between treatments and other variables

  • Enables valid statistical inference through known probability models

  • Provides fairness in treatment assignment

  • Builds scientific credibility by removing researcher discretion from group assignment

Without proper randomization, even the most sophisticated statistical analysis cannot produce reliable causal conclusions.

8.2.5. Principle 3: Replication – Building Reliable Evidence

The third principle addresses a fundamental challenge in experimental science: distinguishing genuine treatment effects from the noise of natural variation. Even in well-controlled, properly randomized experiments, individual observations can be misleading due to chance. Replication provides the solution by ensuring we have enough evidence to draw reliable conclusions.

The Problem of Chance Variation

Statistical variation is inevitable in experimental data. Even when treatments have real effects, individual responses will vary due to natural differences between experimental units, measurement error, and random environmental factors. With small samples, this variation can easily mask true treatment effects or create the appearance of effects where none exist.

Consider testing a new medication with only two patients in the treatment group and two in the control group. If one treatment patient happens to be naturally resilient and one control patient happens to be particularly susceptible to the condition, the treatment might appear dramatically effective even if it has no real benefit. Conversely, a genuinely effective treatment might appear useless if the treatment patients happen to be less responsive than the control patients.
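
A small simulation makes this danger visible. In the Python sketch below (all numbers are invented for illustration), both groups are drawn from the same distribution, so the true treatment effect is zero, yet tiny groups still produce large apparent differences fairly often:

```python
import random
import statistics

random.seed(1)

def apparent_effect(n_per_group):
    """Difference in group means when the treatment truly does nothing."""
    treatment = [random.gauss(50, 10) for _ in range(n_per_group)]
    control = [random.gauss(50, 10) for _ in range(n_per_group)]
    return statistics.mean(treatment) - statistics.mean(control)

for n in (2, 20, 200):
    diffs = [apparent_effect(n) for _ in range(10_000)]
    frequent = sum(abs(d) > 5 for d in diffs) / len(diffs)
    print(f"n = {n:3d} per group: P(apparent effect larger than 5) = {frequent:.2f}")
```

With only a couple of units per group, sizable "effects" appear frequently by chance alone; as the group size grows, those spurious differences all but disappear.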

How Replication Addresses Variation

Replication means using enough experimental units within each treatment group so that individual variation averages out, revealing the underlying treatment effects. The principle works through the law of large numbers: as sample sizes increase, sample averages become increasingly reliable estimates of true population averages.

With adequate replication:

  • Individual outliers have less influence on group averages

  • Natural variation becomes predictable and manageable

  • Treatment effects become distinguishable from random fluctuations

  • Statistical power increases, making it easier to detect real effects when they exist

Multiple Independent Measurements

The key insight behind replication is that we need multiple independent measurements of the same effect. Each experimental unit provides one independent observation of how the treatment affects the response variable. The more independent observations we have, the more reliable our conclusions become.

Independence is crucial here. Ten measurements from the same experimental unit (like taking a patient’s blood pressure ten times) don’t provide the same information as one measurement each from ten different experimental units. The repeated measurements from the same unit are not independent—they’re all influenced by that particular unit’s characteristics.

Estimating True Treatment Effects

Replication enables us to estimate the true effect of treatments under investigation. With enough experimental units in each group, we can:

  • Estimate average treatment effects with known precision

  • Quantify uncertainty in our estimates through standard errors and confidence intervals

  • Distinguish signal from noise by comparing treatment effects to their standard errors

  • Achieve adequate statistical power to detect effects of practical importance

The relationship between sample size and precision follows the familiar pattern from sampling distributions: the standard error is inversely proportional to the square root of the sample size. This means that to halve our uncertainty, we need four times as many experimental units.
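
In symbols, if the response has standard deviation \(\sigma\) and each group contains \(n\) experimental units, then

\[
\mathrm{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}},
\qquad
\frac{\sigma}{\sqrt{4n}} = \frac{1}{2}\cdot\frac{\sigma}{\sqrt{n}},
\]

so quadrupling the number of units cuts the standard error in half.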

Balancing Precision and Resources

Replication requires resources—more experimental units mean higher costs, longer study durations, and greater logistical complexity. The challenge is finding the right balance between:

Statistical Requirements: Having enough units to detect meaningful effects with adequate power and precision.

Practical Constraints: Working within available budgets, timeframes, and logistical capabilities.

Ethical Considerations: Not exposing more subjects to potential risks than necessary, while still gathering sufficient evidence for reliable conclusions.

Power Analysis: Modern experimental design uses power analysis to determine optimal sample sizes before data collection begins (a brief sketch follows this list). This involves specifying:

  • The minimum effect size worth detecting

  • The desired probability of detecting that effect (statistical power)

  • The acceptable risk of false positive results (significance level)

  • The expected variability in the response
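
As one illustration, assuming the Python statsmodels package is available, a sample-size calculation for comparing two group means might look like the sketch below; the effect size, power, and significance level are arbitrary example values, not recommendations, and the expected variability is folded into the standardized effect size:

```python
# Sketch of a sample-size calculation for a two-group comparison,
# assuming the statsmodels package is installed.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

n_per_group = analysis.solve_power(
    effect_size=0.5,   # minimum effect worth detecting, in standard-deviation units
    power=0.80,        # desired probability of detecting that effect
    alpha=0.05,        # acceptable false-positive rate
)

print(f"Roughly {n_per_group:.0f} experimental units per group")
```

Changing any one of these inputs (a smaller effect size, higher power, or a stricter significance level) increases the number of experimental units required.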

Replication Across Different Levels

Replication can occur at multiple levels, each providing different types of evidence:

Within-Study Replication: Multiple experimental units within each treatment group in a single study. This is the basic requirement for reliable statistical inference.

Cross-Study Replication: Multiple independent studies investigating the same research question. This provides evidence that effects are not specific to particular populations, settings, or time periods.

Systematic Replication: Studies that deliberately vary certain aspects (populations, settings, methods) while maintaining the core research question. This helps establish the generalizability of findings.

Why Small Samples are Dangerous

Inadequate replication creates multiple problems:

Unreliable Results: Small samples produce highly variable results. The same treatment might appear beneficial in one small study and harmful in another, simply due to chance.

False Discoveries: With small samples, chance differences between groups can easily appear statistically significant, leading to false conclusions about treatment effects.

Missed Discoveries: Real but modest treatment effects might not be detectable with small samples, leading to incorrect conclusions that treatments are ineffective.

Unrepresentative Samples: Small samples might accidentally over-represent certain types of subjects or conditions, limiting the generalizability of results.

The Economics of Replication

While replication requires upfront investment in larger studies, it’s economically efficient in the long run:

  • Reduces wasted resources on follow-up studies to clarify ambiguous results

  • Increases confidence in decision-making based on study results

  • Prevents costly mistakes from implementing ineffective or harmful treatments

  • Accelerates scientific progress by providing definitive rather than preliminary evidence

Why Replication is Fundamental

Replication is fundamental because it:

  • Distinguishes signal from noise in experimental data

  • Provides reliable estimates of treatment effects

  • Enables adequate statistical power to detect meaningful effects

  • Builds scientific credibility through reproducible results

  • Supports sound decision-making based on study findings

Without adequate replication, experiments become exercises in anecdote rather than rigorous scientific investigations.

8.2.6. The Synergy of the Three Principles

While each principle addresses different threats to experimental validity, their true power emerges when they work together synergistically. Each principle compensates for the limitations of the others, creating a robust framework for causal inference.

How the Principles Interact

Control without Randomization leaves the door open to systematic biases in group assignment that no amount of environmental control can eliminate. Even perfect environmental control cannot compensate for systematic differences in the types of subjects assigned to different groups.

Randomization without Control can create fair group assignments, but without proper controls we cannot distinguish treatment effects from environmental differences or measurement artifacts.

Replication without Control or Randomization simply gives us more precise estimates of biased or confounded effects. Large samples cannot fix fundamental design flaws.

Control and Randomization without Replication can create unbiased but unreliable results. We might have the right approach but insufficient evidence to draw confident conclusions.

The Gold Standard: All Three Together

When all three principles are properly implemented:

  1. Control ensures we can meaningfully compare treatment and control groups

  2. Randomization ensures the groups are comparable and enables statistical inference

  3. Replication ensures our conclusions are reliable and generalizable

This combination creates experiments capable of producing definitive evidence for causal relationships—the foundation of evidence-based decision making in science, medicine, policy, and business.

No Perfect Studies

It’s important to recognize that no study is ever perfect. Real-world constraints always require compromises. The goal is not perfection but rather ensuring that all three principles are satisfied well enough to support reliable conclusions.

Sometimes one principle must be implemented more strongly than the others to compensate for unavoidable weaknesses elsewhere. For example, if perfect randomization is impossible due to ethical constraints, researchers might invest more heavily in control and replication to maintain study validity.

8.2.7. Bringing It All Together

These three principles provide the foundation that makes statistical inference possible and reliable. When we move to confidence intervals, hypothesis testing, and other inferential methods in subsequent chapters, we’ll depend critically on:

  • Control to ensure our comparisons address the right research questions

  • Randomization to justify our probabilistic models and inference procedures

  • Replication to provide adequate precision and power

Understanding these principles deeply is essential because they determine not just how we design studies, but also how we interpret statistical results and assess the credibility of research findings.

Key Takeaways 📝

  1. Three principles are essential for establishing causal relationships: Control, Randomization, and Replication must all be satisfied for valid experimental inference.

  2. Control provides the basis for comparison through control groups and standardized conditions, while also addressing confounding through techniques like blinding and blocking.

  3. Randomization creates comparable groups by using chance to assign treatments, eliminating systematic bias and enabling statistical inference.

  4. Replication ensures reliable conclusions by providing enough observations to distinguish genuine effects from random variation.

  5. The principles work synergistically: Each addresses different threats to validity, and all three must be present for experiments to produce trustworthy causal evidence.

  6. Design determines analysis: The quality of experimental design directly determines the validity and reliability of any subsequent statistical analysis.

  7. These principles enable statistical inference: The methods we’ll learn in upcoming chapters depend fundamentally on these design principles being properly implemented.

The transition from understanding probability and sampling distributions to conducting statistical inference requires more than mathematical sophistication—it demands careful attention to how data are collected. These three principles provide the foundation that transforms statistical analysis from mathematical exercise to scientific discovery.

As we continue through this chapter, we’ll explore specific experimental designs that implement these principles in different research contexts, common problems that arise when principles are violated, and practical strategies for designing studies that produce reliable, actionable results. This foundation will prove essential when we begin using sample data to draw conclusions about populations in subsequent chapters.

Exercises

  1. Identifying Principle Violations: For each scenario, identify which experimental design principle is being violated and explain the potential consequences:

    1. A researcher tests a new teaching method by using it in morning classes and comparing results to evening classes using traditional methods.

    2. A medical study assigns the first 50 volunteers to the treatment group and the next 50 to the control group.

    3. An agricultural experiment tests a new fertilizer using only 3 plots for treatment and 3 plots for control.

    4. A psychology study tests an intervention but forgets to include a control group entirely.

  2. The Importance of Control Groups: A researcher claims that a new study method improves test scores because students using the method averaged 78% on the final exam. Explain why this conclusion is not justified and describe what additional information would be needed.

  3. Randomization Procedures: Design a randomization procedure for assigning 200 patients to one of four treatment groups (including one control group). Explain why your procedure is better than having a doctor assign patients based on their professional judgment.

  4. Sample Size and Replication: A study finds that a new medication reduces symptoms in 7 out of 10 patients in the treatment group, compared to 3 out of 10 in the control group.

    1. Calculate the improvement rate for each group.

    2. Explain why these results might not be reliable despite the apparent difference.

    3. What sample size might be needed to draw confident conclusions?

  5. Blinding in Practice: For each research scenario, determine whether single-blind, double-blind, or no blinding is feasible, and explain your reasoning:

    1. Testing whether a new surgical technique reduces recovery time

    2. Comparing the effectiveness of two different pain medications

    3. Evaluating whether a new teaching method improves learning

    4. Testing whether a new fertilizer increases crop yield

  6. Blocking Design: An educational researcher wants to test whether a new curriculum improves math achievement. The study will include students from three different grade levels (3rd, 4th, and 5th grade).

    1. Explain why blocking by grade level might be important.

    2. Describe how you would implement a randomized block design for this study.

    3. Compare this approach to simple randomization across all students.

  7. Principle Integration: Design a complete experiment to test whether background music affects concentration during studying. Your design should clearly address all three principles and explain how they work together to enable causal inference.

  8. Real-World Constraints: Consider testing a new traffic light system to reduce accidents at intersections. Identify practical constraints that might make it difficult to implement each of the three principles perfectly, and suggest realistic compromises that maintain the study’s validity.