8.1. Experimental and Sampling Designs

We’ve explored data through descriptive methods, built probability models to understand uncertainty, and learned how sample statistics behave through sampling distributions and the Central Limit Theorem. These tools have equipped us to understand how information flows from populations to samples. But before we can confidently make the reverse journey—inferring from samples back to populations—we need one final, essential piece: understanding how to collect data properly in the first place. This chapter focuses on thoughtful, principled approaches to study design and data collection. Statistical inference is powerful, but it’s only as reliable as the data it’s based on.

Road Map 🧭

  • Identify the characteristics of a statistical question.

  • Understand that data contains statistical variation arising from different sources, and that each source is treated differently in an experiment.

  • Recognize the different sources of data, along with their respective advantages and disadvantages.

  • Distinguish between observational and experimental studies.

8.1.1. Statistical Questions and the Need for Data

Our journey toward statistical inference begins with recognizing what makes a question statistical in nature.

Statistical vs. Deterministic Questions

Not every question involving data is statistical. If we can predict an outcome with certainty given the inputs—like calculating the area of a rectangle from its length and width—we’re dealing with a deterministic relationship. Statistical questions, by contrast, involve relationships where perfect prediction is impossible, but where we can still identify meaningful patterns and quantify uncertainty by analyzing data.

Data always comes with inherent variability. This variability isn’t always a flaw to be eliminated; it’s often the fundamental characteristic that makes statistical methods both necessary and powerful.
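A minimal sketch of the contrast, using only Python's standard library. The rectangle calculation mirrors the example above; the commute-time model (a 25-minute typical value with a 4-minute spread) and both function names are hypothetical, chosen only for illustration.

```python
import random

def rectangle_area(length, width):
    """Deterministic: the same inputs always give the same output."""
    return length * width

def simulated_commute_minutes(rng, typical=25.0, spread=4.0):
    """Statistical: a hypothetical commute time that varies from day to day."""
    return rng.gauss(typical, spread)

rng = random.Random(0)  # seeded so the sketch is reproducible

areas = [rectangle_area(3, 4) for _ in range(5)]
commutes = [simulated_commute_minutes(rng) for _ in range(5)]

print("areas:   ", areas)                            # five identical values
print("commutes:", [round(c, 1) for c in commutes])  # five different values
```

Repeating the area calculation adds no information, while repeating the commute measurement reveals a pattern (a typical value) together with variation around it.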

The Nature of Statistical Variation

Statistical variation arises from multiple sources.

  • Subject differences: Individual units in a study naturally vary in their characteristics, responses, and behaviors.

  • Measurement errors: Even the most precise instruments introduce some degree of measurement uncertainty.

  • Random chance: Inherent randomness in natural processes and human behavior.

Ideally, we want the primary source of variation in our data to be random chance, with the influence from other sources minimized through careful design. We aim to reduce measurement errors through precise instruments and standardized procedures, and we handle subject differences through proper randomization and control strategies.
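The following sketch simulates two of these sources, subject differences and measurement error, to show how a design choice can shrink one of them. All numbers (a mean of 120, a between-subject spread of 10, an instrument spread of 3) are hypothetical, and averaging ten repeated readings is just one assumed example of a standardized procedure.

```python
import random
import statistics

rng = random.Random(1)

TRUE_MEAN = 120.0      # hypothetical population mean of the quantity measured
SUBJECT_SD = 10.0      # hypothetical spread between subjects
MEASUREMENT_SD = 3.0   # hypothetical spread added by the instrument

def measure(true_value, rng, measurement_sd=MEASUREMENT_SD):
    """Return one noisy reading of a subject's true value."""
    return true_value + rng.gauss(0.0, measurement_sd)

# Each subject has their own true value (subject differences).
true_values = [rng.gauss(TRUE_MEAN, SUBJECT_SD) for _ in range(5000)]

# One reading per subject mixes subject differences and measurement error.
single_readings = [measure(v, rng) for v in true_values]

# Averaging several readings per subject shrinks the measurement-error part.
averaged_readings = [
    statistics.mean(measure(v, rng) for _ in range(10)) for v in true_values
]

print("variance of true values:      ", round(statistics.variance(true_values), 1))
print("variance of single readings:  ", round(statistics.variance(single_readings), 1))
print("variance of averaged readings:", round(statistics.variance(averaged_readings), 1))
```

In this simulation the averaged readings vary less than the single readings, but their variance stays close to that of the true values: repeated measurement reduces measurement error, not subject differences, which is why those are handled through randomization and control instead.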

8.1.2. The Spectrum of Data Sources

Before conducting any study, researchers face a fundamental decision: where will their data come from? This choice shapes everything that follows—the types of conclusions that can be drawn, the statistical methods that are appropriate, and the confidence they can place in their results. Understanding the characteristics, advantages, and limitations of different data sources is essential for making informed research decisions.

Anecdotal Data

Anecdotal data represents the most basic form of information—observations from personal experiences, casual reports, or informal accounts shared through news media and social networks. While lacking scientific rigor, anecdotal evidence plays a surprisingly important role in the research ecosystem by providing insights or raising hypotheses to be studied further.

Available Data

The modern research landscape is dominated by an unprecedented availability of existing data. Available data includes any information that has already been collected and can be accessed for research purposes, ranging from government statistics and published study datasets to corporate databases and social media archives.

However, we do not control the quality, accuracy, or completeness of available data, which can affect the reliability of the results and insights derived from it. It is important to assess the quality of available data carefully by considering its sources, the collection methods, and the processing techniques used.

Collecting New Data

When available data is insufficient or inappropriate for answering our research question, new data must be collected. Studies involving new data collection can be classified into two major branches:

  • Observational studies

  • Experimental studies

In the remainder of this section, we will briefly describe the characteristics of each and highlight their differences. Because experimental studies require greater and more deliberate researcher intervention, their details will be discussed further in Sections 8.2 through 8.5.

8.1.3. Observational Studies

In an observational study, researchers act as careful observers rather than active manipulators. They identify subjects of interest, contact them, and collect measurements, but they do not impose treatments or attempt to influence the study environment. This approach is particularly valuable when interventions would be unethical, impractical, or impossible, or when the research goal is to understand naturally occurring phenomena.

The observational approach follows a systematic process:

  1. Define the research question.

  2. Identify the target population.

  3. Specify variables of interest, including both the primary variables of focus and potential confounding variables that might influence the results.

  4. Design and implement random sampling procedures to obtain a representative sample from the target population.

  5. Observe and measure the variables of interest without intervention.

  6. Apply statistical inference methods to draw conclusions about the broader population.
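Steps 4 through 6 can be sketched in a few lines. The population below is simulated with made-up parameters, and simple random sampling is assumed as the sampling procedure; `simple_random_sample` is a hypothetical helper name.

```python
import random
import statistics

rng = random.Random(2)

# Hypothetical target population: 10,000 values of some variable of interest.
population = [rng.gauss(50.0, 12.0) for _ in range(10_000)]

def simple_random_sample(units, n, rng):
    """Draw n units without replacement, each subset equally likely."""
    return rng.sample(units, n)

sample = simple_random_sample(population, 200, rng)

population_mean = statistics.mean(population)   # usually unknown in practice
sample_mean = statistics.mean(sample)           # the estimate we can compute
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f} (standard error about {standard_error:.2f})")
```

Because every unit has the same chance of selection, the sample mean tends to land near the population mean, and the standard error gives a rough sense of how far off it might be.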

Strengths and Limitations of Observational Studies

Observational studies excel at documenting naturally occurring relationships and patterns. They allow researchers to study phenomena in realistic settings where all the complex factors that influence outcomes in the real world remain present. This ecological validity makes observational studies particularly valuable for understanding how variables relate in natural environments.

A key limitation, however, is the lack of control over variables and treatment assignments. Without the ability to regulate which factors are present or how they vary, researchers cannot isolate the effects of specific conditions. As a result, the influence of other, uncontrolled factors may be difficult to separate from the patterns of interest.

This limitation does not reduce the value of observational studies; it simply calls for careful interpretation. Patterns observed consistently across multiple well-designed observational studies can still provide strong insights into how phenomena unfold in realistic contexts.
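To see why uncontrolled factors call for careful interpretation, consider the simulation sketched below. It is entirely hypothetical: a hidden factor raises both the chance that a subject ends up in the "exposed" group and the outcome, while the exposure itself is given no effect at all.

```python
import random
import statistics

rng = random.Random(3)

def simulate_observational(n, rng):
    """Simulate subjects who choose their own exposure.

    A hidden factor raises both the chance of exposure and the outcome.
    The exposure itself has NO effect on the outcome in this simulation.
    """
    exposed, unexposed = [], []
    for _ in range(n):
        hidden = rng.gauss(0.0, 1.0)                  # uncontrolled factor
        chooses_exposure = rng.random() < (0.8 if hidden > 0 else 0.2)
        outcome = 2.0 * hidden + rng.gauss(0.0, 1.0)  # depends only on hidden
        (exposed if chooses_exposure else unexposed).append(outcome)
    return exposed, unexposed

exposed, unexposed = simulate_observational(5000, rng)
observed_gap = statistics.mean(exposed) - statistics.mean(unexposed)
print(f"observed difference in mean outcome: {observed_gap:.2f}")
```

The exposed group shows a clearly higher mean outcome even though exposure does nothing here. An observer who sees only exposure and outcome cannot tell this situation apart from a genuine effect.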

Example 💡: The Case of “Feline High-rise Syndrome” 🐈

Consider the study of “feline high-rise syndrome” by Whitney and Mehlhaff, published in the Journal of the American Veterinary Medical Association in 1987. The researchers wondered: when cats fall from buildings, how does the height of the fall relate to the severity of their injuries? To investigate this question, they identified 132 cats that had been brought to the Animal Medical Center in New York City after falling from multi-story buildings between June and November 1984. For each case, they carefully documented the cat’s injuries, the height from which it fell, and the outcome of treatment.

Their findings were surprising. About 90% of the cats survived their falls with appropriate veterinary care. But more intriguingly, they observed that cats falling from seven stories or higher didn’t sustain significantly more injuries than those falling from lower heights. In fact, cats falling from very high floors (nine stories or more) showed remarkably few limb fractures compared to those falling from intermediate heights.

The researchers proposed what they called the “terminal velocity hypothesis” to explain this pattern. They theorized that cats reach their maximum falling speed after about five stories. Once they achieve this terminal velocity and realize they’re in for a long fall, cats may relax into a “flying squirrel” posture that distributes impact forces more evenly across their body, reducing the likelihood of concentrated injuries.

Why This Had to Be Observational

This study illustrates why observational research is sometimes the only ethical option. Testing the terminal velocity hypothesis experimentally would require deliberately dropping cats from various heights—an approach that would be both unethical and illegal. Even if researchers could design some sort of controlled falling scenario with safety nets or other protections, such an artificial setup would fundamentally change the phenomenon being studied. Instead, the researchers had to rely on cats’ own decisions to fall from high ledges, windowsills, and fire escapes.

8.1.4. Experimental Studies

When ethical and practical constraints allow, experimental studies offer the strongest framework for statistical investigation. In contrast to observational studies, experiments involve deliberate control over one or more variables, including the ability to assign treatments or conditions to subjects according to a planned design. This control enables researchers to minimize the influence of extraneous factors and ensure that differences in outcomes can be more confidently attributed to the conditions under study.
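One common way to assign treatments "according to a planned design" is random assignment. The sketch below is a minimal illustration under stated assumptions: the subjects, their "experience" values, and the `randomly_assign` helper are all hypothetical.

```python
import random
import statistics

rng = random.Random(4)

def randomly_assign(subjects, rng):
    """Shuffle subjects and split them into two equal-sized groups."""
    shuffled = subjects[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subjects, each with an extraneous characteristic
# (for example, years of experience) that the researcher does not control.
subjects = [{"id": i, "experience": rng.gauss(5.0, 2.0)} for i in range(1000)]

treatment, control = randomly_assign(subjects, rng)

mean_treatment = statistics.mean(s["experience"] for s in treatment)
mean_control = statistics.mean(s["experience"] for s in control)
print(f"mean experience, treatment: {mean_treatment:.2f}")
print(f"mean experience, control:   {mean_control:.2f}")
```

Because chance alone decides who lands in which group, the extraneous characteristic ends up with similar averages in both groups, so it is unlikely to explain a later difference in outcomes.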

The following sections will expand on this topic in greater detail.

8.1.5. Bringing It All Together

Key Takeaways 📝

  1. Statistical questions require data with inherent variability and seek to quantify relationships among variables.

  2. Data sources vary in quality and appropriateness. Anecdotal data provides inspiration but not evidence; available data offers efficiency but requires careful quality assessment; new data collection provides control but demands resources.

  3. Observational studies are valuable for studying naturally occurring phenomena where intervention is impossible or unethical.

  4. Experimental studies are appropriate when variables and environmental factors can be actively controlled.

  5. The choice between observational and experimental approaches depends on research goals, ethical considerations, and practical constraints.

  6. Study design determines the scope of valid conclusions.

8.1.6. Exercises

These exercises develop your understanding of statistical questions, data sources, and the distinction between observational and experimental studies.

Key Concepts

Statistical vs. Deterministic Questions

  • Deterministic: Outcomes can be predicted with certainty given inputs (e.g., calculating area from length × width)

  • Statistical: Perfect prediction is impossible; we identify patterns and quantify uncertainty using data

Sources of Statistical Variation

  • Subject differences (natural variation among individuals)

  • Measurement errors (instrument precision)

  • Random chance (inherent randomness)

Data Sources (from weakest to strongest evidence)

  1. Anecdotal data: Personal experiences, informal accounts — useful for generating hypotheses

  2. Available data: Pre-existing datasets — efficient but quality not controlled

  3. New data collection: Observational or experimental studies

Study Types

  • Observational study: Researchers observe without intervention; cannot establish causation

  • Experimental study: Researchers actively assign treatments; can establish causation

Common Student Error ⚠️

A statistical question is NOT defined by “containing probability”; it is defined by variability in the answer across units or repetitions. The question “What is the probability of heads for a fair coin?” is deterministic (answer: 0.5). The question “How many heads will I get in 10 flips?” is statistical (answer varies each time you repeat the 10 flips).
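A short simulation makes the difference concrete. It assumes a fair coin, and `heads_in_flips` is a hypothetical helper written for this sketch.

```python
import random

P_HEADS = 0.5  # fixed by the fair-coin model; the same every time we ask

def heads_in_flips(n_flips, rng, p=P_HEADS):
    """Count heads in n_flips simulated flips of a coin with P(heads) = p."""
    return sum(rng.random() < p for _ in range(n_flips))

rng = random.Random(5)
counts = [heads_in_flips(10, rng) for _ in range(8)]

print("probability of heads:", P_HEADS)                # a single fixed number
print("heads in 10 flips, repeated 8 times:", counts)  # changes between repetitions
```

The probability is a property of the model and never changes, whereas the count of heads is an outcome that differs from one set of ten flips to the next.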


Exercise 1: Statistical vs. Deterministic Questions

Classify each question as statistical or deterministic, and explain your reasoning.

  (a) What is the fuel efficiency (mpg) of a 2024 Toyota Camry traveling at 60 mph?

  (b) How many lines of code are in the file main.py if it contains 45 functions averaging 12 lines each?

  (c) Does caffeine consumption affect reaction time in software developers?

  (d) What is the probability that a randomly selected engineering student has an internship?

  (e) How long will it take to transfer a 500 MB file over a 100 Mbps connection?

  (f) Do mechanical engineering students score higher on the FE exam than civil engineering students?

Solution

Part (a): Statistical

Even for the same car model at the same speed, fuel efficiency varies due to road conditions, weather, driving style, tire pressure, and engine condition. The advertised mpg is an average, not a guaranteed value.

Part (b): Deterministic

This is a simple multiplication: 45 × 12 = 540 lines. Given the exact inputs, the answer is certain.

Part (c): Statistical

Different people respond differently to caffeine. We can study the average effect and quantify the variability in responses, but we cannot predict exactly how any individual will respond.

Part (d): Statistical

The proportion varies depending on which students we sample, when we ask, and how “internship” is defined. This requires data collection and involves uncertainty.

Part (e): Deterministic (idealized) / Statistical (realistic)

In theory, ignoring overhead and using standard conventions (1 byte = 8 bits): 500 MB = 4000 Mb, so transfer time = 4000 Mb ÷ 100 Mbps = 40 seconds. Given exact inputs and ideal conditions, the answer is certain.

In practice, actual transfer times vary due to network congestion, protocol overhead, packet loss, and other factors — making this statistical in real-world applications. This illustrates that “deterministic” often depends on idealizing assumptions.
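The arithmetic in Part (e), and the way real conditions turn it into a statistical quantity, can be sketched as follows. The idealized function follows the calculation above; the assumption that the effective rate falls between 70% and 95% of the nominal rate is invented purely for illustration.

```python
import random

def ideal_transfer_seconds(size_megabytes, rate_megabits_per_second):
    """Idealized transfer time, using 1 byte = 8 bits and no overhead."""
    size_megabits = size_megabytes * 8
    return size_megabits / rate_megabits_per_second

def simulated_transfer_seconds(size_megabytes, nominal_rate, rng):
    """Hypothetical real-world transfer: effective rate is 70-95% of nominal."""
    effective_rate = nominal_rate * rng.uniform(0.70, 0.95)
    return ideal_transfer_seconds(size_megabytes, effective_rate)

rng = random.Random(7)
ideal = ideal_transfer_seconds(500, 100)
observed = [simulated_transfer_seconds(500, 100, rng) for _ in range(5)]

print("ideal time (s):    ", ideal)                           # 40.0
print("simulated times (s):", [round(t, 1) for t in observed])  # vary run to run
```

The idealized answer is a single number, while the simulated transfers each take a different amount of time, all longer than 40 seconds under this assumed model.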

Part (f): Statistical

Exam scores vary among students even within the same major. We need data to compare the distributions and assess whether observed differences are meaningful or due to chance.


Exercise 2: Sources of Statistical Variation

For each scenario, identify which source(s) of variation are most prominent: subject differences, measurement error, or random chance.

  (a) Blood pressure readings taken from the same patient at different times of day show different values.

  (b) Two engineers using the same stress testing equipment on identical steel samples get slightly different yield strength measurements.

  (c) Customer arrival times at a help desk vary unpredictably throughout the day.

  (d) Students taking the same exam under identical conditions receive widely different scores.

  (e) A quality control sensor occasionally misreads the diameter of manufactured parts.

Solution

Part (a): Within-person physiological variability + Measurement error

Blood pressure naturally fluctuates due to activity level, stress, hydration, and circadian rhythms (within-person variability over time). There may also be measurement error from the device. Both sources contribute to the different readings.

Part (b): Measurement error

The equipment and samples are held constant; the variation comes from instrument precision and small procedural differences in how each engineer operates the equipment.

Part (c): Random chance

Customer arrivals follow stochastic processes. Individual decisions to seek help are independent and unpredictable, creating inherent randomness.

Part (d): Subject differences

Students have different knowledge levels, preparation, test-taking skills, and cognitive abilities. This is primarily individual variation.

Part (e): Measurement error

The sensor’s occasional misreadings represent instrument error, potentially due to calibration drift, environmental factors, or electronic noise.


Exercise 3: Evaluating Data Sources

A tech company wants to understand whether their new IDE (integrated development environment) improves programmer productivity.

For each proposed data source below, identify the type of data source and discuss its strengths and limitations.

  (a) A senior developer shares that “the new IDE feels much faster” and mentions a colleague who “finished a project ahead of schedule after switching.”

  (b) The company analyzes Git commit data from the past year, comparing commits per developer before and after the IDE was released.

  (c) The company recruits 100 programmers and randomly assigns 50 to use the new IDE and 50 to continue with the old IDE for a month, then compares lines of code produced.

Solution

Part (a): Anecdotal data

Strengths: Easy to collect; may generate hypotheses worth investigating; captures qualitative impressions.

Limitations: No systematic evidence; based on individual perceptions which may be biased; the colleague’s success could be due to many other factors; not representative of all users.

Part (b): Available data

Strengths: Large sample size; objective measure (commits); no additional data collection cost; captures real-world behavior.

Limitations: Many confounding factors (project difficulty, team changes, deadlines); commit frequency ≠ productivity; self-selection (who adopted early?); no control over data quality; cannot establish causation.

Part (c): Experimental study (new data collection)

Strengths: Random assignment controls for confounding variables; direct comparison under similar conditions; can establish causal relationship.

Limitations: Artificial setting may not reflect real productivity; “lines of code” is an imperfect productivity measure; one month may be too short to see learning effects; participants know they’re being studied (Hawthorne effect).


Exercise 4: Observational vs. Experimental Studies

For each research question, determine whether an observational or experimental study would be more appropriate. Explain your reasoning, considering ethical and practical constraints.

  (a) Does wearing a motorcycle helmet reduce the severity of head injuries in accidents?

  (b) Does a new compiler optimization flag improve code execution speed?

  (c) Are children who play video games more likely to have attention problems?

  (d) Does a new drug reduce blood pressure more effectively than the current standard treatment?

  (e) Do employees who work remotely report higher job satisfaction?

Solution

Part (a): Observational study

Reasoning: We cannot ethically assign some motorcyclists to not wear helmets and then have them get into accidents. Researchers must observe riders who naturally choose to wear or not wear helmets and compare injury outcomes when accidents occur.

Part (b): Experimental study

Reasoning: There are no ethical constraints — we can randomly assign code samples to be compiled with or without the optimization flag. This allows controlled comparison of execution times.

Part (c): Observational study

Reasoning: We cannot ethically force children to play (or not play) video games for extended periods. We must observe children’s natural gaming habits and assess attention, while trying to account for confounding factors (parenting style, socioeconomic status, pre-existing attention issues).

Part (d): Experimental study

Reasoning: With proper ethical approval and informed consent, patients can be randomly assigned to receive either the new drug or the standard treatment. This is the gold standard for evaluating medical interventions.

Part (e): Observational study (typically)

Reasoning: While not unethical, it’s usually impractical to randomly assign employees to work locations. Most studies observe existing remote vs. in-office workers. However, some companies have conducted experimental studies during policy changes.


Exercise 5: The Feline High-Rise Study Revisited

The chapter describes a study of “feline high-rise syndrome” where researchers examined cats brought to a veterinary hospital after falling from buildings.

  (a) Why was this necessarily an observational study?

  (b) The researchers found that cats falling from 7+ stories didn’t sustain more injuries than those falling from lower heights. What alternative explanations (besides the “terminal velocity hypothesis”) might account for this finding?

  (c) What is a key limitation of using only cats brought to the veterinary hospital as the data source?

  (d) If you wanted to gather additional evidence about the relationship between fall height and injury severity, what other data sources might you consider? What would be their advantages and limitations?

Solution

Part (a): Why observational?

It would be unethical and illegal to deliberately drop cats from various heights. The researchers had to rely on cats who fell accidentally (through their own decisions to venture onto ledges, windowsills, etc.).

Part (b): Alternative explanations

  • Survivorship bias: Cats that fall from very high heights and die may never be brought to the hospital because their owners assume they’re dead or never find them. The hospital data only captures cats that were still alive when they were found and brought in.

  • Reporting bias: Owners of cats that fall from low heights but are uninjured may not bring them in, while high-fall survivors are more likely to be brought for evaluation.

  • Surface differences: Higher buildings may have different landing surfaces (grass, awnings, trees) than lower buildings.

  • Cat behavior: Cats that fall from windows they frequent (often lower floors) may be less prepared than cats that fall from unusual heights.

Part (c): Key limitation — selection bias

The sample only includes cats whose owners brought them to this particular hospital. This excludes:

  • Cats that died before reaching the hospital

  • Cats treated at other facilities or by private vets

  • Cats whose owners couldn’t afford veterinary care

  • Cats whose falls were never discovered
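A toy simulation can show how this kind of selection might flatten an apparent height effect. It is not a reconstruction of the study's data: the severity model, the fatality cutoff, and the height ranges are all invented assumptions, and only one of the exclusions above (deaths before arrival) is modeled.

```python
import random
import statistics

rng = random.Random(6)
FATAL_SEVERITY = 10.0  # hypothetical cutoff: worse cases never reach a hospital

def simulate_severity(stories, rng):
    """Hypothetical model: severity rises with fall height, plus noise."""
    return float(stories) + rng.gauss(0.0, 3.0)

low_falls, high_falls = [], []
for _ in range(20_000):
    stories = rng.randint(2, 12)   # inclusive on both ends
    severity = simulate_severity(stories, rng)
    (low_falls if stories <= 6 else high_falls).append(severity)

def hospital_cases(severities):
    """Keep only the falls that would show up in hospital records."""
    return [s for s in severities if s < FATAL_SEVERITY]

true_gap = statistics.mean(high_falls) - statistics.mean(low_falls)
hospital_gap = (statistics.mean(hospital_cases(high_falls))
                - statistics.mean(hospital_cases(low_falls)))

print(f"high-minus-low severity gap, all falls:      {true_gap:.1f}")
print(f"high-minus-low severity gap, hospital cases: {hospital_gap:.1f}")
```

Under these assumptions the gap between high and low falls looks smaller in the hospital records than it is among all falls, because the most severe high falls are the ones most likely to be missing.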

Part (d): Additional data sources

  • Veterinary records from multiple hospitals: Broader geographic coverage, larger sample. Limitation: Still subject to survivorship bias.

  • Animal control records: May capture cats found deceased. Limitation: Fall height often unknown.

  • Building management incident reports: Systematic documentation of falls. Limitation: Rare; not all buildings track this.

  • Insurance claims data: Objective records. Limitation: Not all cats are insured; may not include height information.


Exercise 6: Study Type Identification

Classify each study as observational or experimental, and identify a potential confounding factor that could affect the conclusions.

  (a) Researchers compare the GPA of students who use the campus tutoring center versus those who don’t.

  (b) A pharmaceutical company randomly assigns patients to receive either a new antidepressant or a placebo, then measures symptom improvement after 8 weeks.

  (c) A tech company compares bug rates between teams that use agile methodology versus waterfall methodology.

  (d) Agronomists plant corn seeds in 50 plots, randomly assigning each plot to receive either a new fertilizer or no fertilizer, then measure yields.

  (e) Epidemiologists track a cohort of 10,000 adults over 20 years, recording their exercise habits and eventual health outcomes.

Solution

Part (a): Observational

Students self-select into using or not using the tutoring center.

Confounding factor: Motivation. Students who seek tutoring may be more motivated or conscientious, which independently affects GPA.

Part (b): Experimental

Patients are randomly assigned to treatment groups by the researchers.

Potential issue: While randomization controls for confounding at baseline, differential dropout rates could threaten validity if one treatment has more side effects. This attrition can reintroduce imbalance post-randomization.

Part (c): Observational

Teams choose (or are assigned by management) their methodology; researchers don’t randomly assign.

Confounding factor: Project complexity. Teams working on more complex projects may adopt agile (or waterfall) more often, and complexity independently affects bug rates.

Part (d): Experimental

Plots are randomly assigned to treatment conditions by the researchers.

Confounding controlled: Randomization should balance soil quality, sunlight, and other plot characteristics across groups.

Part (e): Observational

Researchers observe natural exercise behaviors; they don’t assign exercise regimens.

Confounding factor: Overall health consciousness. People who exercise may also eat better, avoid smoking, and seek preventive healthcare — all of which independently affect health outcomes.


8.1.7. Additional Practice Problems

True/False Questions

  1. A question about the average starting salary of computer science graduates is a deterministic question because salaries are fixed numbers.

  2. Measurement error can be completely eliminated by using high-quality instruments.

  3. Anecdotal evidence is useful for generating hypotheses but should not be used as the primary basis for important decisions.

  4. In an observational study, researchers can establish cause-and-effect relationships by controlling for all confounding variables.

  5. Available data is always inferior to newly collected data for research purposes.

  6. An experimental study requires that researchers actively manipulate at least one variable.

Multiple Choice Questions

  1. Which of the following is a statistical question?

    Ⓐ What is the boiling point of water at sea level?

    Ⓑ How many credits are required to graduate with a BS in Engineering?

    Ⓒ Do students who sit in the front of the classroom earn higher grades?

    Ⓓ What is the sum of the first 100 positive integers?

  2. A researcher studies whether coffee consumption is associated with heart disease by surveying 5,000 adults about their coffee habits and medical history. This is:

    Ⓐ An experimental study because coffee consumption is the treatment

    Ⓑ An observational study because participants choose their own coffee consumption

    Ⓒ An experimental study because the researcher is collecting new data

    Ⓓ Neither; this is anecdotal data

  3. Which data source provides the strongest evidence for establishing causation?

    Ⓐ A large database of historical records

    Ⓑ Expert opinions and professional experience

    Ⓒ A randomized controlled experiment

    Ⓓ A carefully designed observational study

  4. The main reason observational studies cannot establish causation is:

    Ⓐ They typically have smaller sample sizes

    Ⓑ Unmeasured confounding variables may explain observed associations

    Ⓒ The data quality is usually poor

    Ⓓ Researchers are unable to measure the variables accurately

Answers to Practice Problems

True/False Answers:

  1. False — Salaries vary among graduates due to many factors (company, location, experience, negotiation). The average must be estimated from data with inherent variability.

  2. False — All measurements have some degree of error. High-quality instruments reduce but cannot eliminate measurement uncertainty.

  3. True — Anecdotal evidence can suggest patterns worth investigating but lacks the systematic rigor needed for reliable conclusions.

  4. False — Even with statistical controls, observational studies cannot account for unmeasured confounders. Only random assignment in experiments balances all confounders, known and unknown, on average across treatment groups. Causal claims from observational data remain fragile because of potential unmeasured confounding and model dependence.

  5. False — Available data can be superior when it’s well-documented, covers larger populations, or spans longer time periods than feasible for new collection.

  6. True — The defining feature of an experiment is that researchers actively manipulate (control) at least one explanatory variable.

Multiple Choice Answers:

  1. Ⓒ — This question involves variability (different students, different outcomes) and requires data to answer. The others have fixed, deterministic answers.

  2. Ⓑ — The researcher observes naturally occurring coffee habits rather than assigning participants to drink specific amounts. This is observational.

  3. Ⓒ — Randomized experiments can establish causation because random assignment balances confounders across treatment groups.

  4. Ⓑ — The fundamental limitation is that observed associations may be due to confounding variables that differ systematically between comparison groups.