8.4. Addressing Potential Flaws in Experimental Design

Understanding the three fundamental principles of experimental design provides the foundation for conducting rigorous research, but implementing these principles perfectly in real-world settings is often impossible. The difference between a good experiment and a great one often lies not in eliminating all potential problems but in recognizing where problems might arise and taking steps to minimize their impact.

Road Map 🧭

  • While the three principles of experimental design provide guidelines for ideal experimental settings, understand that it is usually impossible to uphold them perfectly.

  • Learn different types of flaws that can arise in an experiment and how to address each one.

8.4.1. The Nature of Bias in Experimental Design

Bias in experimental design refers to errors that cause the results to deviate from the truth in a consistent direction. Unlike random error, which varies unpredictably around the true value, bias pushes our results away from reality in a systematic way. This distinction is crucial because while we can reduce random error through replication, we cannot eliminate bias simply by collecting more data.

Bias is dangerous because it often goes undetected. Random variation is visible in our data—we can see that individual observations vary around some central tendency. But systematic bias can masquerade as real effects, leading us to conclude that treatments work when they don’t, or that they don’t work when they actually do.
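
This distinction is easy to see in a quick simulation. The sketch below (Python with NumPy; all numbers are hypothetical) estimates a true value of 100 from measurements that carry random noise only, and from measurements that also carry a constant systematic offset. Increasing the sample size shrinks the random error but leaves the bias fully intact.

```python
import numpy as np

rng = np.random.default_rng(42)
true_mean, bias = 100.0, 5.0  # hypothetical true value and systematic offset

for n in [10, 100, 10_000]:
    noisy = rng.normal(true_mean, 10, size=n)          # random error only
    biased = rng.normal(true_mean + bias, 10, size=n)  # random error + bias
    print(f"n={n:>6}: unbiased estimate = {noisy.mean():7.2f}, "
          f"biased estimate = {biased.mean():7.2f}")

# As n grows, the first estimate converges to 100, but the second
# converges to 105: more data cannot remove a systematic offset.
```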

Let us examine three types of bias that can occur during an experiment: selection bias, measurement bias, and confounding bias.

A. Selection Bias: When Groups Are Not Comparable

Selection bias occurs when experimental units are assigned to treatment groups in ways that create fundamental differences between groups that are not due to the treatments themselves.

Example 💡: Medical Study with Flawed Assignment

Imagine a clinical study where researchers unconsciously assign sicker patients to the treatment group, hoping to help them, while healthier patients end up in the control group. Even if the treatment has no effect, the treatment group might show more improvement simply because sicker patients have more room for improvement. Conversely, a truly effective medication might appear ineffective if the treatment group starts out much sicker than the control group.
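
A short simulation (hypothetical numbers) makes this concrete. Below, a completely ineffective "treatment" is given to the sickest half of the patients; because sicker patients have more room for improvement, the treatment group appears to improve more even though the treatment does nothing.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
baseline = rng.normal(50, 10, size=n)  # hypothetical health score (higher = healthier)

# Flawed assignment: the sickest half of the patients gets the treatment.
order = np.argsort(baseline)
treat, control = order[: n // 2], order[n // 2:]

# The treatment has NO effect; everyone improves in proportion to how
# much room for improvement they have, plus random noise.
improvement = 0.3 * (80 - baseline) + rng.normal(0, 5, size=n)

print("mean improvement, treatment:", improvement[treat].mean().round(2))
print("mean improvement, control:  ", improvement[control].mean().round(2))
# The treatment group 'improves' more purely because it started sicker.
```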

B. Measurement Bias: When Observations Are Systematically Distorted

Measurement bias refers to systematic errors in how we collect, record, or process our data.

Example 💡: The Interns

Consider a medical experiment studying the effects of different treatments on blood chemistry. The study design calls for blood samples to be drawn from participants at regular intervals and analyzed in the laboratory. To ensure consistent procedures, the research team assigns specific personnel to work with specific treatment groups:

  • Intern A always draws blood from Group 1 participants

  • Intern B works with Group 2

  • Intern C handles Group 3

Now suppose Intern A is new and occasionally contaminates blood samples through improper technique. This contamination might systematically alter the laboratory results for Group 1, making their blood chemistry appear different from that of the other groups.
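
The sketch below (hypothetical numbers) simulates this scenario: all three groups share the same true blood-chemistry level, but a fraction of Group 1's samples are contaminated, adding a systematic offset to their measurements.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
true_level = 10.0  # hypothetical analyte level, identical in all three groups

groups = {g: rng.normal(true_level, 1.0, size=n) for g in ("1", "2", "3")}

# Intern A contaminates about 20% of Group 1's samples, inflating
# the measured level by a fixed amount.
contaminated = rng.random(n) < 0.20
groups["1"] = groups["1"] + contaminated * 3.0

for g, values in groups.items():
    print(f"Group {g}: mean measured level = {values.mean():.2f}")
# Group 1 looks different even though all groups share the same true level.
```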

C. Confounding Bias: The Problem of Unmeasured Influences

Confounding bias occurs when an extraneous variable is related to both the treatment (the factors) and the response variable, but we fail to control for or block this variable.

A confounding variable has three key characteristics:

  1. It influences the response variable.

  2. It is associated with treatment assignment or treatment groups.

  3. The variation it causes is not addressed by the design, either through randomization or blocking.

Example 💡: Exercise and Heart Health

Suppose we want to study whether a new exercise program reduces heart disease risk. We recruit volunteers and randomly assign them to either participate in the exercise program or continue their normal routine. After six months, we find that the exercise group has better cardiovascular health markers.

However, imagine that we failed to account for dietary habits. If the participants assigned to the exercise program also tend to adopt healthier diets during the study, then diet becomes a confounding variable. The improved cardiovascular health might be due to:

  • The exercise program (what we want to conclude)

  • The healthier diets (confounding)

  • Both exercise and diet together

  • Neither (other unmeasured factors)

Without controlling for diet, we cannot determine which explanation is correct.
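
The following simulation (hypothetical numbers) shows how a diet variable that is correlated with the exercise groups can manufacture a group difference even when exercise itself has no effect on the cardiovascular marker.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
exercise = rng.integers(0, 2, size=n)  # 1 = exercise program, 0 = normal routine

# Confounder: participants in the exercise program are more likely
# to also adopt a healthy diet.
diet = rng.binomial(1, np.where(exercise == 1, 0.7, 0.3))

# Suppose ONLY diet improves the cardiovascular marker; exercise does nothing.
marker = 60 + 8 * diet + rng.normal(0, 4, size=n)

print("exercise group mean marker:", marker[exercise == 1].mean().round(2))
print("control group mean marker: ", marker[exercise == 0].mean().round(2))
# The exercise group looks healthier, but the difference is driven
# entirely by the correlated diet variable.
```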

How to Minimize Bias

The primary defense against bias is rigorous randomization with proper concealment of assignment sequences. Additionally, baseline characteristics should be carefully monitored to verify that randomization has achieved its intended goal. Input from domain experts is essential for identifying procedural flaws and recognizing well-known confounding variables.
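
As a minimal illustration of this defense, the sketch below (hypothetical study, Python with NumPy) randomly assigns 60 participants to two groups and then checks a baseline characteristic to verify that randomization produced comparable groups.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 60
baseline_age = rng.normal(45, 12, size=n)  # a baseline characteristic to monitor

# Randomize: shuffle participant indices, then split. Assignment is
# independent of every characteristic, measured or not.
shuffled = rng.permutation(n)
treatment, control = shuffled[: n // 2], shuffled[n // 2:]

# Baseline monitoring: verify randomization balanced the groups.
print("treatment mean age:", baseline_age[treatment].mean().round(1))
print("control mean age:  ", baseline_age[control].mean().round(1))
```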

8.4.2. Lack of Realism: The Challenge of External Validity

Lack of realism represents a fundamental tension in experimental design between internal validity (our ability to draw valid causal conclusions within our study) and external validity (our ability to generalize those conclusions to real-world settings). This issue arises when our experimental units, treatments, or study settings fail to adequately represent the conditions we ultimately want to understand.

To understand how lack of realism can compromise experimental conclusions, consider a classic example from psychological research.

Example 💡: The Workplace Layoff Study

Suppose researchers want to study how layoffs at a workplace affect the morale of workers who remain on the job—a question with obvious practical importance for understanding organizational behavior and employee well-being.

The Ethical Constraint

The most direct approach would be to conduct a true experiment: approach various employers and ask them to randomly lay off some employees so researchers can observe the effects on remaining workers.

However, this approach is completely unethical. Deliberately causing people to lose their jobs for research purposes would cause real harm to participants and their families. No institutional review board would approve such a study, and no ethical researcher would propose it.

The Compromised Solution

Faced with this ethical constraint, researchers might design an alternative study using college students as experimental units. The study design might work as follows:

  1. Recruit college students to participate in a temporary job proofreading textbooks.

  2. Create a realistic work environment with multiple students working together.

  3. Randomly assign some students to be “laid off” during the study (with their knowledge and consent).

  4. Monitor the remaining students and measure their morale and productivity.

  5. Administer psychological surveys to assess the impact of witnessing their colleagues being dismissed.

This design maintains the three fundamental principles—it includes control groups (students who don’t witness layoffs), uses randomization (to determine who gets “laid off”), and incorporates replication (multiple students in each condition).

Why This Study Lacks Realism

Despite adhering to sound experimental principles, this study suffers from serious limitations in realism that compromise its external validity. A temporary proofreading job held by student volunteers carries far lower stakes and very different social dynamics than a real workplace: participants do not depend on the job for their livelihood, and a consented, temporary "layoff" is nothing like an actual dismissal.

Strategies for Maximizing Realism

While perfect realism is often impossible, researchers can take steps to maximize external validity. For example, they can

  • Allocate sufficient resources to collect a sample representative of the target population and to create realistic experimental settings.

  • Conduct multiple studies addressing different aspects of potential realism limitations.

  • Interpret the results carefully to clearly acknowledge the limitations in generalizing the findings.

8.4.3. The Significance of Generalization

Our efforts to minimize the flaws mentioned above ultimately lead to one key goal: making the experiment’s results generalizable. Generalization refers to the ability to apply research findings to broader populations, environments, or contexts that were not directly studied. Although we have discussed many details, remember that good experimental practice boils down to a simple set of rules:

  • Control What You Can, Block What You Can’t Control

  • Randomize to Create Comparable Groups

  • Ensure Sufficient Replication
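
As a concrete illustration of these rules, here is a minimal sketch of a randomized block design for a hypothetical field trial: plots are blocked by soil type, a variable we cannot control, and treatments are randomized and replicated within each block.

```python
import random

random.seed(3)

# Hypothetical plots grouped into blocks by soil type (a nuisance
# variable we cannot control, so we block on it).
blocks = {"clay": ["p1", "p2", "p3", "p4"],
          "sand": ["p5", "p6", "p7", "p8"],
          "loam": ["p9", "p10", "p11", "p12"]}
treatments = ["A", "B"]  # two replicates of each treatment per block

for soil, plots in blocks.items():
    assignment = treatments * (len(plots) // len(treatments))
    random.shuffle(assignment)  # randomize within each block
    print(soil, dict(zip(plots, assignment)))
```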

8.4.4. Bringing It All Together

Key Takeaways 📝

  1. Bias systematically distorts results in consistent directions, making it more dangerous than random error because it cannot be reduced through larger sample sizes. There are three major types of experimental bias: selection bias, measurement bias, and confounding bias.

  2. Lack of realism represents the trade-off between internal validity (experimental control) and external validity (generalizability to real-world conditions).

  3. Perfect studies are impossible. The goal is not to eliminate all potential problems but to minimize them systematically while maintaining feasibility.

  4. The simple rule provides practical guidance: Control what you can, block what you can’t control, randomize to create comparable groups, and ensure sufficient replication.

8.4.5. Exercises

These exercises develop your understanding of experimental design flaws, types of bias, and the trade-off between internal and external validity.

Key Concepts

Types of Bias in Experiments

  • Selection bias: Groups differ systematically before treatment begins (due to flawed assignment)

  • Measurement bias: Systematic errors in data collection, recording, or processing

  • Confounding bias: An uncontrolled variable affects both treatment assignment and the response

Confounding Variable Characteristics

A confounding variable:

  1. Influences the response variable

  2. Is associated with treatment assignment/groups

  3. Is NOT addressed by the design (not controlled or blocked)

Internal vs. External Validity

  • Internal validity: Ability to draw valid causal conclusions within the study

  • External validity: Ability to generalize conclusions to real-world settings

  • Trade-off: Tight experimental control may sacrifice realism

Minimizing Bias

  • Rigorous randomization with concealed assignment

  • Standardized procedures across all groups

  • Blinding (when feasible)

  • Domain expert input to identify potential confounders

  • Careful baseline monitoring


Exercise 1: Identifying Bias Types

For each scenario, identify the primary type of bias (selection, measurement, or confounding) and explain how it could affect the study conclusions.

  1. A study on exercise and weight loss allows participants to choose whether to join the exercise group or the control group.

  2. In a drug trial, the nurse recording patient symptoms knows which patients received the experimental drug and which received the placebo.

  3. A company tests employee productivity under two management styles, but managers using Style A happen to oversee departments with newer, more modern equipment.

  4. A clinical trial recruits patients from only one hospital, which happens to specialize in severe cases.

  5. Researchers studying the effect of a tutoring program compare final grades of students who attended tutoring to those who didn’t, without accounting for the fact that struggling students were more likely to seek tutoring.

Solution

Part (a): Selection bias

Explanation: Self-selection means people who choose to exercise likely differ systematically from those who don’t — they may be more motivated, health-conscious, or have more flexible schedules. These pre-existing differences (not the exercise program) could explain weight loss differences.

Effect: May overestimate or underestimate the true effect of exercise if self-selectors differ in important ways.

Part (b): Measurement bias

Explanation: Knowing treatment status can unconsciously influence how the nurse interprets and records symptoms. She might rate drug recipients more favorably (if she expects the drug to work) or more critically (if looking for side effects).

Effect: Systematic distortion of outcome data that favors one group over another.

Part (c): Confounding bias

Explanation: Equipment quality is a confounding variable — it’s associated with management style (Style A has newer equipment) and independently affects productivity. Any productivity difference could be due to management style, equipment, or both.

Effect: Cannot determine whether observed differences are due to management approach or equipment differences.

Part (d): Selection bias / External validity issue

Explanation: Patients at a specialty hospital for severe cases are not representative of the broader patient population. This is primarily a generalizability (external validity) concern — results may not apply to typical patients with less severe conditions.

Effect: Conclusions may be valid for severe-case patients (internal validity preserved) but cannot be generalized to the broader population. Treatment effects in severe cases may differ substantially from effects in typical cases.

Part (e): Confounding bias

Explanation: Academic preparation/ability is a confounding variable — students who seek tutoring may be struggling (lower baseline ability), while non-attendees may already be performing well. Ability affects both tutoring attendance and final grades.

Effect: May underestimate tutoring effectiveness (comparing weaker students to stronger ones) or create misleading comparisons.


Exercise 2: Confounding Variables

For each observed association, identify two potential confounding variables and explain how each could create the observed relationship without any true causal effect.

  1. Students who eat breakfast have higher GPAs than students who skip breakfast.

  2. Cities with more police officers have higher crime rates.

  3. Children who watch more TV have lower reading scores.

  4. Employees who participate in wellness programs take fewer sick days.

Solution

Part (a): Breakfast and GPA

Confounders:

  1. Socioeconomic status: Students from higher-income families may be more likely to have breakfast (food availability, time, family routines) AND have higher GPAs (educational resources, less stress, better schools). SES affects both breakfast habits and academic performance.

  2. General conscientiousness/self-discipline: Students who are organized and disciplined enough to eat breakfast regularly may also study more consistently, attend class, and complete assignments. The same underlying trait drives both behaviors.

Part (b): Police and crime rates

Confounders:

  1. Population density/urbanization: Dense urban areas have more police officers per capita (more need, more tax revenue) AND higher crime rates (more opportunity, more reporting). City size drives both.

  2. Poverty rates: Areas with higher poverty may have both more crime AND more police assigned as a response. The police presence is a consequence of crime, not a cause.

Part (c): TV watching and reading scores

Confounders:

  1. Parental involvement: Parents who limit TV may also read to children, help with homework, and encourage learning — all of which improve reading scores. Parenting style affects both.

  2. Socioeconomic status: Lower-income households may have fewer books/educational resources (lowering reading scores) and more TV as a primary entertainment source. SES drives both.

Part (d): Wellness programs and sick days

Confounders:

  1. Baseline health: Healthier employees are more likely to participate in wellness programs (interested in fitness, have energy) AND take fewer sick days naturally. Initial health status affects both.

  2. Job satisfaction/engagement: Engaged employees may participate in company programs AND be more motivated to come to work even when slightly unwell. Engagement drives both behaviors.


Exercise 3: The Lack of Realism Problem

Researchers want to study how time pressure affects decision-making quality in emergency room physicians.

  1. Explain why conducting a true experiment in an actual ER would be ethically problematic.

  2. The researchers instead recruit medical students to participate in a simulated ER scenario with varying levels of time pressure. Identify three ways this study lacks realism compared to actual ER practice.

  3. For each limitation identified in (b), discuss whether it would likely cause the study to overestimate or underestimate the effect of time pressure, or whether the direction is unclear.

  4. Suggest one modification that would increase realism while maintaining ethical standards.

Solution

Part (a): Why true experiment is unethical

A true experiment would require deliberately manipulating time pressure during real patient care — potentially rushing physicians treating actual emergencies or artificially delaying care. This could:

  • Cause patient harm or death

  • Violate informed consent (patients don’t consent to experimental conditions)

  • Undermine standard-of-care requirements

  • Create legal liability for malpractice

Part (b): Three limitations affecting realism

  1. Participants (medical students vs. ER physicians): Students lack the experience, skill, and decision-making heuristics that come from years of practice. Their baseline decision quality and response to pressure may differ fundamentally from experienced physicians.

  2. Stakes/consequences: In the simulation, no one actually suffers if a poor decision is made. Real ER decisions involve life-or-death stakes that affect stress, motivation, and cognitive processing in ways a simulation cannot replicate.

  3. Environment complexity: Real ERs involve multiple simultaneous patients, interruptions, team dynamics, family members, equipment malfunctions, and systemic pressures. Simulations are typically simplified, removing much of the real cognitive load.

Part (c): Direction of bias

  1. Medical students: Direction unclear. Students may make worse decisions overall (lacking experience), but may also respond differently to pressure — they might panic more (overestimate effect) or take pressure less seriously because stakes feel lower (underestimate effect).

  2. Lower stakes: Likely underestimate. Without real consequences, participants may not experience the physiological stress response (adrenaline, cortisol) that affects cognition under genuine time pressure.

  3. Simplified environment: Could either direction. Simplified conditions might make time pressure effects clearer (overestimate) by removing other variables, OR might underestimate effects because real-world complexity amplifies time pressure through competing demands.

Part (d): Modification to increase realism

Use experienced ER physicians in high-fidelity simulation: Recruit actual ER physicians to participate in scenarios using professional simulation centers with realistic mannequins, monitoring equipment, and confederate actors playing nurses and family members. While still simulated, this captures expert decision-making processes and provides more realistic environmental complexity.

Trade-off: Higher cost, more difficult recruitment, smaller sample size.


Exercise 4: Internal vs. External Validity Trade-offs

A tech company wants to study whether remote work improves programmer productivity.

Study A: Recruit 100 volunteer programmers, randomly assign 50 to work remotely and 50 to work in the office for 3 months. Provide identical equipment, standardized tasks, and the same supervision structure. Measure lines of code, bugs, and project completion.

Study B: Survey 5,000 programmers across various companies about their work arrangement (remote, hybrid, in-office) and have their managers rate their productivity.

  1. Which study has stronger internal validity? Explain.

  2. Which study has stronger external validity? Explain.

  3. Study A finds that remote workers produce 15% more code. Study B finds no significant difference. How might you explain this discrepancy?

  4. If you could only conduct one study, which would you choose and why?

Solution

Part (a): Internal validity

Study A has stronger internal validity.

  • Random assignment eliminates selection bias — remote and office groups are comparable at baseline

  • Standardized conditions (equipment, tasks, supervision) control for confounders

  • Direct measurement of productivity rather than subjective ratings

  • The experimental setup allows causal inference: differences can be attributed to work arrangement

Study B cannot establish causation because:

  • Self-selection into work arrangements creates systematic group differences

  • Companies differ in culture, resources, and management

  • Manager ratings may be biased (managers may perceive remote workers differently)

  • Many confounders are unmeasured (job type, seniority, personal circumstances)

Part (b): External validity

Study B has stronger external validity.

  • Large, diverse sample across many companies captures real-world variation

  • Observes programmers in their natural work settings, not artificial conditions

  • Includes the complexity of actual jobs (not standardized tasks)

  • Reflects realistic manager assessments (even if biased, this is how productivity is actually evaluated)

Study A limitations:

  • Only 100 volunteers from presumably one company

  • Volunteer effect (those willing to be randomized may differ from typical programmers)

  • Artificial standardization doesn’t reflect real job complexity

  • 3 months may not capture long-term adaptation

Part (c): Explaining the discrepancy

Possible explanations:

  1. Selection effects in Study B: In the real world, those who choose remote work (or whose companies allow it) may differ systematically. Perhaps less productive workers gravitate to remote work, masking any productivity benefit.

  2. Standardization effect in Study A: The controlled tasks may favor remote work in ways that don’t apply to complex real-world projects requiring collaboration.

  3. Hawthorne effect in Study A: Participants knew they were being studied and may have worked harder (especially remote workers trying to prove remote work works).

  4. Manager bias in Study B: Managers may underrate remote workers’ productivity due to visibility bias (“if I don’t see them working, they must not be working”).

  5. Volunteer effect in Study A: Programmers willing to be randomly assigned may be those indifferent to location — not representative of strong preferences that exist in the real workforce.

Part (d): Which study to choose

Argument for Study A: If the goal is to establish whether remote work CAN improve productivity under good conditions, internal validity is crucial. Study A can demonstrate causation, which is more valuable for understanding the fundamental relationship.

Argument for Study B: If the goal is to inform policy decisions about work arrangements in real companies, external validity matters more. Study B reflects what actually happens when companies implement remote work policies.

Recommended choice depends on the goal:

  • Choose Study A if you need causal evidence under controlled conditions

  • Choose Study B if you need real-world policy relevance

If forced to choose one, many researchers would choose Study A because establishing causation is fundamental — but this comes with the strong caveat that findings may not transfer directly to real-world settings. Ideally, conduct both types of studies.


Exercise 5: Designing to Minimize Bias

You are designing a study to test whether a new project management software improves team efficiency compared to the current system.

For each type of bias, (a) explain how it could manifest in this study, and (b) describe specific design features to minimize it.

  1. Selection bias

  2. Measurement bias

  3. Confounding bias

Solution

1. Selection Bias

(a) How it could manifest:

  • Teams that volunteer for the new software may be more tech-savvy, more frustrated with the current system, or led by more innovative managers

  • Early-adopter teams may be systematically different from typical teams

  • Management might assign the new software to already high-performing teams to ensure “success”

(b) Design features to minimize:

  • Random assignment: Randomly assign teams to new or current software

  • Concealed allocation: Don’t reveal assignments until teams are enrolled

  • Baseline measurement: Measure team efficiency before assignment; verify groups are comparable

  • Inclusion criteria: Define eligible teams clearly before randomization

  • Intention-to-treat analysis: Analyze based on assigned group, not actual usage
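
To illustrate the intention-to-treat idea from the last bullet, here is a minimal simulation (hypothetical numbers): some teams assigned the new software never adopt it, yet the analysis still compares teams by their assigned group, preserving the balance created by randomization.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
assigned = rng.integers(0, 2, size=n)  # 1 = assigned new software, 0 = current

# Non-compliance: 20% of teams assigned the new software never adopt it.
adopted = (assigned == 1) & (rng.random(n) < 0.8)

# Hypothetical effect: actually using the new software adds 5 points.
efficiency = 50 + 5 * adopted + rng.normal(0, 10, size=n)

# Intention-to-treat: compare by ASSIGNED group, not actual usage.
itt = efficiency[assigned == 1].mean() - efficiency[assigned == 0].mean()
print(f"ITT estimate: {itt:.2f}")
# Diluted toward 0.8 * 5 = 4 by non-adoption, but the comparison keeps
# the randomized groups intact, avoiding selection bias.
```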

2. Measurement Bias

(a) How it could manifest:

  • If managers know which software their team uses, they may rate performance differently

  • Self-reported efficiency measures may reflect enthusiasm rather than actual productivity

  • Teams using new software may receive more attention (Hawthorne effect), working harder due to observation

  • Different software may make certain metrics easier/harder to track, creating measurement asymmetry

(b) Design features to minimize:

  • Blinding evaluators: Have efficiency assessed by evaluators who don’t know which software teams use

  • Objective metrics: Use objective measures (project completion dates, budget variance, defect rates) rather than subjective ratings

  • Standardized measurement: Define metrics identically for both groups; ensure both systems can track the same outcomes

  • Third-party assessment: Have an independent party collect and analyze efficiency data

3. Confounding Bias

(a) How it could manifest:

  • Teams using new software might receive additional training, and training (not software) improves efficiency

  • Implementation might coincide with other organizational changes (new processes, hiring, reorganization)

  • Team size or project complexity might differ between groups and independently affect efficiency

  • Manager support/enthusiasm for the change affects both adoption and efficiency

(b) Design features to minimize:

  • Control for training: Either give equal training time to both groups (new software training vs. refresher on current software) or track training as a covariate

  • Blocking by team characteristics: Block by team size, project type, or department before random assignment

  • Standardize implementation support: Provide equal technical support to both groups

  • Measure known confounders: Track potential confounders and include them in analysis

  • Timing control: Implement changes simultaneously; avoid coincidence with other organizational changes


Exercise 6: Evaluating a Study Design

Read the following study description and identify potential flaws:

“A university tested whether a new advising system improves student retention. The College of Engineering adopted the new system while the College of Liberal Arts continued with traditional advising. After one year, Engineering had 92% retention compared to 85% retention in Liberal Arts. The university concluded the new advising system improved retention by 7 percentage points.”

  1. What type of bias is most evident in this design?

  2. Identify at least three confounding variables that could explain the retention difference.

  3. Why is the conclusion about the advising system’s effectiveness not justified?

  4. Propose a better study design to evaluate the advising system.

Solution

Part (a): Primary bias

Confounding bias (with selection bias elements).

The treatment (advising system) is completely confounded with college. Any difference between colleges — in students, faculty, culture, curriculum, job market, or anything else — is entangled with the advising system difference.

Part (b): Confounding variables

  1. Student characteristics: Engineering students may have higher academic preparation, be more focused on career goals, or face different financial pressures than Liberal Arts students. These baseline differences affect retention independently.

  2. Job market/career prospects: Engineering graduates may have clearer career paths and higher earning potential, motivating persistence. Liberal Arts students may face more uncertainty about career outcomes.

  3. Academic culture: Engineering programs often have more structured curricula with cohort-based progression, naturally creating community and support. Liberal Arts may have more flexibility but less built-in structure.

  4. Faculty-student ratios: Colleges may differ in advising loads, faculty availability, and mentorship culture.

  5. Historical retention rates: Engineering may have always had higher retention, even before the new system.

Part (c): Why conclusion is not justified

The conclusion assumes that the only difference between colleges is the advising system. But colleges differ in countless ways, and we cannot isolate the advising effect. The 7-point difference could be entirely due to pre-existing college differences, entirely due to the advising system, or some combination.

Without random assignment or proper controls, this is an observational comparison of two very different populations, not an experiment evaluating an intervention.

Part (d): Better study design

Option 1: Randomized experiment within a single college

  • Within one college, randomly assign incoming students to new or traditional advising

  • Compare retention rates after one year

  • Controls for all college-level confounders

Option 2: Phased rollout with multiple colleges

  • Randomly select colleges to adopt the new system (treatment) or continue with traditional (control)

  • Include multiple colleges in each condition

  • Analyze retention accounting for college as a blocking factor

Option 3: Matched comparison with baseline data

  • Compare Engineering’s retention before and after implementation

  • Compare to Liberal Arts’ change over the same period (difference-in-differences)

  • This controls for any changes affecting all students over time
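
A minimal numeric sketch of Option 3’s difference-in-differences logic, using made-up retention rates:

```python
# Hypothetical retention rates (fractions), before and after adoption.
eng_before, eng_after = 0.89, 0.92  # Engineering: new advising system
la_before, la_after = 0.84, 0.85    # Liberal Arts: traditional advising

eng_change = eng_after - eng_before  # change in the adopting college
la_change = la_after - la_before     # background change over the same period

# Difference-in-differences: Engineering's change beyond the shared trend.
did = eng_change - la_change
print(f"estimated effect of the new system: {did:+.2%}")  # +2.00%
```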


8.4.6. Additional Practice Problems

True/False Questions

  1. Bias can be eliminated by increasing sample size.

  2. A confounding variable must be associated with both the treatment assignment and the response.

  3. Selection bias occurs when the outcome measurement is systematically flawed.

  4. A study with high internal validity automatically has high external validity.

  5. Double-blinding primarily addresses measurement bias.

  6. Lack of realism is a concern related to external validity.

Multiple Choice Questions

  1. A researcher finds that hospitals with more hand sanitizer dispensers have higher infection rates. Which explanation represents a confounding relationship?

    Ⓐ Hand sanitizer causes infections

    Ⓑ Infection rates are measured inaccurately at some hospitals

    Ⓒ Larger hospitals have both more dispensers and more serious cases

    Ⓓ Some hospitals don’t report all infections

  2. In a study comparing two teaching methods, students who know they’re in the “experimental” group may try harder. This is primarily a concern about:

    Ⓐ Selection bias

    Ⓑ Measurement bias

    Ⓒ Confounding bias

    Ⓓ External validity

  3. Which scenario represents SELECTION bias?

    Ⓐ A scale that consistently reads 2 pounds too heavy

    Ⓑ Assigning healthier patients to receive the experimental treatment

    Ⓒ Weather affecting crop yields differently in treatment plots

    Ⓓ Using college students to study workplace stress

  4. The “simple rule” for experimental design states: “Control what you can, block what you can’t control, _____, and ensure sufficient replication.”

    Ⓐ Measure everything possible

    Ⓑ Randomize to create comparable groups

    Ⓒ Use double-blinding when feasible

    Ⓓ Maximize external validity

  5. A study uses identical twins to test two diet plans, with each twin randomly assigned to one plan. This design controls for confounding by:

    Ⓐ Randomization alone

    Ⓑ Matching on genetic and environmental factors

    Ⓒ Blinding participants

    Ⓓ Increasing sample size

  6. A study conducted in a laboratory finds significant treatment effects, but the same treatment shows no effect in real-world settings. This suggests a problem with:

    Ⓐ Internal validity

    Ⓑ External validity

    Ⓒ Statistical power

    Ⓓ Measurement precision

Answers to Practice Problems

True/False Answers:

  1. False — Bias is systematic error that persists regardless of sample size. Larger samples only reduce random error (variability).

  2. True — By definition, a confounder must affect the response AND be associated with treatment groups.

  3. False — Selection bias occurs when treatment groups differ systematically at baseline. Flawed measurement is measurement bias.

  4. False — Internal and external validity often trade off. Tight experimental control (high internal validity) may create artificial conditions (low external validity).

  5. True — Double-blinding prevents both participant expectations and researcher expectations from systematically distorting outcome measurements.

  6. True — Lack of realism means findings may not generalize to real-world conditions, which is an external validity concern.

Multiple Choice Answers:

  1. Ⓒ — Hospital size is a confounder: larger hospitals have more dispensers AND treat more/sicker patients (higher infection rates).

  2. Ⓑ — This describes the Hawthorne effect, where knowledge of being studied changes behavior. It distorts the measured outcome because the act of being studied (not the treatment itself) affects performance. This is a form of measurement bias that threatens internal validity.

  3. Ⓑ — Selection bias involves systematic differences between groups at baseline, typically due to non-random assignment.

  4. Ⓑ — The complete rule: Control, Block, Randomize, Replicate.

  5. Ⓑ — Identical twins share genetics and often similar environments; matching them controls for these potential confounders.

  6. Ⓑ — The laboratory findings don’t generalize to real-world settings, indicating limited external validity.