8.6. Sampling Design

Understanding experimental design principles provides the framework for establishing causal relationships, but even the most carefully designed experiment is only as strong as the sample of participants it studies. How we select experimental units or subjects for our studies fundamentally determines whether our conclusions can be generalized beyond the specific individuals we observe. This critical connection between sampling and inference represents one of the most important—yet often overlooked—aspects of research design.

The transition from experimental design to sampling design marks a shift from internal validity (can we trust the causal conclusions within our study?) to external validity (can we generalize our conclusions to the broader population we care about?). While experimental design principles ensure that our comparisons are fair and unbiased, sampling design principles ensure that our participants represent the population we want to understand.

Road Map 🧭

  • Problem: How do we select participants for our studies so that results can be generalized to the populations we want to understand?

  • Tool: Framework for understanding different sampling approaches, from convenience sampling to sophisticated randomized methods

  • Pipeline: Proper sampling design enables the valid Sample → Population inferences that are the ultimate goal of statistical research

8.6.1. The Foundation of Statistical Inference

Before examining specific sampling methods, it’s essential to understand why sampling design matters so profoundly for statistical inference. The mathematical tools we use for drawing conclusions—confidence intervals, hypothesis tests, regression analysis—all depend on specific assumptions about how our data were collected. When these assumptions are violated, our statistical procedures can produce misleading results, no matter how sophisticated the analysis.

The Connection to IID Assumptions

Most statistical inference procedures assume that our observations are independent and identically distributed (IID). This seemingly abstract mathematical concept has very concrete implications for how we collect data:

Independence means that observing one unit doesn’t influence the probability of observing any other unit, and that the characteristics of one unit don’t affect the characteristics of others in our sample.

Identically distributed means that all units come from the same population with the same underlying probability distribution—they’re all drawn from the same “statistical population” with the same parameters.

When we violate these assumptions through poor sampling design, our statistical inference procedures lose their validity. Standard errors become incorrect, confidence intervals don’t have their stated coverage rates, and hypothesis tests don’t control error rates as intended.
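The practical consequences can be seen in a small simulation. The R sketch below (all parameters invented for illustration) draws observations that share cluster effects, so they are not independent, and compares their actual sampling variability to what the IID standard-error formula claims:

```r
set.seed(350)
# Hypothetical scenario: 100 observations arriving in 10 clusters of 10.
# Observations within a cluster share a common effect, so they are NOT independent.
one_sample_mean <- function() {
  cluster_effect <- rnorm(10, mean = 0, sd = 1)            # shared within each cluster
  x <- rep(cluster_effect, each = 10) + rnorm(100, mean = 0, sd = 0.5)
  mean(x)
}
true_se  <- sd(replicate(2000, one_sample_mean()))         # actual variability of the mean
naive_se <- sqrt((1^2 + 0.5^2) / 100)                      # SE formula that assumes IID
c(true_se = true_se, naive_se = naive_se)
# The IID formula badly understates the real uncertainty here.
```

Because the naive formula ignores the shared cluster effects, confidence intervals built from it would be far too narrow.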

The Population-Sample-Population Cycle

Statistical inference follows a logical cycle that depends entirely on proper sampling design:

  1. Define the target population we want to understand

  2. Draw a representative sample from that population using appropriate methods

  3. Analyze the sample data using statistical procedures

  4. Generalize results back to the target population with known levels of uncertainty

Each step depends on the previous ones. If our sample isn’t representative of our target population (step 2), then our analysis (step 3) and conclusions (step 4) will be invalid, no matter how sophisticated our statistical methods.

8.6.2. The Challenge of Representativeness

The concept of representativeness is central to sampling design, but it’s more complex than it might initially appear. A representative sample is one that accurately reflects the characteristics of the population from which it’s drawn, but achieving representativeness requires careful attention to potential sources of bias.

What Makes a Sample Representative?

A representative sample should:

  • Include appropriate proportions of different subgroups within the population

  • Capture the full range of variability present in the population

  • Avoid systematic exclusion of certain types of individuals

  • Reflect the diversity of opinions, characteristics, or responses present in the population

Why Representativeness Is Challenging

Several factors make it difficult to achieve truly representative samples:

Population Definition Challenges: Before we can sample representatively, we must clearly define our target population. This seemingly simple task often reveals complex decisions about inclusion and exclusion criteria.

Access and Feasibility Issues: Some members of the population may be much easier to reach than others, creating systematic biases in who ends up in our sample.

Response and Participation Patterns: Even with perfect sampling procedures, certain types of people may be more or less likely to agree to participate, creating post-sampling biases.

Cost and Resource Constraints: Truly representative sampling can be expensive and time-consuming, leading researchers to make compromises that affect representativeness.

Hidden Population Structure: Populations often have complex internal structure that isn’t immediately apparent, making it easy to miss important subgroups or relationships.

8.6.3. Non-Random Sampling Methods: Understanding the Limitations

While we generally prefer randomized sampling methods for statistical inference, non-random sampling approaches are common in practice. Understanding their limitations helps us recognize when they might be appropriate (usually for preliminary investigations) and when they create serious threats to validity.

Convenience Sampling: The Path of Least Resistance

Convenience sampling selects participants based solely on ease of access and availability. This approach is attractive because it’s simple, fast, and inexpensive to implement, making it tempting for researchers with limited resources or tight timelines.

Common Examples of Convenience Sampling

Academic Research: A psychology professor studies decision-making by recruiting students from her own classes. While convenient, this sample only represents college students in that particular major at that specific institution.

Medical Research: A doctor studies the effectiveness of a new treatment by enrolling patients who visit his clinic. This sample may systematically exclude people who can’t afford medical care, live far from the clinic, or prefer different healthcare providers.

Market Research: A company surveys customers who visit their website or respond to email invitations. This approach misses potential customers who don’t engage with the company online.

Political Polling: News outlets conduct “person on the street” interviews in busy downtown areas. Such samples systematically overrepresent people who work downtown, have flexible schedules, and are comfortable talking to reporters.

Why Convenience Sampling Creates Bias

Convenience sampling creates systematic bias because the characteristics that make people easily accessible often correlate with other important variables:

Geographic Clustering: Sampling from easily accessible locations (like college campuses or shopping malls) concentrates the sample in specific geographic areas that may not represent broader populations.

Socioeconomic Bias: People with more flexible schedules, reliable transportation, and discretionary time are more likely to be available for convenience sampling, creating systematic socioeconomic biases.

Demographic Patterns: Age, employment status, family situation, and health status all affect availability and accessibility, leading to systematic under- or over-representation of certain demographic groups.

Behavioral Correlates: The behaviors and attitudes that make people easy to recruit (outgoing personality, willingness to participate in research, comfort with authority figures) may correlate with the very outcomes researchers want to study.
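To make the mechanism concrete, the R simulation below (every number is invented) builds a population in which "easily reached" people happen to score higher on the outcome, then compares a convenience sample of only those people to a simple random sample:

```r
set.seed(350)
N <- 10000
easy_to_reach <- rbinom(N, 1, 0.3) == 1       # assume 30% of the population is accessible
# Invented effect: accessible people average about 10 points higher
y <- rnorm(N, mean = 50 + 10 * easy_to_reach, sd = 5)

convenience_mean <- mean(sample(y[easy_to_reach], 200))  # only accessible people
srs_mean         <- mean(sample(y, 200))                 # simple random sample
c(population = mean(y), convenience = convenience_mean, srs = srs_mean)
# The convenience estimate lands near 60, while the population mean is near 53.
```

The convenience sample is systematically off because accessibility is correlated with the outcome; the SRS estimate, by contrast, misses the population mean only by random chance.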

When Convenience Sampling Might Be Appropriate

Despite its limitations, convenience sampling can serve legitimate purposes in certain contexts:

Preliminary Research: When exploring whether a phenomenon exists or developing research methods, convenience samples can provide initial insights at low cost.

Proof of Concept Studies: Testing whether an intervention can work under any circumstances might justify convenience sampling before investing in more expensive representative studies.

Method Development: When developing new measurement instruments or refining research procedures, convenience samples can be adequate for initial testing.

Extreme Case Analysis: When studying rare phenomena or extreme cases, convenience sampling might be the only feasible approach.

Voluntary Response Sampling: Self-Selection Bias

Voluntary response sampling occurs when individuals self-select into the study based on their own willingness or motivation to participate. This approach is common in online surveys, call-in polls, and studies that recruit participants through advertisements or social media.

The Psychology of Voluntary Response

People who volunteer for research studies differ systematically from those who don’t in several important ways:

Strong Opinions: Individuals with extreme views on the topic being studied are much more likely to participate than those with moderate or neutral opinions.

Personal Investment: People who feel personally affected by the research topic are more likely to volunteer, creating samples that overrepresent those with direct stakes in the outcomes.

Altruistic Motivation: Some people participate in research to help others or advance scientific knowledge, but this motivation isn’t randomly distributed across the population.

Time and Resources: Voluntary participation requires discretionary time and often involves some cost (transportation, lost wages, childcare), systematically excluding those without such resources.

Comfort with Research: Some people are more comfortable with research settings, authority figures, or formal procedures, affecting who volunteers.

Media Examples and Their Biases

Television call-in polls provide clear examples of voluntary response bias:

Political Issues: When news programs ask viewers to call in with their opinions on political topics, respondents typically have much stronger views than the general population. The results often show more extreme positions than scientific polls of the same topics.

Product Reviews: Online product reviews suffer from voluntary response bias because people with very positive or very negative experiences are much more likely to write reviews than those with neutral experiences.

Comment Sections: Online comment sections on news articles or social media posts systematically overrepresent people with strong opinions and those comfortable expressing views in public forums.

Why Voluntary Response Fails for Inference

Voluntary response sampling creates several problems for statistical inference:

Unrepresentative Opinions: The sample systematically overrepresents extreme views and underrepresents moderate positions, creating a distorted picture of population sentiment.

Unknown Bias Direction: While we know bias exists, we often can’t determine its direction or magnitude, making it impossible to correct for the bias statistically.

Violation of Random Sampling Assumptions: Statistical inference procedures assume some form of random sampling, but voluntary response samples violate these assumptions in fundamental ways.

Non-Generalizable Results: Results from voluntary response samples can only be generalized to other people who would voluntarily respond under similar circumstances—a much more limited population than researchers typically want to study.
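A small simulation illustrates the distortion. In the R sketch below (the satisfaction scores and response model are both invented), people with extreme opinions are assumed to be ten times as likely to respond as everyone else:

```r
set.seed(350)
# Invented population of 1-5 satisfaction scores centered near 3.8
satisfaction <- pmin(pmax(round(rnorm(10000, mean = 3.8, sd = 0.9)), 1), 5)
# Assumed response model: extreme scores (1 or 5) respond at 30%, others at 3%
p_respond <- ifelse(satisfaction %in% c(1, 5), 0.30, 0.03)
responded <- rbinom(10000, 1, p_respond) == 1

c(population_mean = mean(satisfaction),
  volunteer_mean  = mean(satisfaction[responded]))
# The volunteers' average is pulled well away from the population average.
```

Because this population has more delighted customers than furious ones, the voluntary respondents skew the estimate upward; with a different mix of extremes, the bias could just as easily run the other way, and nothing in the sample itself reveals which.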

The Limitations of Expert Judgment

Some researchers attempt to improve on simple convenience or voluntary response sampling by using their expertise to construct more balanced samples. While this might seem like an improvement, it introduces different types of bias:

Researcher Bias: Even well-intentioned researchers have unconscious biases that affect their sampling decisions, potentially creating systematic patterns they don’t recognize.

Limited Perspective: Individual researchers can’t anticipate all the factors that might affect representativeness, particularly complex interactions between variables they haven’t considered.

Unmeasurable Bias: Because the sampling process isn’t based on known probabilities, there’s no way to measure or adjust for the bias introduced by human judgment.

False Confidence: Expert-constructed samples might appear more representative than convenience samples, leading to overconfidence in results that are still fundamentally biased.

8.6.4. Random Sampling Methods: The Foundation of Valid Inference

Random sampling methods provide the foundation for valid statistical inference by ensuring that every member of the population has a known, non-zero probability of being included in the sample. This probabilistic foundation enables us to quantify uncertainty and make valid generalizations from samples to populations.

The Philosophy of Randomization in Sampling

Randomization in sampling serves the same fundamental purpose as randomization in experimental design: it removes systematic bias and replaces it with known, manageable random variation. When we can’t control all the factors that might affect who ends up in our sample, randomization ensures that these factors balance out across many possible samples.

Key Properties of Random Sampling

Known Selection Probabilities: For every member of the population, we can calculate the probability that they’ll be included in our sample. This probabilistic foundation enables statistical inference.

Unbiased Selection: The sampling process doesn’t systematically favor any particular type of person or outcome. Any biases that remain are due to random chance rather than systematic factors.

Quantifiable Uncertainty: Because we understand the probabilistic mechanism that generated our sample, we can calculate the uncertainty associated with our estimates and test results.

Reproducible Methods: Random sampling procedures can be described precisely and replicated by other researchers, enabling scientific verification of results.

Independence: When properly implemented, random sampling ensures that the selection of one unit doesn’t influence the probability of selecting any other unit (or provides a close approximation to independence).

Simple Random Sampling: The Gold Standard

Simple Random Sampling (SRS) represents the conceptual foundation for most statistical inference procedures. In SRS, every unit in the population has exactly the same probability of being selected, and every possible sample of a given size has exactly the same probability of being chosen.

Definition and Properties

A simple random sample of size \(n\) from a population of size \(N\) is selected such that:

  • Every individual in the population has probability \(\frac{n}{N}\) of being included in the sample

  • Every possible sample of size \(n\) has probability \(\frac{1}{\binom{N}{n}}\) of being selected

  • The selection of each unit is independent of the selection of every other unit (approximately, when sampling without replacement from large populations)
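These properties can be checked by brute force for a tiny population. The R sketch below enumerates every possible sample of size \(n = 2\) from \(N = 6\) units and verifies both the sample count and the inclusion probability:

```r
N <- 6
n <- 2
all_samples <- combn(N, n)          # one column per possible sample of size 2
num_samples <- ncol(all_samples)    # should equal choose(6, 2) = 15
# Inclusion probability of unit 1 = fraction of samples containing it = n/N
incl_prob <- mean(apply(all_samples, 2, function(s) 1 %in% s))
c(num_samples = num_samples, incl_prob = incl_prob, n_over_N = n / N)
```

Unit 1 appears in \(\binom{5}{1} = 5\) of the 15 possible samples, giving inclusion probability \(5/15 = 1/3 = n/N\), exactly as the definition requires.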

Implementation Procedure

Implementing SRS requires a systematic approach:

Step 1: Define the Target Population

This crucial first step often reveals complexities not initially apparent. Questions to address include:

  • Who exactly belongs to the population of interest?

  • What are the geographic, temporal, or other boundaries of the population?

  • How do we handle people who move in or out of the population during the study?

  • What do we do about people who are temporarily unavailable or difficult to reach?

Step 2: Create a Sampling Frame

The sampling frame is a complete list of all units in the target population, with each unit uniquely identified. This is often the most challenging practical aspect of SRS:

Ideal Requirements: The sampling frame should include every member of the target population exactly once, with current and accurate contact information for each unit.

Practical Challenges: Real sampling frames often have problems like incomplete coverage (missing some population members), overcoverage (including people who shouldn’t be in the population), duplication (the same person listed multiple times), or outdated information.

Common Sampling Frames: Telephone directories, voter registration lists, institutional enrollment records, membership databases, and government records all serve as sampling frames, each with specific strengths and limitations.

Step 3: Assign Unique Labels

Each unit in the sampling frame receives a unique identifier. This might be:

  • Sequential numbering (1, 2, 3, …, N)

  • Existing ID numbers (Social Security numbers, student ID numbers, account numbers)

  • Systematic codes that preserve anonymity while maintaining uniqueness

Step 4: Generate Random Sample

Use a truly random process to select which units will be included:

Random Number Generation: Use computer random number generators, random number tables, or other chance mechanisms to select unit labels.

Without Replacement: Typically, we sample without replacement to ensure each person appears in the sample at most once.

Sample Size Determination: The sample size should be determined based on statistical power analysis, precision requirements, and resource constraints.

https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter8/simple_random_sampling_diagram.png

Fig. 8.7 Simple Random Sampling Process: From population enumeration to final sample selection

Mathematical Foundation

The mathematical properties of SRS provide the foundation for statistical inference:

Probability of Selection: Each unit has probability \(\frac{n}{N}\) of being selected.

Sample Probability: The probability of obtaining any specific sample of size \(n\) is \(\frac{1}{\binom{N}{n}}\).

Independence Approximation: When \(N\) is large relative to \(n\), sampling without replacement behaves approximately like sampling with replacement, justifying independence assumptions.

Unbiased Estimation: Sample statistics computed from SRS are unbiased estimators of corresponding population parameters.
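Unbiasedness can be verified directly for a toy population by averaging the sample mean over every possible simple random sample:

```r
pop <- c(2, 5, 7, 8, 13)                 # toy population, mean = 7
sample_means <- colMeans(combn(pop, 2))  # mean of every possible SRS of size 2
c(population_mean = mean(pop),
  average_of_sample_means = mean(sample_means))  # both equal 7
```

Any single sample mean may miss the population mean, but averaged across all \(\binom{5}{2} = 10\) equally likely samples, the sample mean hits the population mean exactly; that is what "unbiased" means.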

Implementation in R

R provides built-in functions for implementing simple random sampling:

# Method 1: Sample from a vector of IDs
population_ids <- sprintf("ID%03d", 1:9804)   # "ID001", "ID002", ..., "ID9804"
sample_ids <- sample(population_ids, size = 5, replace = FALSE)

# Method 2: Sample row indices from an existing data frame called dataset
N <- nrow(dataset)  # Population size
n <- 5              # Sample size
sample_indices <- sample(N, size = n, replace = FALSE)
sampled_data <- dataset[sample_indices, ]

# Method 3: Unequal selection probabilities (prob is rescaled to sum to 1 internally)
units <- c("A", "B", "C", "D", "E")
weights <- c(0.1, 0.2, 0.1, 0.3, 0.3)   # One weight per unit in the vector
weighted_sample <- sample(units, size = 2, replace = FALSE, prob = weights)

Why SRS is “Simple” in Name Only

Despite its name, simple random sampling is often far from simple to implement in practice:

Sampling Frame Challenges: Creating a complete, accurate sampling frame can be extremely difficult for large or dynamic populations.

Cost and Logistics: Contacting randomly selected individuals who might be scattered across large geographic areas can be expensive and time-consuming.

Response Rates: Even with perfect random selection, non-response can introduce bias if certain types of people are systematically less likely to participate.

Rare Populations: When studying rare characteristics or small subgroups, SRS might require enormous sample sizes to capture enough individuals of interest.

Advantages of Simple Random Sampling

Statistical Validity: SRS satisfies the assumptions required for most statistical inference procedures, enabling valid confidence intervals, hypothesis tests, and other analyses.

Unbiased Estimation: Sample statistics provide unbiased estimates of population parameters, meaning they’re correct on average across all possible samples.

Quantifiable Precision: We can calculate exact standard errors and confidence intervals, providing precise measures of uncertainty.

Broad Applicability: SRS works for any population and any characteristic, without requiring advance knowledge of population structure.

Scientific Credibility: Results from well-executed SRS studies are generally accepted as valid by the scientific community and policymakers.

Limitations and Practical Challenges

Cost and Efficiency: SRS can be expensive and inefficient, particularly when the population is geographically dispersed or difficult to access.

Rare Subgroups: Important subgroups might be so rare that they rarely appear in random samples of feasible size.

Sampling Frame Problems: The quality of SRS depends entirely on the quality of the sampling frame, which may be incomplete, outdated, or biased.

Non-Response Issues: Random sampling design can’t control whether selected individuals actually participate, and non-response can introduce serious biases.

Practical Implementation: The ideal of SRS often must be compromised due to practical constraints, potentially affecting the validity of inference procedures.

8.6.5. Stratified Random Sampling: Balancing Representation and Efficiency

When populations contain important subgroups that differ substantially from each other, stratified random sampling provides a method for ensuring adequate representation of all subgroups while potentially improving the precision of population estimates.

The Motivation for Stratification

Simple random sampling works well when the population is relatively homogeneous, but many populations contain distinct subgroups that differ systematically in characteristics relevant to the study. In such cases, SRS faces several potential problems:

Subgroup Representation: Small but important subgroups might be severely underrepresented or even completely missing from random samples.

Precision Loss: If subgroups vary dramatically in the characteristics being studied, combining them in a single analysis can reduce statistical precision.

Administrative Needs: Policy makers and practitioners often need separate information about different subgroups, not just overall population averages.

Efficiency Concerns: Some subgroups might be much more expensive or difficult to sample than others, suggesting that unequal sampling rates might be more efficient.

How Stratified Sampling Works

Stratified sampling addresses these issues by dividing the population into strata (subgroups) based on characteristics known before sampling, then conducting separate random samples within each stratum.

Step 1: Define Strata

Strata should be:

Homogeneous within: Units within each stratum should be as similar as possible with respect to the characteristics being studied.

Heterogeneous between: Different strata should differ substantially from each other on the characteristics of interest.

Based on available information: Stratification variables must be known for the entire population before sampling begins.

Relevant to the research question: Stratification variables should be related to the outcomes being studied.

Manageable in number: Too many strata can create logistical complications and require very large sample sizes.

Step 2: Determine Sample Allocation

Once strata are defined, researchers must decide how many units to sample from each stratum. This allocation decision significantly affects both the cost and precision of the resulting estimates.

Step 3: Conduct Simple Random Sampling Within Strata

Within each stratum, conduct independent simple random samples of the predetermined sizes.

Step 4: Combine Strata Results

Analyze data from each stratum separately and then combine results to produce population estimates, properly accounting for the stratified sampling design.
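As a sketch of the combining step (all stratum summaries below are invented), the population estimate weights each stratum's sample mean by that stratum's share of the population:

```r
# Hypothetical stratum summaries from a stratified sample
m    <- c(stratumA = 1200, stratumB = 800)    # stratum population sizes
xbar <- c(stratumA = 52.0, stratumB = 47.5)   # stratum sample means
W    <- m / sum(m)                            # population weights: 0.6 and 0.4
combined_estimate <- sum(W * xbar)            # 0.6*52 + 0.4*47.5 = 50.2
combined_estimate
```

Standard errors for such estimates must likewise be built stratum by stratum; analyzing the pooled data as if it were one simple random sample would generally give the wrong uncertainty.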

https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter8/stratified_sampling_detailed.png

Fig. 8.8 Stratified Random Sampling: From population stratification to combined analysis

Allocation Methods: Deciding How Much to Sample from Each Stratum

The choice of allocation method significantly affects both the cost and precision of stratified sampling. Different allocation strategies optimize different objectives.

Uniform Allocation: Equal Representation

Uniform allocation samples the same number of units from each stratum, regardless of stratum size. This approach ensures equal precision for estimates within each stratum and is particularly useful when:

  • The primary goal is comparing subgroups rather than estimating overall population parameters

  • All strata are considered equally important for policy or scientific purposes

  • Stratum sizes are roughly similar

  • The cost of sampling is similar across strata

Implementation: If we want a total sample size of \(n\) with \(M\) strata, we sample \(\frac{n}{M}\) units from each stratum (rounding as necessary to get integer values).

Advantages: Simple to implement, ensures adequate representation of all subgroups, provides equal precision for subgroup estimates.

Disadvantages: May be inefficient for estimating overall population parameters, oversamples small strata and undersamples large strata relative to their population proportions.

Proportional Allocation: Maintaining Population Structure

Proportional allocation samples from each stratum in proportion to the stratum’s size in the population. This approach maintains the population’s natural structure in the sample.

Implementation: For stratum \(i\) with \(m_i\) units in a population of size \(N = \sum_{i=1}^M m_i\), sample size is:

\[n_i = n \times \frac{m_i}{N}\]

where \(n\) is the total desired sample size.

Advantages:

  • Provides unbiased estimates of population parameters with relatively simple analysis

  • Maintains representativeness across subgroups

  • Often more cost-effective than uniform allocation

  • Results can be analyzed as if they came from simple random sampling

Disadvantages:

  • Small strata may have very small sample sizes, limiting the precision of subgroup estimates

  • May not be optimal for detecting differences between subgroups

  • Doesn’t account for different levels of variability within strata

Example: In a population with 60% urban residents and 40% rural residents, a proportionally allocated sample of 1000 would include 600 urban and 400 rural residents.
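The urban/rural example can be computed directly with the allocation formula:

```r
m <- c(urban = 6000, rural = 4000)   # stratum sizes (60% / 40% of N = 10000)
n <- 1000                            # total desired sample size
n_i <- n * m / sum(m)                # proportional allocation n_i = n * m_i / N
n_i                                  # urban 600, rural 400
```

Because each stratum's sampling fraction equals the overall fraction \(n/N\), the resulting sample is self-weighting, which is why it can be analyzed much like a simple random sample.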

Variation Allocation: Optimizing for Precision

Variation allocation determines sample sizes based on the variability within each stratum. Strata with higher variability receive larger sample sizes because they require more observations to achieve the same level of precision.

Implementation: Sample size for stratum \(i\) is proportional to the standard deviation within that stratum:

\[n_i = n \times \frac{s_i}{\sum_{j=1}^M s_j}\]

where \(s_i\) is the standard deviation of the variable of interest in stratum \(i\).

Advantages:

  • Minimizes the standard error of population estimates for a given total sample size

  • Allocates resources efficiently by focusing sampling effort where variability is highest

  • Can substantially improve precision compared to proportional allocation when strata have very different variability levels

Disadvantages:

  • Requires advance knowledge of variability within each stratum, typically from pilot studies or previous research

  • May result in very unequal sample sizes across strata

  • Optimization is specific to one variable; if multiple variables are important, optimal allocation for one may be poor for others

  • More complex to implement and analyze

When to Use Variation Allocation: This approach is most valuable when:

  • Precise population estimates are the primary goal

  • Previous data or pilot studies provide reliable estimates of within-stratum variability

  • Strata differ substantially in their variability

  • The cost of additional sampling is high enough to justify the complexity
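Using the formula above with invented pilot-study standard deviations, the allocation is a one-line computation:

```r
s <- c(A = 2, B = 8, C = 10)   # assumed within-stratum standard deviations
n <- 100                       # total sample size
n_i <- n * s / sum(s)          # allocation proportional to variability
n_i                            # A = 10, B = 40, C = 50
```

The most variable stratum (C) receives half the sample, while the most homogeneous stratum (A) needs only ten observations to be pinned down comparably well.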

Optimal Allocation: Balancing Multiple Objectives

Optimal allocation extends variation allocation by incorporating cost considerations along with precision objectives. This approach recognizes that sampling costs often vary dramatically across strata.

Implementation: The optimal allocation balances precision gains against cost differences:

\[n_i = n \times \frac{m_i s_i / \sqrt{c_i}}{\sum_{j=1}^M m_j s_j / \sqrt{c_j}}\]

where \(c_i\) is the cost of sampling one unit from stratum \(i\).

Cost Considerations: Costs might vary across strata due to:

  • Geographic dispersion (rural areas might be more expensive to reach)

  • Accessibility (some populations require special recruitment efforts)

  • Response rates (groups with lower response rates effectively cost more per completed interview)

  • Language or cultural barriers (requiring specialized staff or procedures)

  • Institutional requirements (some settings require additional permissions or procedures)

Advanced Optimization: In practice, optimal allocation often involves more complex optimization that considers:

  • Multiple variables of interest with different importance weights

  • Non-response patterns that vary across strata

  • Budget constraints that limit total sampling effort

  • Minimum sample size requirements for subgroup analysis

  • Political or administrative requirements for subgroup representation
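The basic cost-adjusted formula can be computed directly; in the R sketch below, every stratum size, standard deviation, and per-unit cost is invented for illustration:

```r
m    <- c(A = 5000, B = 3000, C = 2000)   # stratum sizes (invented)
s    <- c(A = 4,    B = 8,    C = 8)      # within-stratum sds (invented)
cost <- c(A = 1,    B = 4,    C = 16)     # cost of sampling one unit (invented)
w    <- m * s / sqrt(cost)                # large, variable, cheap strata get more
n    <- 200                               # total sample size
n_i  <- round(n * w / sum(w))
n_i                                       # A = 111, B = 67, C = 22
```

Stratum C is just as variable as B but four times as expensive per unit, so the optimal allocation shifts effort away from it toward the cheaper strata.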

Advantages of Stratified Sampling

Guaranteed Subgroup Representation: Unlike SRS, stratified sampling ensures that all important subgroups are represented in the sample with adequate sample sizes for meaningful analysis.

Improved Precision: When strata are more homogeneous than the overall population, stratified sampling typically provides more precise estimates than SRS of the same size.

Separate Subgroup Analysis: Stratified designs naturally provide data for analyzing subgroups separately, meeting the needs of researchers and policymakers who need subgroup-specific information.

Flexible Resource Allocation: Different allocation strategies allow researchers to optimize for different objectives (precision, cost, subgroup representation) depending on study goals.

Reduced Sampling Costs: When strata are geographically or administratively clustered, stratified sampling can reduce travel and coordination costs.

Administrative Benefits: Organizations often prefer stratified designs because they ensure representation of all important constituencies or organizational units.

Implementation Challenges and Considerations

Stratification Variable Selection: Choosing appropriate stratification variables requires balancing statistical efficiency with practical feasibility:

  • Variables should be strongly related to the outcomes of interest

  • Information must be available for the entire population before sampling

  • Too many stratification variables create too many strata with small sample sizes

  • Variables should create strata of reasonable minimum sizes

Boundary Issues: Deciding exactly how to define strata can be challenging:

  • Where to set cutpoints for continuous variables (age groups, income levels)

  • How to handle units that could belong to multiple strata

  • What to do about units with missing information on stratification variables

  • How to handle units that change strata during the study period

Analysis Complexity: Stratified sampling requires more complex analysis procedures:

  • Need to account for stratification in calculating standard errors and confidence intervals

  • Must use appropriate statistical software that handles survey design features

  • May need specialized expertise for proper analysis

  • Results presentation becomes more complex when reporting subgroup estimates

Sample Size Planning: Determining appropriate sample sizes for stratified designs requires more complex calculations:

  • Must consider precision requirements for both overall estimates and subgroup estimates

  • Need to balance competing objectives when subgroup precision conflicts with overall precision

  • Must account for different allocation strategies in power analyses

  • Should consider potential non-response patterns that might vary across strata

Quality Control: Stratified sampling requires additional quality control measures:

  • Ensuring proper implementation of sampling procedures within each stratum

  • Monitoring response rates and data quality across strata

  • Checking that stratification was implemented correctly

  • Verifying that analysis properly accounts for the stratified design

When Stratified Sampling is Most Valuable

Stratified sampling provides the greatest benefits when:

Subgroups Differ Substantially: When important subgroups have very different characteristics, stratification can greatly improve precision and ensure adequate representation.

Subgroup Analysis is Important: When researchers need reliable estimates for specific subgroups, not just overall population averages.

Cost Varies Across Subgroups: When some subgroups are much more expensive or difficult to reach than others, stratification allows for efficient resource allocation.

Administrative Structure Exists: When the population naturally divides into administrative or geographic units that can serve as strata.

Previous Information Available: When prior research or administrative data provides information about subgroup characteristics that can guide stratification and allocation decisions.

Stratified sampling represents a sophisticated approach to sampling design that can substantially improve both the efficiency and validity of research studies. However, it requires careful planning, implementation, and analysis to realize these benefits. When properly executed, stratified sampling provides a powerful tool for ensuring that research results are both statistically precise and representative of all important population subgroups.
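Operationally, the procedure discussed above reduces to drawing an independent simple random sample within each stratum. A minimal sketch, with an invented sampling frame and invented stratum quotas:

```python
import random

def stratified_sample(frame, allocation, seed=42):
    """Draw an independent simple random sample of size allocation[s]
    from the units in each stratum s, and return the combined sample."""
    rng = random.Random(seed)
    sample = []
    for stratum, n_h in allocation.items():
        units = [u for u in frame if u["stratum"] == stratum]
        sample.extend(rng.sample(units, n_h))  # SRS without replacement
    return sample

# Hypothetical frame: 40 urban and 60 rural units
frame = [{"id": i, "stratum": "urban" if i < 40 else "rural"}
         for i in range(100)]
sample = stratified_sample(frame, {"urban": 8, "rural": 12})
print(len(sample))  # 20 units, each stratum guaranteed its quota
```

Note that because each stratum's draw is a separate SRS, every stratum is guaranteed its planned sample size, which is exactly the representation guarantee that simple random sampling of the whole frame cannot provide.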

8.6.6. Bringing It All Together

Understanding sampling design provides the crucial link between data collection and statistical inference. While experimental design principles ensure that our studies can establish causal relationships, sampling design principles ensure that those relationships generalize to the populations we want to understand. Together, these design principles create the foundation for reliable, actionable research findings.

As we move toward actual statistical inference methods in subsequent chapters, remember that every confidence interval, hypothesis test, and regression analysis depends on assumptions about how the data were collected. Poor sampling design can invalidate even the most sophisticated statistical analysis, while good sampling design enables simple statistical methods to produce profound insights about populations and processes we care about.

Key Takeaways 📝

  1. Sampling design determines inferential validity: The methods used to select participants fundamentally determine whether statistical inference procedures are valid and whether results can be generalized.

  2. Non-random sampling creates systematic bias: Convenience and voluntary response sampling are simple and cheap but introduce biases that cannot be measured or corrected, limiting their use to preliminary investigations.

  3. Random sampling enables valid inference: Methods like simple random sampling and stratified random sampling provide the probabilistic foundation required for statistical inference procedures.

  4. Simple random sampling is the gold standard: SRS gives every population member equal selection probability and satisfies the IID assumptions required for most statistical procedures.

  5. Stratified sampling improves efficiency and representation: When populations contain distinct subgroups, stratified sampling can improve precision while ensuring adequate subgroup representation.

  6. Allocation strategies serve different objectives: Uniform allocation ensures equal subgroup representation, proportional allocation maintains population structure, and variation allocation optimizes precision.

  7. Implementation challenges are substantial: Both SRS and stratified sampling face practical challenges, including sampling frame construction, cost management, and non-response handling.

  8. Design complexity must match study objectives: The choice among sampling methods should depend on research goals, resource constraints, and the structure of the target population.

Exercises

  1. Sampling Method Classification: For each scenario below, identify the sampling method being used and explain its major strengths and limitations:

    1. A political pollster calls every 50th person in the phone directory.

    2. A news website asks visitors to vote in an online poll about a current issue.

    3. A health researcher obtains a list of all registered patients at area hospitals and randomly selects 500 for a nutrition study.

    4. A marketing company recruits participants for a focus group by approaching shoppers at a mall entrance.

  2. Simple Random Sampling Implementation: You want to conduct a simple random sample of 100 students from a university with 25,000 enrolled students.

    1. Describe the complete procedure you would use, including how you would obtain a sampling frame.

    2. What practical challenges might you encounter?

    3. How would you handle students who are selected but don’t respond?

    4. Write R code to select the sample assuming you have a dataset called student_roster.

  3. Stratified Sampling Design: A state education department wants to study teacher job satisfaction using a sample of 600 teachers. The state has 120 elementary schools, 80 middle schools, and 60 high schools.

    1. Explain why stratified sampling might be preferable to simple random sampling for this study.

    2. Calculate sample sizes for each stratum using proportional allocation.

    3. Calculate sample sizes using uniform allocation.

    4. Discuss the advantages and disadvantages of each allocation method for this study.

  4. Bias in Non-Random Sampling: A researcher wants to study exercise habits among adults and recruits participants by posting flyers at gyms, health food stores, and medical clinics.

    1. What type of sampling method is this?

    2. Identify at least three specific ways this sampling method might create bias.

    3. Explain why the bias cannot be corrected through statistical analysis.

    4. Suggest an alternative sampling approach that would reduce these biases.

  5. Allocation Method Comparison: A market research company is studying smartphone usage across three age groups: 18-30 (40% of the population), 31-50 (35%), and 51+ (25%). Previous research suggests that usage variability is highest in the 18-30 group and lowest in the 51+ group.

    1. Calculate sample sizes for each age group using proportional allocation (total n = 900).

    2. Explain when you might prefer uniform allocation instead.

    3. Describe how variation allocation would differ from proportional allocation.

    4. What additional information would you need to implement optimal allocation?

  6. Population Definition Challenge: You want to study “social media usage among teenagers” but need to define your target population precisely.

    1. What specific decisions must you make about age boundaries, geographic scope, and inclusion criteria?

    2. How might different population definitions affect your sampling approach?

    3. What sampling frame could you realistically use for this population?

    4. What groups might be systematically excluded from common sampling frames?

  7. Sample Size and Precision: Explain why stratified sampling with proportional allocation often provides more precise estimates than simple random sampling of the same total size. Use a concrete example to illustrate your explanation.

  8. Real-World Implementation: Choose a research question that interests you and design a complete sampling plan that includes:

    1. A clear definition of the target population

    2. Identification of an appropriate sampling frame

    3. A choice between simple random sampling and stratified sampling, with justification

    4. If using stratified sampling, specification of strata and allocation method

    5. Discussion of likely implementation challenges and how you would address them

  9. Voluntary Response Analysis: A local newspaper publishes an online survey asking readers about their support for a proposed tax increase. The survey receives 2,847 responses, with 73% opposing the tax.

    1. What type of sampling method is this?

    2. Why might these results not represent the views of all local residents?

    3. What specific types of people might be overrepresented in this sample?

    4. How might the results differ if the same question were asked in a scientific poll using random sampling?

  10. Sampling Frame Evaluation: A researcher wants to study the health behaviors of adults in a mid-sized city and is considering three possible sampling frames:

    • Telephone directory listings

    • Voter registration records

    • Driver’s license records

    For each sampling frame, identify what groups would likely be underrepresented or overrepresented, and discuss how these biases might affect a study of health behaviors.