7.1. Statistics and Sampling Distributions
We’ve spent considerable time building probability models that describe populations. Now we reach a pivotal moment in our statistical journey: the bridge from probability theory to statistical inference. Instead of knowing the population distribution, we’ll work backward from sample data to draw conclusions about unknown populations. This transition requires understanding how sample statistics themselves behave as random variables.
Road Map 🧭
Understand how sample statistics can be viewed as random variables.
Master the theoretical properties of sample statistics in relation to the population distribution.
7.1.1. Parameters, Statistics, and the Bridge to Inference
Before diving into sampling distributions, let us review the fundamental vocabulary.
Population Parameters
A parameter is a number that describes some attribute of the population. For example, the population mean \(\mu\) is a parameter that tells us the average value across all units in the population. These parameters are often unknown in practice, but they always exist as fixed, non-random values.
When we study probability distributions, we often assume we know these parameters to understand theoretical behavior. But in statistical inference, the tables turn—we observe sample data and try to learn about these unknown population characteristics.
Sample Statistics: Our Window Into the Population
A statistic \(T\) is a function that maps each sample to a numerical summary \(t = T(x_1, x_2, \ldots, x_n)\).
One example is the sample mean \(\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i\), which summarizes the center of an observed sample. Unlike parameters, we can calculate statistics directly from the data.
The crucial insight is that statistics will change from sample to sample. If we collect a new sample of the same size from the same population, we’ll almost certainly get different values for \(\bar{x}\).
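To make this concrete, here is a minimal sketch in NumPy (the normal population and its parameter values are invented for illustration): drawing two samples of the same size from the same population yields two different sample means.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded so the sketch is reproducible

# Hypothetical population: normal with mean 67 and SD 3 (assumed values)
mu, sigma, n = 67.0, 3.0, 25

sample1 = rng.normal(mu, sigma, size=n)
sample2 = rng.normal(mu, sigma, size=n)

# The same statistic, computed on two different samples, gives two values
print(sample1.mean())  # one realized x-bar
print(sample2.mean())  # almost certainly a different realized x-bar
```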
Estimators: Statistics with a Purpose
An estimator is a statistic that targets a specific population parameter. We use the sample mean \(\bar{x}\) as an estimator of the population mean \(\mu\), and the sample standard deviation \(s\) as an estimator of the population standard deviation \(\sigma\).
The sample mean \(\bar{x}\) is both a statistic (it summarizes our sample) and an estimator (it targets the population mean \(\mu\)). This dual role reflects the bridge between describing what we observe and inferring what we cannot directly measure.
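As a hedged illustration of this dual role (again using assumed population values that would be unknown in practice), we can compute each estimate and compare it with the parameter it targets:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 67.0, 3.0                # population parameters (assumed here)
sample = rng.normal(mu, sigma, size=25)

x_bar = sample.mean()                # summarizes the sample; estimates mu
s = sample.std(ddof=1)               # sample standard deviation; estimates sigma

print(f"x-bar = {x_bar:.2f}, targeting mu = {mu}")
print(f"s     = {s:.2f}, targeting sigma = {sigma}")
```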

Fig. 7.1 Parameters describe the population; statistics summarize the sample; estimators target the unknown parameters
7.1.2. The Sampling Distribution
To understand how estimators behave across many possible samples, we must establish the concept of a sampling distribution.
The Thought Experiment
Imagine we could repeatedly sample from the same population, computing a statistic each time. Suppose we’re studying the heights of college students and we take samples of size \(n = 25\). We collect our first sample, measure all \(25\) students, and compute \(\bar{x}_1 = 67.2\) inches. We collect a second sample of \(25\) different students and get \(\bar{x}_2 = 68.8\) inches. A third sample gives \(\bar{x}_3 = 66.9\) inches.
If we repeated this process thousands of times, we’d have thousands of different sample means: \(\bar{x}_1, \bar{x}_2, \cdots, \bar{x}_m\). These sample means would vary, and that variation would follow a pattern. The distribution of these sample means is called the sampling distribution of the sample mean.
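This thought experiment is easy to simulate. The sketch below (assuming, for illustration only, that heights are normal with mean 67 and SD 3 inches) draws \(m = 10{,}000\) samples of size \(n = 25\) and records each sample mean:

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n, m = 67.0, 3.0, 25, 10_000  # m repeated samples of size n

samples = rng.normal(mu, sigma, size=(m, n))  # each row is one sample
x_bars = samples.mean(axis=1)                 # one x-bar per sample

print(x_bars[:3])           # three realized sample means, all different
print(x_bars.mean())        # the x-bars center near mu
print(x_bars.std(ddof=1))   # their spread is far smaller than sigma
```

A histogram of `x_bars` approximates the sampling distribution of the sample mean: the simulated means cluster around \(\mu\) with noticeably less variability than individual heights.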
Formal Definition
The sampling distribution of a statistic is the probability distribution of that statistic across all possible samples of the same size from the same population. It tells us how the statistic behaves as a random variable.
This concept applies to any statistic—sample means, sample standard deviations, sample medians, sample correlations. Each has its own sampling distribution that describes how that particular statistic varies from sample to sample.
Why This Matters for Inference
Understanding sampling distributions allows us to quantify the uncertainty in our estimates. If we know how \(\bar{X}\) behaves across many samples, we can assess how reliable any single observed \(\bar{x}\) might be as an estimate of \(\mu\). This knowledge forms the foundation for confidence intervals, hypothesis tests, and all other inferential procedures.
The Capital \(\bar{X}\)
We will now study the behavior of sample statistics as random variables. To distinguish a statistic viewed as a random variable from its realized value, we use capital letters: \(\bar{X}\) denotes the random variable whose observed values are the \(\bar{x}\)’s. The same distinction applies to \(S\) vs. \(s\) and \(S^2\) vs. \(s^2\).
7.1.3. Factors Affecting Sampling Distributions
The sampling distribution of a statistic depends on several factors:
The population distribution
Sample size, \(n\) (its effect is illustrated in the sketch after this list)
The statistic itself
For example, \(S\) can take only nonnegative values, while \(\bar{X}\) has no such restriction. The difference comes from how the two statistics are computed: \(S\) is the square root of an average of squared deviations, so it can never be negative.
The sampling technique
The sampling technique affects how well a sample represents the population and whether key properties like independence are satisfied.
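The sketch below illustrates the sample-size factor. It assumes, purely for illustration, a skewed population (exponential with mean 1) and compares the simulated spread of \(\bar{X}\) across several sample sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 10_000  # repeated samples per sample size

# A skewed population, exponential with mean 1, chosen only for illustration
for n in (5, 25, 100):
    x_bars = rng.exponential(scale=1.0, size=(m, n)).mean(axis=1)
    print(f"n = {n:3d}: mean of x-bars = {x_bars.mean():.3f}, "
          f"SD of x-bars = {x_bars.std(ddof=1):.3f}")

# Larger n leaves the center near the population mean (1.0)
# but shrinks the spread of the sampling distribution.
```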
We will examine how these factors influence inference in greater depth in the upcoming chapters.
7.1.4. Bringing It All Together
Key Takeaways 📝
Statistics are random variables that vary from sample to sample, creating sampling distributions that describe this variability. Understanding how statistics behave as random variables lets us quantify uncertainty and draw conclusions about unknown populations.
Multiple factors affect sampling distributions: population shape, sample size, the choice of statistic, and independence all play important roles.
Exercises
Conceptual Understanding: Explain the difference between a parameter, a statistic, and an estimator. Give an example of each in the context of measuring the average commute time for workers in your city.