7.2. Sampling Distribution for the Sample Mean

Having established that statistics are random variables with their own distributions, we now focus on the most important statistic in all of statistical inference: the sample mean \(\bar{X}\).

Road Map 🧭

  • View the sample mean \(\bar{X}\) as a random variable that varies from sample to sample.

  • Establish the distributional properties of \(\bar{X}\), such as \(E[\bar{X}]\) and \(\text{Var}(\bar{X})\), in relation to the population parameters \(\mu\) and \(\sigma\).

  • Define the standard deviation of the sample mean as the standard error and understand how it is influenced by the population standard deviation and sample size.

  • Visually observe how the population distribution and the sample size affect the overall shape of the sampling distribution of the sample mean.

7.2.1. A New Perspective on the Data-Generating Procedure

So far, we’ve pictured sampling from a population as drawing individual data points \(n\) times from a single random variable \(X\) (left of Fig. 7.2).

A new perspective on how data points are sampled

Fig. 7.2 The left panel shows how we have pictured the sampling procedure so far; the right panel shows the new perspective.

For the formal understanding of the sampling distribution of \(\bar{X}\), we need to begin with a new perspective. Imagine that there are \(n\) independent and identically distributed (iid) copies of the population, \(X_1, X_2, \cdots, X_n\), and a sample is constructed by taking one data point from each copy (right of Fig. 7.2).

Through this shift, we can now express the sample mean \(\bar{X}\) as a function of \(n\) random variables:

\[\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i\]

Why This Perspective Matters

The shift from viewing \(\bar{x}\) as merely “the mean of my data” to recognizing it as a single realization from a random variable \(\bar{X}\) is crucial for statistical inference. When we eventually compute a confidence interval or perform a hypothesis test, we’ll be asking questions like:

  • “If the population mean really is \(\mu_0\), how likely is it that we’d observe a sample mean as extreme as the one we got?”

  • “What range of population means would be consistent with the sample mean we observed?”

These questions only make sense if we understand how \(\bar{X}\) behaves probabilistically.

7.2.2. Visualizing Sampling Distributions

Let’s explore sampling distributions through a concrete example that reveals how sample size affects the behavior of our estimators.

The Population: Exponential Distribution

Consider a population that follows an exponential distribution with parameter \(\lambda = 1\). Recall that this distribution is highly right-skewed, with most values bunched near 0 and a long tail extending to the right. The population mean is \(\mu = 1/\lambda = 1\), and the population standard deviation is \(\sigma = 1/\lambda = 1\).

The pdf of the exponential distribution

Fig. 7.3 The exponential population: highly right-skewed with mean \(\mu=1\)

When we conduct statistical inference in practice, we won’t know the population follows an exponential distribution or what its parameter value is. But for learning about sampling distributions, we’ll assume this knowledge so we can compare our sample results to the known truth.
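Before simulating sampling distributions, it can help to confirm these population facts numerically. The following R sketch (the seed and the number of draws are our choices, not part of the original example) checks that draws from `rexp()` with rate 1 have mean and standard deviation close to 1:

```r
# Sanity check: for Exp(rate = 1), mu = sigma = 1.
set.seed(42)               # arbitrary seed for reproducibility
x <- rexp(1e6, rate = 1)   # one million draws from the population
mean(x)                    # close to mu = 1
sd(x)                      # close to sigma = 1
```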

Sampling with n = 5

Let’s start by taking samples of size \(n = 5\) from this exponential population. Using R, we can simulate this process:

# Take one sample of size 5
sample1 <- rexp(5, rate = 1)
sample_mean1 <- mean(sample1)
sample_mean1
# One possible result: 0.39 (a fresh draw will differ)

Each sample gives us a different sample mean. To understand the sampling distribution, we repeat this process many times—say, one million times—and examine how these sample means are distributed.

# Simulate the sampling distribution
num_samples <- 1000000
n <- 5
sample_means <- replicate(num_samples, mean(rexp(n, rate = 1)))

When we plot the distribution of these million sample means, we see something remarkable. The distribution no longer looks like the original exponential distribution. It’s still somewhat right-skewed, but the degree of skewness has diminished. The sample means cluster more tightly around the true population mean \(\mu=1\).

Histogram of sample means when \(n=5\)
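A histogram like this can be drawn with base R. This is only a sketch of one way to produce such a plot (the bin count, seed, and replication count are our choices, and we use fewer replications than the million above to keep it quick):

```r
set.seed(1)
n <- 5
sample_means <- replicate(100000, mean(rexp(n, rate = 1)))
hist(sample_means, breaks = 100, freq = FALSE,
     main = "Sample means of Exp(1) data, n = 5",
     xlab = "Sample mean")
abline(v = 1, lwd = 2)   # mark the true population mean mu = 1
```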

The Effect of Increasing Sample Size

Now let’s see what happens when we increase the sample size to \(n = 25\):

Histogram of sample means when \(n=25\)

The transformation is dramatic. The sampling distribution is now roughly symmetric and centered around \(\mu = 1\). It bears little resemblance to the original exponential population. The sample means are much more concentrated around the true value—most fall between 0.5 and 1.5.

With \(n = 65\), the pattern becomes even more pronounced:

Histogram of sample means when \(n=65\)

Now the distribution is highly concentrated around \(\mu = 1\) and appears very symmetric. The sample means rarely stray far from the true population mean.

Key Insights

Several crucial patterns emerge:

  1. The sample mean targets the population mean: All sampling distributions center around \(\mu = 1\), regardless of sample size.

  2. Larger samples produce more precise estimates: As \(n\) increases, the sampling distribution becomes more concentrated around \(\mu\).

  3. Shape changes with sample size: Even though the population is highly skewed, the sampling distribution becomes more symmetric as \(n\) increases.

  4. The magic of averaging: By averaging multiple observations, we reduce the impact of extreme values and create estimators that behave much better than individual observations.

7.2.3. Deriving the Mathematical Properties

To deepen our understanding of the sample mean’s behavior, we derive its key distributional properties: the mean, variance, and standard deviation. For clarity, all population parameters are written with a subscript \(X\) and all sampling-distribution parameters with a subscript \(\bar{X}\).

Expected Value of the Sample Mean

\[\mu_{\bar{X}} = E[\bar{X}] = E\left[\frac{1}{n}\sum_{i=1}^n X_i\right] = \frac{1}{n} E\left[\sum_{i=1}^n X_i\right] = \frac{1}{n} \sum_{i=1}^n E[X_i]\]

Since all \(X_i\)’s come from the same distribution with \(E[X_i] = \mu_X\),

\[E[\bar{X}] = \frac{1}{n} \sum_{i=1}^n \mu_X = \frac{1}{n} \cdot n\mu_X = \mu_X\]

Unbiasedness of Sample Mean

The expected value of the sample mean equals the population mean (\(\mu_{\bar{X}} = \mu_X\)). When an estimator equals its target on average, we call it an unbiased estimator. Individual sample means may be too high or too low, but they center around the correct target.

Variance of the Sample Mean

\[\begin{split}&\sigma^2_{\bar{X}}=\text{Var}(\bar{X})\\ &=\text{Var}\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \frac{1}{n^2} \text{Var}\left(\sum_{i=1}^n X_i\right)\end{split}\]

Since the \(X_i\)’s are independent, the variance of the sum equals the sum of the variances. Also, all \(X_i\)’s have the same variance \(\sigma_X^2\):

\[\text{Var}(\bar{X}) = \frac{1}{n^2} \sum_{i=1}^n\text{Var}(X_i) = \frac{1}{n^2} \cdot n\sigma^2_X = \frac{\sigma^2_X}{n}\]
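Both results can be checked against the exponential simulation from earlier. Here is a sketch (the seed and replication count are our choices): for Exp(1), \(\mu_X = 1\) and \(\sigma_X^2 = 1\), so with \(n = 5\) we expect \(E[\bar{X}] = 1\) and \(\text{Var}(\bar{X}) = 1/5 = 0.2\).

```r
set.seed(7)
n <- 5
sample_means <- replicate(200000, mean(rexp(n, rate = 1)))
mean(sample_means)   # close to mu_X = 1
var(sample_means)    # close to sigma_X^2 / n = 0.2
```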

Standard Error of the Sample Mean

We call the standard deviation of the sample mean the standard error. It is:

\[\sigma_{\bar{X}} = \sqrt{\text{Var}(\bar{X})} = \frac{\sigma_X}{\sqrt{n}}\]

Understanding the Standard Error

Consider the variability of individual observations versus sample means:

  • Individual observations have standard deviation \(\sigma_X\)

  • Sample means have standard error \(\frac{\sigma_X}{\sqrt{n}}\)

For even modest sample sizes, sample means are much less variable than individual observations. With \(n = 25\), the sample mean has standard error \(\frac{\sigma_X}{5}\), making it five times more precise than a single observation.

This concentration effect explains why averaging is such a powerful statistical technique and why larger samples are better. By combining information from multiple observations, we create estimators that are much more reliable than any individual measurement.
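The \(\sqrt{n}\) shrinkage can be seen directly by estimating the standard error at several sample sizes. A sketch with assumed sizes 1, 25, and 100 (again drawing from the Exp(1) population, where \(\sigma_X = 1\)):

```r
set.seed(11)
# Estimate the standard error of the sample mean at each sample size
se_hat <- sapply(c(1, 25, 100), function(n) {
  sd(replicate(50000, mean(rexp(n, rate = 1))))
})
se_hat   # roughly 1, 0.2, 0.1 -- i.e., sigma_X / sqrt(n)
```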

Summary of Basic Distributional Properties of \(\bar{X}\)

Name           | Notation                                          | Formula
---------------|---------------------------------------------------|--------------------------
Expected Value | \(E[\bar{X}]\) or \(\mu_{\bar{X}}\)               | \(\mu_X\)
Variance       | \(\text{Var}(\bar{X})\) or \(\sigma_{\bar{X}}^2\) | \(\frac{\sigma_X^2}{n}\)
Standard Error | \(\sigma_{\bar{X}}\)                              | \(\frac{\sigma_X}{\sqrt{n}}\)

These results hold regardless of the shape of the population distribution, as long as \(\mu_X\) and \(\sigma_X\) are finite.

Example💡: Maze Navigation Times

Researchers study how long it takes rats of a certain subspecies to navigate through a standardized maze. Previous research suggests that navigation times have a mean \(\mu_X = 1.5\) minutes and a standard deviation \(\sigma_X=0.35\) minutes.

The researchers select five rats at random and want to understand the behavior of the average navigation time for their sample. What are the mean and the standard error of the sampling distribution for the sample mean?

Setting Up the Problem

We have:

  • \(X_i\) are iid with \(E[X_i]=1.5\) and \(\text{Var}(X_i)=0.35^2\) for each \(i \in \{1,2,3,4,5\}\)

  • \(n = 5\)

Mean of the Sample Mean

\[\mu_{\bar{X}} = \mu_X = 1.5 \text{ minutes}\]

Standard Error of the Sample Mean

\[\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}} = \frac{0.35}{\sqrt{5}} = \frac{0.35}{2.236} = 0.1565 \text{ minutes}\]
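The same arithmetic in R (a sketch; the numbers come from the example above):

```r
mu_X    <- 1.5               # population mean (minutes)
sigma_X <- 0.35              # population sd (minutes)
n       <- 5                 # number of rats sampled
se <- sigma_X / sqrt(n)      # standard error of the sample mean
round(se, 4)                 # 0.1565 minutes
```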

7.2.4. The Special Case: Normal Populations

While our mathematical results apply to any population with finite mean and variance, there’s one special case where we can say much more about the shape of the sampling distribution: when the population follows a normal distribution.

Linear Combinations of Normal Random Variables

A key property of normal distributions is that linear combinations of normal random variables are themselves normal. That is, if \(X\) and \(Y\) are independent normal random variables, then any linear combination of the form \(aX + bY + c\) is also normal.

The sample mean is exactly such a linear combination:

\[\bar{X} = \frac{1}{n}X_1 + \frac{1}{n}X_2 + \cdots + \frac{1}{n}X_n\]

The Exact Distribution of \(\bar{X}\) from Normal Population

If \(X_1, X_2, \cdots, X_n\) are iid from a normal distribution with mean \(\mu_X\) and standard deviation \(\sigma_X\), then:

\[\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)\]

In words, \(\bar{X}\) is exactly normal with mean \(\mu_X\) and standard error \(\sigma_X/\sqrt{n}\).

This result is remarkable because it tells us the exact sampling distribution, not just its mean and variance.
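A quick simulation illustrates the result (the seed, parameter values, and replication count are our choices): sample means of normal data should themselves be normal, with mean \(\mu_X\) and standard deviation \(\sigma_X/\sqrt{n}\).

```r
set.seed(3)
mu_X <- 1.5; sigma_X <- 0.35; n <- 5
sample_means <- replicate(100000, mean(rnorm(n, mean = mu_X, sd = sigma_X)))
mean(sample_means)   # close to mu_X = 1.5
sd(sample_means)     # close to sigma_X / sqrt(n), about 0.1565
```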

Example💡: Maze Navigation Times, Continued

Researchers study how long it takes rats of a certain subspecies to navigate through a standardized maze. In addition to the parameters \(\mu_X = 1.5\) minutes and \(\sigma_X = 0.35\) minutes, it is known that the population of navigation times follows a normal distribution.

Setting Up the Problem

From the previous example, we have

  • \(\mu_{\bar{X}} = 1.5\)

  • \(\sigma_{\bar{X}} = 0.1565\)

Since the population follows a normal distribution, the sampling distribution of the sample mean must also be normal. We have:

\[\bar{X} \sim N(1.5, 0.1565^2)\]

Computing Probabilities

What’s the probability that the average navigation time for five rats exceeds 1.75 minutes?

We need to find \(P(\bar{X} > 1.75)\). Since \(\bar{X} \sim N(1.5, 0.1565^2)\), we use the standardization technique and the Z-table (or a statistical software) to compute:

\[\begin{split}&P(\bar{X} > 1.75) = P\left(\frac{\bar{X} - 1.5}{0.1565} > \frac{1.75 - 1.5}{0.1565}\right)\\ &= P(Z > 1.60) = 1 - \Phi(1.60) = 1 - 0.9452 = 0.0548\end{split}\]

There is a probability of about 0.0548 that the average navigation time of five randomly selected rats will exceed 1.75 minutes.
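The same probability can be computed directly with `pnorm()`, with no Z-table needed. The unrounded answer differs slightly from 0.0548 because the table calculation rounds \(z\) to 1.60:

```r
se <- 0.35 / sqrt(5)                            # standard error, about 0.1565
p  <- pnorm(1.75, mean = 1.5, sd = se, lower.tail = FALSE)
p                                               # about 0.0551
```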

7.2.5. Bringing It All Together

Key Takeaways 📝

  1. The sample mean \(\bar{X}\) is a random variable with its own probability distribution, called the sampling distribution of the sample mean.

  2. Mathematical properties hold universally: \(E[\bar{X}] = \mu_X\) and \(\text{Var}(\bar{X}) = \sigma_X^2/n\) for any population with finite mean and variance.

  3. Normal populations produce exact results: If the population is \(X \sim N(\mu, \sigma^2)\), then \(\bar{X} \sim N(\mu, \sigma^2/n)\), enabling precise probability calculations.

  4. Sample means are more precise than individual observations by a factor of \(\sqrt{n}\), explaining why averaging is such a powerful statistical technique.

Exercises

  1. Basic Properties: A population has mean \(\mu = 80\) and standard deviation \(\sigma = 12\). For samples of size \(n = 36\),

    1. find the mean and standard error of the sampling distribution of \(\bar{X}\).

    2. if the population is normal, what is the complete sampling distribution of \(\bar{X}\)?

    3. find \(P(\bar{X} > 82)\) assuming the population is normal.

  2. Standard Error Relationships: A researcher wants to estimate a population mean with a standard error of at most \(2.5\). The population standard deviation is \(\sigma = 20\).

    1. What is the minimum sample size needed?

    2. If the sample size is doubled, what happens to the standard error?

  3. Maze Navigation Extended: Using the maze example (\(\mu = 1.5, \sigma = 0.35\)):

    1. Find the probability that the sample mean for \(n = 5\) rats falls between \(1.4\) and \(1.6\) minutes.

    2. How would this probability change if \(n = 20\) were used instead?

    3. Find the sample size needed so that \(P(|\bar{X} - 1.5| < 0.1) = 0.95\).