7.2. Sampling Distribution for the Sample Mean

Having established that statistics are random variables with their own distributions, we now focus on the most important statistic in all of statistical inference: the sample mean \(\bar{X}\).

Road Map 🧭

  • View the sample mean \(\bar{X}\) as a function of \(n\) independent and identically distributed random variables.

  • Establish \(E[\bar{X}]\) and \(\text{Var}(\bar{X})\) in relation to the distributional properties of these building blocks.

  • Define the standard deviation \(\sigma_{\bar{X}}\) of the sample mean as the standard error and understand how it is influenced by the population standard deviation and sample size.

7.2.1. A New Perspective on the Data-Generating Procedure

So far, we’ve pictured the sampling procedure as drawing individual datapoints \(n\) different times from a single random variable \(X\) (left of Fig. 7.2).

Fig. 7.2 A new perspective on how data points are sampled: the left panel shows how we previously pictured the sampling procedure, and the right panel shows the new perspective we adopt from here on

For the formal understanding of the sampling distribution of \(\bar{X}\), we need to begin with a new perspective. Imagine that there are \(n\) independent and identically distributed (iid) copies of the population, \(X_1, X_2, \cdots, X_n\), and a sample is constructed by taking one data point from each copy (right of Fig. 7.2).

Through this shift, we can now express the sample mean \(\bar{X}\) as a function of \(n\) random variables:

\[\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i\]

This allows us to break down the properties of the random variable \(\bar{X}\) in terms of its building blocks \(X_1, X_2, \cdots, X_n\), with which we are more familiar.

7.2.2. Visualizing Sampling Distributions

Let’s get a feel for how sampling distributions behave with a concrete visual example.

The Population: Exponential Distribution

Consider a population that follows an exponential distribution with parameter \(\lambda = 1\). Recall that this distribution is highly right-skewed, with most values bunched near 0 and a long tail extending to the right. The population mean is \(\mu = 1/\lambda = 1\), and the population standard deviation is \(\sigma = 1/\lambda = 1\).

Fig. 7.3 The pdf of the exponential population: highly right-skewed with mean \(\mu=1\)

When we conduct statistical inference in practice, we won’t know the population follows an exponential distribution or what its parameter value is. For now, we’ll assume this knowledge so we can compare our sample results to the known truth.

Sampling with n = 5

Let’s start by taking one sample of size \(n = 5\) from this population. The code below samples five numbers from the population and computes their average:

# Take one sample of size 5
sample1 <- rexp(5, rate = 1)
sample_mean1 <- mean(sample1)
# Example result: 0.39 (your value will differ from run to run)

We repeat this process many times (num_samples in the code below). Each repetition samples a different set of five numbers and thus produces a different sample mean.

# Simulate the sampling distribution
num_samples <- 1000000
n <- 5
sample_means <- replicate(num_samples, mean(rexp(n, rate = 1)))

When we plot the distribution of these million sample means, we see something remarkable. The distribution no longer looks like the original exponential distribution. It’s still somewhat right-skewed, but the degree of skewness has diminished. The sample means cluster more tightly around the true population mean \(\mu=1\).

Histogram of sample means when \(n=5\)
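As a rough sketch, a histogram like this one can be drawn from the simulated sample_means; the styling choices below (number of bins, the red reference line) are assumptions for illustration, not the code behind the actual figure.

# Plot the simulated sampling distribution of the sample mean (n = 5)
hist(sample_means, breaks = 100, freq = FALSE,
     main = "Sample means, n = 5", xlab = "Sample mean")
abline(v = 1, col = "red", lwd = 2)  # true population mean mu = 1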

The Effect of Increasing Sample Size

Now let’s see what happens when we increase the sample size to \(n = 25\):

Histogram of sample means when \(n=25\)

The transformation is dramatic. The sampling distribution is now roughly symmetric and centered around \(\mu = 1\). It bears little resemblance to the original exponential population. The sample means are much more concentrated around the true value—most fall between 0.5 and 1.5.

With \(n = 65\), the pattern becomes even more pronounced:

Histogram of sample means when \(n=65\)

Now the distribution is highly concentrated around \(\mu = 1\) and appears very symmetric. The sample means rarely stray far from the true population mean.
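These larger-sample simulations can be generated by reusing the earlier code with a different sample size; here is a minimal sketch, keeping the same (assumed) number of replications as before.

# Simulate sampling distributions for larger sample sizes
num_samples <- 1000000
means_n25 <- replicate(num_samples, mean(rexp(25, rate = 1)))
means_n65 <- replicate(num_samples, mean(rexp(65, rate = 1)))

# Larger n concentrates the sample means more tightly around mu = 1
sd(means_n25)  # roughly 0.2
sd(means_n65)  # roughly 0.12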

Key Insights

  1. The sample mean targets the population mean: All sampling distributions center around \(\mu = 1\), regardless of sample size.

  2. Larger samples produce more precise estimates: As \(n\) increases, the sampling distribution becomes more concentrated around \(\mu\).

  3. Shape changes with sample size: Even though the population is highly skewed, the sampling distribution becomes more symmetric as \(n\) increases.

  4. The magic of averaging: By averaging multiple observations, we reduce the impact of extreme values and create estimators that behave better than individual observations.

7.2.3. Deriving the Mathematical Properties

To deepen our understanding of the sample mean’s behavior, we derive its key distributional properties: the mean, variance, and standard deviation. For clarity, all population parameters are written with a subscript \(X\) and all sampling distribution parameters with a subscript \(\bar{X}\).

A. Expected Value of the Sample Mean

\[\mu_{\bar{X}} = E[\bar{X}] = E\left[\frac{1}{n}\sum_{i=1}^n X_i\right] = \frac{1}{n} E\left[\sum_{i=1}^n X_i\right] = \frac{1}{n} \sum_{i=1}^n E[X_i]\]

Since all \(X_i\)’s come from the same distribution with \(E[X_i] = \mu_X\),

\[E[\bar{X}] = \frac{1}{n} \sum_{i=1}^n \mu_X = \frac{1}{n} \cdot n\mu_X = \mu_X\]

Unbiasedness of Sample Mean

The expected value of the sample mean equals the population mean (\(\mu_{\bar{X}} = \mu_X\)). When an estimator equals its target on average, we call it an unbiased estimator. Individual sample means may be too high or too low, but they center around the correct target.
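We can check this unbiasedness numerically with the exponential simulation from earlier, where sample_means holds one million simulated sample means with \(n = 5\):

# Empirical check of unbiasedness: the average of many sample means
mean(sample_means)  # very close to the population mean mu = 1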

B. Variance and Standard Error of the Sample Mean

\[\sigma^2_{\bar{X}}=\text{Var}(\bar{X}) =\text{Var}\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \frac{1}{n^2} \text{Var}\left(\sum_{i=1}^n X_i\right)\]

Since the \(X_i\)’s are independent, the variance of the sum equals the sum of the variances. Also, all \(X_i\)’s have the same variance \(\sigma_X^2\):

\[\text{Var}(\bar{X}) = \frac{1}{n^2} \sum_{i=1}^n\text{Var}(X_i) = \frac{1}{n^2} \cdot n\sigma^2_X = \frac{\sigma^2_X}{n}\]

We call the standard deviation of the sample mean the standard error. It is the positive square root of the variance of \(\bar{X}\).

\[\sigma_{\bar{X}} = \sqrt{\text{Var}(\bar{X})} = \frac{\sigma_X}{\sqrt{n}}\]
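The same simulation gives a quick numerical check of these formulas; with \(\sigma_X = 1\) and \(n = 5\), the theoretical values are \(\sigma_X^2/n = 0.2\) and \(\sigma_X/\sqrt{n} \approx 0.447\):

# Empirical variance and standard error of the simulated sample means (n = 5)
var(sample_means)  # close to sigma^2 / n = 1/5 = 0.2
sd(sample_means)   # close to sigma / sqrt(n) = 1/sqrt(5), about 0.447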

Understanding the Standard Error

For even modest sample sizes, sample means are much less variable than individual observations. With \(n = 25\), for example, the sample mean has standard error \(\frac{\sigma_X}{\sqrt{25}} = \frac{\sigma_X}{5}\), making it five times more precise than any single observation, which has standard deviation \(\sigma_X\).

This concentration effect explains why averaging is such a powerful statistical technique and why larger samples are usually better. By combining information from multiple observations, we create estimators that are more reliable than an individual measurement.

C. Summary of Basic Distributional Properties of \(\bar{X}\)

Name | Notation | Formula
Expected Value | \(E[\bar{X}]\) or \(\mu_{\bar{X}}\) | \(\mu_X\)
Variance | \(\text{Var}(\bar{X})\) or \(\sigma_{\bar{X}}^2\) | \(\frac{\sigma_X^2}{n}\)
Standard Error | \(\sigma_{\bar{X}}\) | \(\frac{\sigma_X}{\sqrt{n}}\)

Example💡: Maze Navigation Times

Researchers study how long it takes rats of a certain subspecies to navigate through a standardized maze. Previous research suggests that navigation times have a mean \(\mu_X = 1.5\) minutes and a standard deviation \(\sigma_X=0.35\) minutes.

The researchers select five rats at random and want to understand the behavior of the average navigation time for their sample. What are the mean and the standard error of the sampling distribution for the sample mean?

Setting Up the Problem

We have:

  • \(X_i\) are iid with \(E[X_i]=1.5\) and \(\text{Var}(X_i)=0.35^2\) for each \(i \in \{1,2,3,4,5\}\)

  • \(n = 5\)

Mean of the Sample Mean

\[\mu_{\bar{X}} = \mu_X = 1.5 \text{ minutes}\]

Standard Error of the Sample Mean

\[\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}} = \frac{0.35}{\sqrt{5}} = \frac{0.35}{2.236} = 0.1565 \text{ minutes}\]
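As a quick sanity check, the same arithmetic can be done in R, using the values given in the example:

# Standard error of the mean navigation time for n = 5 rats
0.35 / sqrt(5)  # approximately 0.1565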

7.2.4. The Special Case: Normal Populations

While our mathematical results apply to any population with finite mean and variance, there’s one special case where we can say much more about the shape of the sampling distribution: when the population follows a normal distribution.

Linear Combinations of Normal Random Variables

A key property of normal distributions is that linear combinations of normal random variables are themselves normal. That is, if \(X\) and \(Y\) are normal random variables, then any linear combination of the form \(aX + bY + c\) is also normal.

The sample mean is exactly such a linear combination:

\[\bar{X} = \frac{1}{n}X_1 + \frac{1}{n}X_2 + \cdots + \frac{1}{n}X_n\]

The Exact Distribution of \(\bar{X}\) from Normal Population

If \(X_1, X_2, \cdots, X_n\) are iid from a normal distribution with mean \(\mu_X\) and standard deviation \(\sigma_X\), then:

\[\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right) \quad \text{ or equivalently,} \quad \bar{X} \sim N\left(\mu_X, \left(\frac{\sigma_X}{\sqrt{n}}\right)^2\right)\]

This result is remarkable because it tells us the exact sampling distribution, not just its mean and variance.
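A small simulation illustrates this exactness. The parameter values below are arbitrary choices for illustration, not part of any example in this section:

# Sample means from a normal population are themselves exactly normal
mu <- 10; sigma <- 2; n <- 4
sim_means <- replicate(100000, mean(rnorm(n, mean = mu, sd = sigma)))
mean(sim_means)  # close to mu = 10
sd(sim_means)    # close to sigma / sqrt(n) = 1
qqnorm(sim_means); qqline(sim_means)  # points fall close to a straight line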

Example💡: Maze Navigation Times, Continued

Researchers study how long it takes rats of a certain subspecies to navigate through a standardized maze. In addition to the parameters \(\mu = 1.5\) minutes and \(\sigma = 0.35\) minutes, it is now known that the population of navigation times follows a normal distribution.

Setting Up the Problem

From the previous example, we have

  • \(\mu_{\bar{X}} = 1.5\)

  • \(\sigma_{\bar{X}} = 0.1565\)

Since the population follows a normal distribution, the sampling distribution of the sample mean must also be normal. We have:

\[\bar{X} \sim N(1.5, 0.1565^2)\]

Computing Probabilities

What’s the probability that the average navigation time for five rats exceeds 1.75 minutes?

We need to find \(P(\bar{X} > 1.75)\). Since \(\bar{X} \sim N(1.5, 0.1565^2)\), we use the standardization technique and the Z-table (or a statistical software) to compute:

\[\begin{split}&P(\bar{X} > 1.75) = P\left(\frac{\bar{X} - 1.5}{0.1565} > \frac{1.75 - 1.5}{0.1565}\right)\\ &= P(Z > 1.60) = 1 - \Phi(1.60) = 1 - 0.9452 = 0.0548\end{split}\]

There’s about 0.0548 probability that the average navigation time for five randomly selected rats will exceed 1.75 minutes.
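The same probability can be computed directly in R without a Z-table; the small difference from 0.0548 comes from rounding \(z\) to 1.60 above:

# P(X-bar > 1.75) using the exact sampling distribution
se <- 0.35 / sqrt(5)
pnorm(1.75, mean = 1.5, sd = se, lower.tail = FALSE)  # about 0.055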

7.2.5. Additional Example: Quality Control in Manufacturing

Let us conclude this section by applying the sampling distribution of \(\bar{X}\) to a decision-making problem in quality control.

Example 💡: Quality Control in Manufacturing

The Bulls Eye Production company manufactures a number of high-precision tools. Under the usual production process, one of these tools has a mean diameter of 5 mm. The measurement varies normally around this mean, with a standard deviation of 0.5 mm.

However, the machine must be recalibrated frequently due to the strenuous operating conditions. Recalibration is required whenever the difference between the observed sample mean diameter and the ideal diameter is too large. “Too large” is judged probabilistically: if a deviation at least as extreme as the observed one has probability less than 0.05, the difference is considered too large.

A random sample of size 64 is taken to assess the need for recalibration. It is found that the average diameter of the sample is 4.85 mm. Is recalibration necessary?

Setting Up the Problem

It is given that

  • \(n=64\)

  • \(\mu_X = 5\) and \(\sigma_X = 0.5\)

  • The population is normally distributed.

  • The sample is randomly collected from the same population, which allows us to assume the iid condition.

  • A single realization from \(\bar{X}\) has value \(\bar{x} = 4.85\).

Solving the Problem

We must compute a probability representing how rare the current difference \(|\bar{x}-\mu_X|\) is when compared with the general behavior of \(|\bar{X}-\mu_X|\).

\[\begin{split}&P(|\bar{X}-\mu_X| > |\bar{x}-\mu_X|)\\ &= P(|\bar{X}-\mu_X| > |4.85-5|)\\ &=P(\Bigg|\frac{\bar{X}-\mu_X}{\sigma_X/\sqrt{n}}\Bigg| > \frac{|4.85-5|}{0.5/\sqrt{64}})\\ &=P(|Z| > 2.4) = P(Z > 2.4) + P(Z < -2.4)\\ &=\underbrace{2P(Z < -2.4)}_{\text{by symmetry around } 0} = 0.0164\end{split}\]

The probability of seeing an even larger difference than the current observation is only 0.0164, which is smaller than 0.05. Therefore, the machine must be recalibrated.
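A sketch of the same calculation in R:

# Probability of a deviation at least as large as the one observed
se <- 0.5 / sqrt(64)       # standard error, 0.0625
z  <- abs(4.85 - 5) / se   # standardized deviation, 2.4
2 * pnorm(-z)              # about 0.0164, below the 0.05 threshold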

7.2.6. Bringing It All Together

Key Takeaways 📝

  1. The sample mean \(\bar{X}\) is a random variable. Its probability distribution is called the sampling distribution of the sample mean.

  2. If the population has mean \(\mu_X\) and variance \(\sigma_X^2\), then \(\mu_{\bar{X}} = E[\bar{X}] = \mu_X\) and \(\sigma^2_{\bar{X}} = \text{Var}(\bar{X}) = \sigma_X^2/n\).

  3. If the population has a distribution \(N(\mu_X, \sigma_X^2)\), then the sampling distribution of \(\bar{X}\) is completely known: \(\bar{X} \sim N(\mu_X, \sigma_X^2/n)\).

Exercises

  1. Basic Properties: A population has mean \(\mu = 80\) and standard deviation \(\sigma = 12\). For samples of size \(n = 36\),

    1. find the mean and standard error of the sampling distribution of \(\bar{X}\).

    2. if the population is normal, what is the complete sampling distribution of \(\bar{X}\)?

    3. find \(P(\bar{X} > 82)\) assuming the population is normal.

  2. Standard Error Relationships: A researcher wants to estimate a population mean with a standard error of at most \(2.5\). The population standard deviation is \(\sigma = 20\).

    1. What is the minimum sample size needed?

    2. If the sample size is doubled, what happens to the standard error?

  3. Maze Navigation Extended: Using the maze example (\(\mu = 1.5, \sigma = 0.35\)):

    1. Find the probability that the sample mean for \(n = 5\) rats falls between \(1.4\) and \(1.6\) minutes.

    2. How would this probability change if \(n = 20\) were used instead?

    3. Find the sample size needed so that \(P(|\bar{X} - 1.5| < 0.1) = 0.95\).