.. _7-2-sampling-distribution-for-the-sample-mean:
.. raw:: html
Sampling Distribution for the Sample Mean
========================================================
Having established that statistics are random variables with their own distributions, we now focus on the most
important statistic in all of statistical inference: the sample mean :math:`\bar{X}`.
.. admonition:: Road Map 🧭
:class: important
* View the **sample mean** :math:`\bar{X}` **as a random variable** that varies from sample to sample.
* Establish the **distributional properties of** :math:`\bar{X}`, such as :math:`E[\bar{X}]`
and :math:`\text{Var}(\bar{X})`, in relation to the population parameters :math:`\mu` and
:math:`\sigma`.
* Define the standard deviation of the sample mean as the **standard error** and understand
how it is influenced by the population standard deviation and sample size.
* Visually observe how the population distribution and the sample size **affects the overall
shape** of the sampling distribution for the sample mean.
A New Perspective on the Data-Generating Procedure
-----------------------------------------------------------
So far, we've pictured the sampling procedure from a population as drawing
individual datapoints :math:`n` different times from a single :math:`X` (left of
:numref:`new-sampling-framework`).
.. _new-sampling-framework:
.. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter7/new-sampling-framework.png
:alt: A new perspective of understanding how data points are sampled
:figwidth: 70%
:align: center
Left represents how we used to think of the sampling procedure;
we now think in the perspective on the right
For the formal understanding of the sampling distribution of :math:`\bar{X}`, we need to
begin with a new perspective. Imagine that there are :math:`n` **independent and identically distributed (iid)
copies of the population**, :math:`X_1, X_2, \cdots, X_n`, and a sample
is constructed by taking one data point from each copy (right of :numref:`new-sampling-framework`).
Through this shift, we can now express the sample mean :math:`\bar{X}` as a function of :math:`n` random variables:
.. math::
\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i
Why This Perspective Matters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The shift from viewing :math:`\bar{x}` as merely "the mean of my data" to recognizing
it as a single realization from a random variable :math:`\bar{X}` is crucial for statistical inference.
When we eventually compute a confidence interval or perform a hypothesis test, we'll be asking questions like:
- "If the population mean really is :math:`\mu_0`, how likely is it that we'd observe a sample mean as
extreme as the one we got?"
- "What range of population means would be consistent with the sample mean we observed?"
These questions only make sense if we understand how :math:`\bar{X}` behaves probabilistically.
Visualizing Sampling Distributions
------------------------------------------------------------
Let's explore sampling distributions through a concrete example that reveals how sample size affects
the behavior of our estimators.
The Population: Exponential Distribution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider a population that follows an exponential distribution with parameter :math:`\lambda = 1`.
Recall that this distribution is highly right-skewed, with most values bunched near 0 and a
long tail extending to the right. The population mean is :math:`\mu = 1/\lambda = 1`, and the population
standard deviation is :math:`\sigma = 1/\lambda = 1`.
.. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter7/exponential-pdf.png
:alt: The pdf of exponential distribution
:width: 70%
:align: center
The exponential population: highly right-skewed with mean :math:`\mu=1`
When we conduct statistical inference in practice, we won't know the population follows an
exponential distribution or what its parameter value is. But for learning about sampling
distributions, we'll assume this knowledge so we can compare our sample results to the known truth.
Sampling with n = 5
~~~~~~~~~~~~~~~~~~~~~~~
Let's start by taking samples of size :math:`n = 5` from this exponential
population. Using R, we can simulate this process:
.. code-block:: r
# Take one sample of size 5
sample1 <- rexp(5, rate = 1)
sample_mean1 <- mean(sample1)
# Result: 0.39
Each sample gives us a different sample mean. To understand the sampling distribution,
we repeat this process many times—say, one million times—and examine how these sample means are distributed.
.. code-block:: r
# Simulate the sampling distribution
num_samples <- 1000000
n <- 5
sample_means <- replicate(num_samples, mean(rexp(n, rate = 1)))
When we plot the distribution of these million sample means, we see something remarkable.
The distribution no longer looks like the original exponential distribution. It's still somewhat right-skewed,
but the degree of skewness has diminished. The sample means cluster more tightly around the true
population mean :math:`\mu=1`.
.. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter7/sampling-dist-n5.png
:alt: Histogram of sample means when :math:`n=5`
:width: 70%
:align: center
The Effect of Increasing Sample Size
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. admonition:: Example 💡:
:class: note
.. admonition:: Example 💡:
:class: note
.. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter7/.png:width: 70%
:align: center:alt:
Now let's see what happens when we increase the sample size to :math:`n = 25`:
.. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter7/sampling-dist-n25.png
:alt: Histogram of sample means when :math:`n=25`
:width: 70%
:align: center
The transformation is dramatic. The sampling distribution is now roughly symmetric and centered around :math:`\mu = 1`.
It bears little resemblance to the original exponential population. The sample means are much more concentrated
around the true value—most fall between 0.5 and 1.5.
With :math:`n = 65`, the pattern becomes even more pronounced:
.. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter7/sampling-dist-n65.png
:alt: Histogram of sample means when :math:`n=65`
:width: 70%
:align: center
Now the distribution is highly concentrated around :math:`\mu = 1` and appears very symmetric.
The sample means rarely stray far from the true population mean.
Key Insights
~~~~~~~~~~~~~~~
Several crucial patterns emerge:
1. **The sample mean targets the population mean**: All sampling distributions center around :math:`\mu = 1`,
regardless of sample size.
2. **Larger samples produce more precise estimates**: As :math:`n` increases, the sampling distribution becomes
more concentrated around :math:`\mu`.
3. **Shape changes with sample size**: Even though the population is highly skewed, the sampling distribution
becomes more symmetric as :math:`n` increases.
4. **The magic of averaging**: By averaging multiple observations, we reduce the impact of extreme values and
create estimators that behave much better than individual observations.
Deriving the Mathematical Properties
---------------------------------------
To deepen our understanding of the sample mean's behavior, we derive its key distributional properties:
the mean, variance, and standard deviation. For clarity,
all parameter populations are written with a subscript :math:`X` and all
sampling distribution parameters with a subscript :math:`\bar{X}`.
Expected Value of the Sample Mean
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. math::
\mu_{\bar{X}} = E[\bar{X}] = E\left[\frac{1}{n}\sum_{i=1}^n X_i\right]
= \frac{1}{n} E\left[\sum_{i=1}^n X_i\right] = \frac{1}{n} \sum_{i=1}^n E[X_i]
Since all :math:`X_i`'s come from the same distribution with :math:`E[X_i] = \mu_X`,
.. math::
E[\bar{X}] = \frac{1}{n} \sum_{i=1}^n \mu = \frac{1}{n} \cdot n\mu_X = \mu_X
Unbiasedness of Sample Mean
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The **expected value of the sample mean equals the population mean** (:math:`\mu_{\bar{X}} = \mu_X`).
When an estimator equals its target on average, we call it an **unbiased** estimator.
Individual sample means may be too high or too low, but they center around the correct target.
Variance of the Sample Mean
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. math::
&\sigma^2_{\bar{X}}=\text{Var}(\bar{X})\\
&=\text{Var}\left(\frac{1}{n}\sum_{i=1}^n X_i\right)
= \frac{1}{n^2} \text{Var}\left(\sum_{i=1}^n X_i\right)
Since the :math:`X_i`'s are independent, the variance of the sum equals the sum of the variances.
Also, all :math:`X_i`'s have the same variance :math:`\sigma_X^2`:
.. math::
\text{Var}(\bar{X}) = \frac{1}{n^2} \sum_{i=1}^n\text{Var}(X_i)
= \frac{1}{n^2} \cdot n\sigma^2_X = \frac{\sigma^2_X}{n}
Standard Error of the Sample Mean
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We call the standard deviation of the sample mean the **standard error**. It is:
.. math::
\sigma_{\bar{X}} = \sqrt{\text{Var}(\bar{X})} = \frac{\sigma_X}{\sqrt{n}}
Understanding the Standard Error
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Consider the variability of individual observations versus sample means:
- **Individual observations** have standard deviation :math:`\sigma_X`
- **Sample means** have standard error :math:`\frac{\sigma_X}{\sqrt{n}}`
For even modest sample sizes, **sample means are much less variable** than individual observations.
With :math:`n = 25`, the sample mean has standard error :math:`\frac{\sigma}{5}`, making it five times
more precise than any single observation.
This concentration effect explains why averaging is such a powerful statistical technique
and why larger samples are better.
By combining information from multiple observations, we create estimators that are much more
reliable than any individual measurement.
Summary of Basic Distributional Properties of :math:`\bar{X}`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. flat-table::
:header-rows: 1
:align: center
:width: 90%
* - Name
- Notation
- Formula
* - **Expected Value**
- :math:`E[\bar{X}]` or :math:`\mu_{\bar{X}}`
- :math:`\mu_X`
* - **Variance**
- :math:`\text{Var}(\bar{X})` or :math:`\sigma_{\bar{X}}^2`
- :math:`\frac{\sigma_X^2}{n}`
* - **Standard Error**
- :math:`\sigma_{\bar{X}}`
- :math:`\frac{\sigma_X}{\sqrt{n}}`
These results hold regardless of the shape of the population distribution, as long as :math:`μ` and :math:`σ` are finite.
.. admonition:: Example💡: Maze Navigation Times
:class: note
Researchers study how long it takes rats of a certain subspecies to navigate through a standardized maze.
Previous research suggests that navigation times have a mean :math:`\mu_X = 1.5` minutes and a
standard deviation :math:`\sigma_X=0.35` minutes.
The researchers select five rats at random and want to understand the **behavior of the average navigation
time for their sample**. What are the mean and the standard error of the sampling distribution for the
sample mean?
**Setting Up the Problem**
We have:
- :math:`X_i` are *iid* with :math:`E[X_i]=1.5` and :math:`\text{Var}(X_i)=0.35^2`
for each :math:`i \in \{1,2,3,4,5\}`
- :math:`n = 5`
**Mean of the Sample Mean**
.. math::
\mu_{\bar{X}} = \mu_X = 1.5 \text{ minutes}
**Standard Error of the Sample Mean**
.. math::
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{0.35}{\sqrt{5}} = \frac{0.35}{2.236} = 0.1565 \text{ minutes}
The Special Case: Normal Populations
---------------------------------------
While our mathematical results **apply to any population with finite mean and variance**,
there's **one special case where we can say much more** about the shape of the sampling
distribution: when the population follows a normal distribution.
Linear Combinations of Normal Random Variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A key property of normal distributions is that **linear combinations of normal random
variables are themselves normal**. That is, if :math:`X` and :math:`Y` are independent
normal random variables, then any linear combination of the form :math:`aX + bY + c` is also normal.
The sample mean is exactly such a linear combination:
.. math::
\bar{X} = \frac{1}{n}X_1 + \frac{1}{n}X_2 + \cdots + \frac{1}{n}X_n
The Exact Distribution of :math:`\bar{X}` from Normal Population
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If :math:`X_1, X_2, \cdots, X_n` are iid from a normal distribution with mean
:math:`\mu_X` and standard deviation :math:`\sigma_X`, then:
.. math::
\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right) \quad \text{ or equivalently,} \quad
\bar{X} \sim N\left(\mu_X, \frac{\sigma_X}{\sqrt{n}}\right)
This result is remarkable because it tells us the **exact** sampling distribution,
not just its mean and variance.
.. admonition:: Example💡: Maze Navigation Times, Continued
:class: note
Researchers study how long it takes rats of a certain subspecies to navigate through a standardized maze.
In addition to the parameters :math:`\mu = 1.5` minutes and :math:`\sigma = 0.35` minutes,
it is known that **the population of navigation times follow normal distribution**.
**Setting Up the Problem**
From the previous example, we have
- :math:`\mu_{\bar{X}} = 1.5`
- :math:`\sigma_{\bar{X}} = 0.1565`
Since the population follows normal distribution, the sampling distribution for
the sample mean must also be normal. We have:
.. math:: \bar{X} \sim N(1.5, 0.1565^2)
**Computing Probabilities**
What's the probability that the average navigation time for five rats exceeds 1.75 minutes?
We need to find :math:`P(\bar{X} > 1.75)`. Since :math:`\bar{X} \sim N(1.5, 0.1565^2)`, we use
the standardization technique and the Z-table (or a statistical software) to compute:
.. math::
&P(\bar{X} > 1.75) = P\left(\frac{\bar{X} - 1.5}{0.1565} > \frac{1.75 - 1.5}{0.1565}\right)\\
&= P(Z > 1.60) = 1 - \Phi(1.60) = 1 - 0.9452 = 0.0548
There's about 0.0548 probability that the average navigation time for five randomly selected
rats will exceed 1.75 minutes.
Bringing It All Together
-----------------------------
.. admonition:: Key Takeaways 📝
:class: important
1. **The sample mean** :math:`\bar{X}` **is a random variable** with its own probability distribution,
called the sampling distribution of the sample mean.
2. **Mathematical properties hold universally**: :math:`E[\bar{X}] = \mu` and :math:`Var(X̄) = σ²/n`
for any population with finite mean and variance.
3. **Normal populations produce exact results**: If the population is :math:`X \sim N(\mu, \sigma^2)`,
then :math:`\bar{X} \sim N(\mu, \sigma^2/n)`, enabling precise probability calculations.
4. **Sample means are more precise than individual observations** by a factor of :math:`\sqrt{n}`, explaining why
averaging is such a powerful statistical technique.
Exercises
~~~~~~~~~~~~~~~~~
1. **Basic Properties**: A population has mean :math:`μ = 80` and standard deviation :math:`σ = 12`.
For samples of size :math:`n = 36`,
a) find the mean and standard error of the sampling distribution of :math:`\bar{X}`.
b) if the population is normal, what is the complete sampling distribution of :math:`\bar{X}`?
c) find :math:`P(\bar{X} > 82)` assuming the population is normal.
2. **Standard Error Relationships**: A researcher wants to estimate a population mean with
a standard error of at most :math:`2.5`. The population standard deviation is :math:`\sigma = 20`.
a) What is the minimum sample size is needed?
b) If the sample size is doubled, what happens to the standard error?
3. **Maze Navigation Extended**: Using the maze example (:math:`\mu = 1.5, \sigma = 0.35`):
a) Find the probability that the sample mean for :math:`n = 5` rats falls between :math:`1.4`
and :math:`1.6` minutes.
b) How would this probability change if :math:`n = 20` were used instead?
c) Find the sample size needed so that :math:`P(|\bar{X} - 1.5| < 0.1) = 0.95`.