.. _7-1-statistics-and-sampling-distributions:
Statistics and Sampling Distributions
====================================================
We've spent considerable time building probability models that describe populations. Now we reach a pivotal
moment in our statistical journey: the bridge from probability theory to statistical inference.
Instead of starting from a known population distribution, we'll work backward from sample data to draw conclusions
about an unknown population. This transition requires understanding **how sample statistics themselves behave
as random variables**.
.. admonition:: Road Map đź§
:class: important
* Understand how **sample statistics can be viewed as random variables**.
* Master the **theoretical properties of sample statistics** in relation to the population distribution.
Parameters, Statistics, and the Bridge to Inference
----------------------------------------------------
Before diving into sampling distributions, let us review the fundamental vocabulary.
Population Parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~
A **parameter** is a number that describes some attribute of the population.
For example, the population mean :math:`\mu` is a parameter that tells us the average value across *all units in
the population*. These parameters are often unknown in practice, but they always exist as **fixed, non-random values**.
When we study probability distributions, we often assume we know these parameters to
understand theoretical behavior. But in statistical inference, the tables turn—we
observe sample data and try to learn about these unknown population characteristics.
Sample Statistics: Our Window Into the Population
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A **statistic** :math:`T` is a function that maps each sample to a numerical summary :math:`t`:
.. math:: T(x_1, x_2, \cdots, x_n) = t
One example is the sample mean :math:`\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i`, which summarizes the
center of an observed sample. Unlike parameters, we can calculate statistics directly
from the data.
The crucial insight is that **statistics will change from sample to sample**. If we collect a
new sample of the same size from the same population, we'll almost certainly get **different values for**
:math:`\bar{x}`.
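To see this sample-to-sample variability concretely, here is a minimal simulation sketch. It assumes NumPy is
available, and the population (heights following a normal distribution with mean 67 and standard deviation 3
inches) is an illustrative choice, not something specified in the text.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(seed=350)  # fixed seed for reproducibility

   # Hypothetical population: heights ~ Normal(mu = 67, sigma = 3) inches
   mu, sigma, n = 67.0, 3.0, 25

   sample1 = rng.normal(mu, sigma, size=n)
   sample2 = rng.normal(mu, sigma, size=n)

   # The same statistic T (the sample mean) applied to two samples
   # from the same population yields two different realized values t.
   print(f"x-bar from sample 1: {sample1.mean():.2f}")
   print(f"x-bar from sample 2: {sample2.mean():.2f}")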
Estimators: Statistics with a Purpose
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
An **estimator** is a statistic that **targets a specific
population parameter**. We use the sample mean :math:`\bar{x}` as an estimator of the population mean :math:`\mu`.
We use the sample standard deviation :math:`s` as an estimator of the population standard deviation :math:`\sigma`.
The sample mean :math:`\bar{x}` is both a statistic (it summarizes our sample)
and an estimator (it targets the population mean :math:`\mu`). This dual role reflects the bridge between
describing what we observe and inferring what we cannot directly measure.
.. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter7/pop-stat-est-inf.png
:figwidth: 80%
:align: center
*Parameters describe the population; statistics summarize the sample; estimators target the unknown parameters*
The Sampling Distribution
--------------------------------------------------------------------
To understand **how estimators behave across many
possible samples**, we must establish the concept of a sampling distribution.
The Thought Experiment
~~~~~~~~~~~~~~~~~~~~~~~
Imagine we could repeatedly sample from the same population, computing a statistic each
time. Suppose we're studying the heights of college students and we take samples of size
:math:`n = 25`. We collect our first sample, measure all :math:`25` students, and compute :math:`\bar{x}_1 = 67.2` inches.
We collect a second sample of :math:`25` different students and get :math:`\bar{x}_2 = 68.8` inches. A third sample
gives :math:`\bar{x}_3 = 66.9` inches.
If we repeated this process thousands of times, we'd have thousands of different sample means:
:math:`\bar{x}_1, \bar{x}_2, \cdots, \bar{x}_m`. These sample means would **vary**,
and that variation would follow a pattern.
The distribution of these sample means is called the **sampling distribution of the sample mean**.
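The thought experiment can be approximated by simulation. The sketch below repeats the sample-and-average
process :math:`m = 10{,}000` times, again assuming NumPy and the same illustrative Normal(67, 3) height
population used above.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(seed=350)
   mu, sigma, n, m = 67.0, 3.0, 25, 10_000  # m = number of repeated samples

   # Draw m samples of size n; each row is one sample.
   samples = rng.normal(mu, sigma, size=(m, n))
   xbars = samples.mean(axis=1)  # one sample mean per row

   # The empirical distribution of these x-bars approximates the
   # sampling distribution of the sample mean for n = 25.
   print(f"average of the x-bars: {xbars.mean():.3f}  (population mean is {mu})")
   print(f"spread of the x-bars:  {xbars.std(ddof=1):.3f}  (much smaller than sigma = {sigma})")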
Formal Definition
~~~~~~~~~~~~~~~~~~~~
The **sampling distribution** of a statistic is the probability distribution of that statistic
across all possible samples of the same size from the same population. It tells us **how the statistic
behaves as a random variable**.
This concept applies to any statistic—sample means, sample standard deviations, sample medians,
sample correlations. Each has its own sampling distribution that describes how that particular
statistic varies from sample to sample.
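Continuing the same illustrative simulation, the sketch below treats three different statistics the same way;
each one produces its own empirical sampling distribution.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(seed=350)
   mu, sigma, n, m = 67.0, 3.0, 25, 10_000

   samples = rng.normal(mu, sigma, size=(m, n))

   # Each statistic, computed sample by sample, has its own sampling distribution.
   means   = samples.mean(axis=1)
   medians = np.median(samples, axis=1)
   sds     = samples.std(axis=1, ddof=1)  # sample standard deviation (divisor n - 1)

   for name, stat in [("mean", means), ("median", medians), ("std dev", sds)]:
       print(f"sample {name:8s} center = {stat.mean():.3f}, spread = {stat.std(ddof=1):.3f}")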
Why This Matters for Inference
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Understanding sampling distributions allows us to quantify the uncertainty in our estimates.
If we know how :math:`\bar{X}` behaves across many samples, we can assess how reliable any single observed
:math:`\bar{x}` might be as an estimate of :math:`\mu`. This knowledge forms the foundation for
confidence intervals, hypothesis tests, and all other inferential procedures.
.. admonition:: The Capital :math:`\bar{X}`
:class: important
We will now study the behavior of sample statistics as random variables.
To distinguish the random variable from its realized values, we use
capital letters: :math:`\bar{X}` denotes the random variable whose
observed values are the :math:`\bar{x}`'s.
Similar distinctions apply to :math:`S` vs. :math:`s` and :math:`S^2` vs. :math:`s^2`.
Factors Affecting Sampling Distributions
-------------------------------------------
Several factors determine the behavior of a sampling distribution:

1. **The population distribution**
2. **Sample size**, :math:`n`
3. **The statistic itself**
For example, :math:`S` can only take non-negative values, while :math:`\bar{X}` has no such
restriction. This difference comes from how the two statistics are computed.
4. **The sampling technique**
The sampling technique affects how well a sample represents the population and whether
key properties like independence are satisfied.
We will examine how these factors influence inference in greater depth in the upcoming chapters.
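As a rough preview of the first two factors, the sketch below (same assumptions as before: NumPy and
illustrative populations) compares the spread of the simulated sampling distribution of :math:`\bar{X}`
across sample sizes and across a symmetric versus a right-skewed population.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(seed=350)
   m = 10_000  # repeated samples per setting

   populations = {
       "normal(67, 3)":        lambda size: rng.normal(67.0, 3.0, size=size),
       "exponential(scale=3)": lambda size: rng.exponential(3.0, size=size),  # right-skewed
   }

   for label, draw in populations.items():
       for n in (5, 25, 100):
           xbars = draw((m, n)).mean(axis=1)
           print(f"{label:22s} n = {n:3d}: spread of x-bar = {xbars.std(ddof=1):.3f}")

   # Larger n shrinks the spread of x-bar; the shape of the population
   # mainly affects the shape of the sampling distribution for small n.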
Bringing It All Together
---------------------------
.. admonition:: Key Takeaways 📝
:class: important
1. **Statistics are random variables** that vary from sample to sample, creating sampling distributions
that describe this variability. Understanding how **statistics behave as random variables** lets us quantify
uncertainty and draw conclusions about unknown populations.
2. **Multiple factors affect sampling distributions**: the population distribution, the sample size,
the choice of statistic, and the sampling technique (which governs whether independence holds) all play important roles.
Exercises
~~~~~~~~~~~~~
1. **Conceptual Understanding**: Explain the difference between a parameter, a statistic, and an estimator.
Give an example of each in the context of measuring the average commute time for workers in your city.