.. _1-2-probability-inference:


.. raw:: html

   <div class="video-placeholder">
     <iframe
       src="https://www.youtube.com/embed/PE0EBtI4ffk?list=PLHKEwTHXfbagA3ybKLAcEJriGT-6k89c6"
       allowfullscreen>
     </iframe>
   </div>


Probability & Statistical Inference: How Are They Associated?
===================================================================================

This course can be split into two main parts. The first half considers probability,
and the second half focuses on statistical inference. In this section, we discuss
how these two broad concepts are **different** yet **closely associated**. We achieve
this goal by expressing each with a one-sided arrow between **population** and **sample**.
We begin by defining what **population** and **sample** are.

.. admonition:: Road Map 🧭
  :class: important
   
  * Define **population** and **sample**, and use their concepts to distinguish between
    **probability** and **statistical inference**.

Population and Sample
------------------------------------------------------------------


**Population** represents the entire collection of individuals or objects considered in our study. 
We condense information about the population into a probabilistic model that describes the behavior 
of attributes with some kind of smooth functional relationship.

**Sample** is just a subset of the entire population—a small selection of individuals or objects 
taken from the complete collection.

The "Two Arrows" Diagram
---------------------------------------------

.. _two-arrows-diagram:
.. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter1/population-vs-sample.png
  :alt: Population to Sample (probability) and Sample to Population (inference)
  :figwidth: 70%
  :align: center

  The "two arrows" diagram

We express *probability* and *statistical inference* using
one-sided arrows between *sample* and *population*. In each case, we begin with
the complete knowledge of the side from which the arrow originates. Using this knowledge,
we try to make an informed guess about the side the arrow points to.

Probability: Population → Sample
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

What does it mean to have **complete knowledge** of the population? It means that we are
aware of the *rule* by which the population generates random samples. 
We call this rule the **probability distribution**. 

With respect to :numref:`two-arrows-diagram`, for example, suppose we know

- the set of possible outcomes (cat, dog, and ghost),
- the count of each category in the population, and
- that each individual has an equal chance of being selected into a sample and 
  are independent of the anothers.

Then we can calculate the probability of observing any specific outcome in a sample
from this population. Examples of questions we can answer include:

1. Suppose a single individual is randomly sampled. What is the probability that 
   this is a cat?
2. Suppose a random sample of size 10 is taken. What is the proability that this sample
   consists of nine dogs and a cat?
3. ...

In the case of :numref:`two-arrows-diagram`, the probability distribution is described through
a combination of graphics and words. In other cases, a probability distribution can be expressed 
as a table of possible values with their associated probabilities or, in more complex scenarios,
as a function. 

By the end of Chapter 6, we will have built enough probabilistic language to address these different types.

.. admonition:: Example 💡: Population → Sample
  :class: note

  .. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter1/passport.jpg
    :align: right
    :alt: passport
    :figwidth: 20%

  Suppose we know that 36% of all Americans have a passport. 
  
  **Q1**. What is the probability that a randomly sampled American does not hold a passport?
          
          Probabilities are always expressed in a 0-1 scale. The probability would be 
          :math:`1-0.36 = 0.64`.

  **Q2**. If we take a random sample of 20 Americans, what's the probability that exactly 10 
  of them will have a passport?

          Under the assumption that all Americans have an equal chance of being sampled and that
          their inclusion in the sample does not impact the status of others, the probability is

            :math:`{{20}\choose{10}} (0.36)^{10}(0.64)^{10} = 0.0779`

  ‼️ At this stage, you do not need to fully follow how these numbers are computed. The takeaway from
  this example should be **the understanding that knowing the probability distribution allows us 
  to compute the likelihood of various events** that can happen in a sample. If you want to understand 
  the computational steps, revisit after Chapter 5.

Statistical Inference: Population ← Sample
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Statistical inference begins with full access to a dataset -- a sinlge random sample from the population -- 
but limited knowledge of the population's probability distribution. It aims to
understand the population through the sample while accounting for its limitations through quantifying
the uncertainty.

In :numref:`two-arrows-diagram`, suppose we can only see the sample. From this, we would `guess`

- that the population contains cats, dogs, and ghosts,
- that there are fewer ghosts than cats or dogs, and
- that cats and dogs take up similar proportions

However, we would not be able to rule out the possibility that the population contains a wider range of categories,
and we would not be strongly convinced about the true proportions of the observed categories considering how small
the sample is.

.. admonition:: Example 💡: Population ← Sample
  :class: note
  
  .. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter1/passport.jpg
    :align: right
    :alt: passport
    :figwidth: 20%

  Suppose we interview 20 Americans at random and find that 8 of them (40%) hold a passport. 
  
  **Q1**. What can we conclude about the percentage of all Americans who hold a passport?

    If we were to make a **point estimation**, the best we could say is that **around 40% of all Americans
    are estimated to hold a passport**. In most cases, we would continue by numerically expresses the degree of 
    uncertainty in this quantity through further inference methods.

  We are not yet ready to formally perform uncertainty quantification, but 
  we can begin to think about **how different components might affect our confidence**.

  **Q2**. If all participants were interviewed at the entrance of Indianapolis International Airport, 
  does the representativeness of the sample for *all Americans* increase or decrease? 
  How does this impact the credibility of the 40% estimate?

    It is reasonable to believe that individuals with passports are over-represented at this location.
    This would reduce the credibility that of the 40% estimate as representative of *all Americans*.

  **Q3**. If a sample was larger than 20, while the percentage remained 40%,
  would our uncertainty increase or decrease?
    
    By taking a larger sample, the sample becomes more representative of the population (imagine taking
    a "sample" as large as the entire population of the United States!). As a result, the uncertainty 
    in the estimate would decrease.

  **Q4**. Can you think of any other aspects of the problem that might affect the degree of uncertainty?
    
    This quetsion is left for you to explore on your own.

Looking Ahead
------------------------------------------------------------------

Throughout this course, you are encouraged to come back to these two fundamental directions of reasoning. 
Understanding when you're doing probability calculations versus statistical inference will 
clarify the methods and implications of each scenario.

.. admonition:: Quick Check ✔
  :class: important
  
  1. When would a *sample* ever be preferable to a *census* of the entire population?
  2. Can you describe the different between *probability* and *statistics* in a few sentences?
  3. Why can two different random samples yield different conclusions even when drawn from the same population?