Cumulative Distribution Functions
==============================================
Computing probabilities for continuous random variables requires integrating the PDF over intervals,
which can become tedious when we need to calculate many different probabilities. Just as calculus
provides us with antiderivatives to avoid repeated integration, probability theory offers us the
cumulative distribution function (CDF) as a tool that eliminates the need for repeated
integrations.
.. admonition:: Road Map 🧭
:class: important
• Define the **cumulative distribution function (CDF)**.
• Master the **relationship between PDF and CDF** through the fundamental theorem of calculus.
• Learn essential **properties of CDFs** that make them valid probability functions.
• Practice **computing probabilities** using CDFs instead of direct integration.
• Apply CDFs to find **percentiles and quantiles** for continuous distributions.
The Universal Language of Cumulative Probability
-------------------------------------------------
The **cumulative distribution function** (CDF) provides a unified approach to probability that
works seamlessly for both discrete and continuous random variables. Instead of dealing
with probability mass functions and probability density functions separately, the CDF
gives us one consistent framework.
Definition
~~~~~~~~~~~~
For any random variable :math:`X` (discrete or continuous), the cumulative distribution
function :math:`F_X(x)` is defined as:
.. math::
F_X(x) = P(X \leq x)
The definition expresses a simple idea: :math:`F_X(x)` gives the probability
that the random variable :math:`X` falls at or below any
given value :math:`x`.
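
As a small illustration of the definition, the sketch below builds the CDF of a discrete
random variable by adding up the PMF values at or below :math:`x`. The fair six-sided die
and the names ``pmf`` and ``cdf`` are assumptions made for this example only.

.. code-block:: python

   from fractions import Fraction

   # Hypothetical example: PMF of a fair six-sided die.
   pmf = {face: Fraction(1, 6) for face in range(1, 7)}

   def cdf(x):
       """Return F_X(x) = P(X <= x) by summing PMF values at or below x."""
       return sum(p for value, p in pmf.items() if value <= x)

   print(cdf(0))    # 0
   print(cdf(3))    # 1/2
   print(cdf(3.5))  # 1/2
   print(cdf(6))    # 1

Note that ``cdf(3.5)`` equals ``cdf(3)``: no probability is added between the possible
values of the die.
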
.. flat-table::
:header-rows: 2
:align: center
:width: 100%
* - :cspan:`1` The CDF for Different Variable Types
* - Discrete
- Continuous
* - .. math:: F_X(x) = P(X \leq x) = \sum_{t=-\infty}^{x} p_X(t)
- .. math:: F_X(x) = P(X \leq x) = \int_{-\infty}^{x} f_X(t) \, dt
In both cases, the argument of the PMF/PDF is replaced with a dummy variable :math:`t`.
This avoids any confusion with the true argument of :math:`F_X`.
In the continuous case, :math:`F_X(x)` can be used for :math:`P(X\leq x)` or
:math:`P(X < x)`.

Region 5 (:math:`x > 6`): All probability has been accumulated
.. math::
F_X(x) &= \int_{-\infty}^x f_X(t) \, dt \\
&= F_X(6) + \int_{6}^{x} f_X(t) \, dt = 1 + \int_6^{x} 0 \, dt = 1.
Putting together,
.. math::
F_X(x) =
\begin{cases}
0 & \text{for } x < 0 \\
\frac{3x^2}{4} & \text{for } 0 \leq x < 1 \\
\frac{3}{4} & \text{for } 1 \leq x < 5 \\
\frac{x-2}{4} & \text{for } 5 \leq x \leq 6 \\
1 & \text{for } x > 6
\end{cases}
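
The assembled CDF can be checked numerically. The sketch below assumes the PDF of this
example is :math:`\frac{3x}{2}` on :math:`[0, 1]` and :math:`\frac{1}{4}` on :math:`[5, 6]`,
and compares the closed-form CDF with a midpoint-rule approximation of the integral. The
function names, the lower limit ``-1.0``, and the number of steps are illustrative choices,
not part of the original example.

.. code-block:: python

   def pdf(x):
       """PDF of the running example (zero outside [0, 1] and [5, 6])."""
       if 0 <= x <= 1:
           return 1.5 * x
       if 5 <= x <= 6:
           return 0.25
       return 0.0

   def cdf(x):
       """Closed-form CDF assembled region by region."""
       if x < 0:
           return 0.0
       if x < 1:
           return 0.75 * x**2
       if x < 5:
           return 0.75
       if x <= 6:
           return (x - 2) / 4
       return 1.0

   def cdf_numeric(x, lower=-1.0, steps=70_000):
       """Midpoint-rule approximation of the integral of pdf from lower to x.

       Starting at lower = -1.0 is enough because the PDF is zero below 0.
       """
       if x <= lower:
           return 0.0
       width = (x - lower) / steps
       return sum(pdf(lower + (i + 0.5) * width) for i in range(steps)) * width

   for x in (-0.5, 0.5, 1.0, 3.0, 5.5, 6.0, 7.0):
       print(f"x={x:>4}: closed form {cdf(x):.4f}, numeric {cdf_numeric(x):.4f}")

The two columns should agree to about three decimal places; a larger gap in some region
would suggest that the accumulated area from an earlier region was left out.
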
.. admonition:: 🛑 Know how to avoid the common mistake before simplifying steps
:class: danger
In every region after the first in the previous example,
the CDF :math:`F_X(x)` is computed as the area accumulated in the earlier regions **plus**
the integral over the current region.
Forgetting to include the area from previous regions is a very common mistake among students.
While you may eventually skip some redundant steps as you gain familiarity,
your top priority should be to avoid this error.
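
As a worked instance of this rule, take :math:`5 \leq x \leq 6` in the previous example. The
value :math:`F_X(5) = \frac{3}{4}` carried over from the earlier regions is added to the
integral over the current region:

.. math::

   F_X(x) = F_X(5) + \int_5^x \frac{1}{4} \, dt = \frac{3}{4} + \frac{x-5}{4} = \frac{x-2}{4}.

Dropping the carried-over :math:`\frac{3}{4}` would leave :math:`\frac{x-5}{4}`, which reaches only
:math:`\frac{1}{4}` at :math:`x = 6` and therefore cannot be a valid CDF.
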
Computing Probabilities with CDFs
------------------------------------
Once we have the CDF, calculating probabilities becomes remarkably straightforward.
We can handle any probability question without additional integration.
.. admonition:: Recall that equality signs do not matter for continuous RVs
:class: important
All the rules below will apply if :math:`<` and :math:`\leq` are interchanged
as well as if :math:`>` and :math:`\geq` are switched
because :math:`P(X = a) = 0` for any single point :math:`a`.
Basic CDF Evaluation
~~~~~~~~~~~~~~~~~~~~~~~~
For :math:`P(X \leq a)`,
.. math::
P(X \leq a) = F_X(a)
This is just the definition of the CDF—plug in the value and read the result.
"Greater Than" Probabilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For :math:`P(X > a)`, we use the complement rule:
.. math::
P(X > a) = 1 - P(X \leq a) = 1 - F_X(a)
The probability of exceeding :math:`a` equals
one minus the probability of not exceeding :math:`a`.
Interval Probabilities
~~~~~~~~~~~~~~~~~~~~~~~~
For :math:`P(a < X \leq b)`, we take the
accumulated probability up to :math:`b` and **subtract** the accumulated probability
up to :math:`a`, leaving just the probability in the interval.
.. math::
P(a < X \leq b) = F_X(b) - F_X(a)
.. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter6/interval-prob-cdf.png
:alt: Computing interval probabilities with CDFs
:align: center
:figwidth: 80%
*P(a < X ≤ b) equals the difference between CDF values*
.. admonition:: Example💡: Computing Probabilities Using a CDF
:class: note
Use the PDF and CDF found in the previous example to answer the questions below.
.. math::
f_X(x) = \begin{cases}
\frac{3x}{2} & \text{for } 0 \leq x \leq 1 \\
\frac{1}{4} & \text{for } 5 \leq x \leq 6 \\
0 & \text{elsewhere}
\end{cases}
.. math::
F_X(x) = \begin{cases}
0 & \text{for } x < 0 \\
\frac{3x^2}{4} & \text{for } 0 \leq x < 1 \\
\frac{3}{4} & \text{for } 1 \leq x < 5 \\
\frac{x-2}{4} & \text{for } 5 \leq x \leq 6 \\
1 & \text{for } x > 6
\end{cases}
1. Find :math:`P(X \leq 0.5)`.
:math:`x=0.5` belongs to Region 2 of the CDF, so :math:`F_X(0.5) = \frac{3(0.5)^2}{4} = \frac{3}{16}`.
2. Find :math:`P(X \geq 5.5)`.
:math:`P(X \geq 5.5) = 1 - P(X < 5.5) = 1 - F_X(5.5)`. :math:`x=5.5` is in Region 4.
Therefore, :math:`P(X \geq 5.5) = 1 - \frac{5.5-2}{4} = 1 - \frac{7}{8} = \frac{1}{8}`.
3. Find :math:`P(0.2 < X \leq 5.8)`.
.. math::
P(0.2 < X \leq 5.8) &= F_X(5.8) - F_X(0.2) \\
&= \frac{5.8-2}{4} - \frac{3(0.2)^2}{4} \\
&= 0.95 - 0.03 = 0.92.
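
The three rules of this section can also be written as small helper functions. The sketch
below applies them to the CDF of this example; the helper names ``prob_at_most``,
``prob_greater``, and ``prob_between`` are hypothetical and chosen only for illustration.

.. code-block:: python

   def cdf(x):
       """CDF of the example above."""
       if x < 0:
           return 0.0
       if x < 1:
           return 0.75 * x**2
       if x < 5:
           return 0.75
       if x <= 6:
           return (x - 2) / 4
       return 1.0

   def prob_at_most(F, a):
       """P(X <= a) = F(a)."""
       return F(a)

   def prob_greater(F, a):
       """P(X > a) = 1 - F(a), by the complement rule."""
       return 1 - F(a)

   def prob_between(F, a, b):
       """P(a < X <= b) = F(b) - F(a)."""
       return F(b) - F(a)

   print(round(prob_at_most(cdf, 0.5), 4))       # 0.1875
   print(round(prob_greater(cdf, 5.5), 4))       # 0.125
   print(round(prob_between(cdf, 0.2, 5.8), 4))  # 0.92

The printed values match :math:`\frac{3}{16}`, :math:`\frac{1}{8}`, and :math:`0.92` from the
three parts of the example.
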
Finding Percentiles Using the CDF
-----------------------------------
Definition
~~~~~~~~~~~~~
Take a value :math:`p \in [0,1]`. The :math:`p \times 100` th percentile of a
random variable :math:`X`, denoted :math:`x_p`, is the value satisfying:
.. math::
F_X(x_p) = P(X \leq x_p) = p.
For example,
* The **50th percentile** of a random variable :math:`X` is written as :math:`x_{0.5}`
and satisfies the condition:
.. math:: F_X(x_{0.5}) = P(X \leq x_{0.5})= 0.5.
:math:`x_{0.5}` represents the vertical cutoff on the PDF of :math:`X`
such that the area to its left is exactly :math:`0.5`.
This special percentile is also called the **median** (:math:`\tilde{\mu}`) of :math:`X`.
.. figure:: https://yjjpfnblgtrogqvcjaon.supabase.co/storage/v1/object/public/stat-350-assets/images/chapter6/50th-percentile.png
:alt: 50th percentile
:align: center
:figwidth: 60%
* The **25th percentile** :math:`x_{0.25}` satisfies :math:`F_X(x_{0.25}) = 0.25`.
It splits the PDF into a left region with area 0.25 and
right region with area 0.75.
* The **75th percentile** :math:`x_{0.75}` satisfies :math:`F_X(x_{0.75}) = 0.75`.
It splits the PDF into a left region with area 0.75 and
right region with area 0.25.
* The *true* IQR of :math:`X` is the difference :math:`x_{0.75} - x_{0.25}`.
.. admonition:: Connection to *sample* percentiles
:class: important
Recall that we've already discussed the concept of *sample* percentiles for
a *dataset* in Chapter 3. The percentiles for a *random variable* are considered
their *population* equivalent in the framework of data analysis.
If we generate data points :math:`x_1, x_2,\cdots, x_n` from
a random variable :math:`X`, we would compute the *sample* percentiles using
:math:`x_1, x_2,\cdots, x_n`, and the *true* percentiles using the CDF
of :math:`X`. The two sets of objects are closely related, but different.
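
The distinction can be seen in a small simulation. The sketch below draws data from the
random variable of the earlier example and compares the *sample* median with the *true*
median obtained from the CDF. The data are generated by inverse transform sampling, a
technique that is not covered in this section and is used here only as an assumption to
produce data; the seed, the sample size, and the helper name ``inverse_cdf`` are
illustrative choices.

.. code-block:: python

   import math
   import random

   def inverse_cdf(u):
       """Map u in [0, 1) to a value x with F_X(x) = u for the earlier example.

       Values u <= 0.75 land in 0 <= x <= 1; larger values land in 5 < x <= 6.
       """
       if u <= 0.75:
           return math.sqrt(4 * u / 3)
       return 4 * u + 2

   random.seed(350)  # arbitrary seed so the run is repeatable
   sample = sorted(inverse_cdf(random.random()) for _ in range(100_000))

   # Sample median of an even-sized data set: average of the two middle values.
   sample_median = (sample[49_999] + sample[50_000]) / 2

   # True median: solve 3x^2/4 = 0.5, which gives x = sqrt(2/3).
   true_median = math.sqrt(2 / 3)

   print(f"sample median: {sample_median:.4f}")
   print(f"true median:   {true_median:.4f}")

The two numbers should be close but not identical: the sample median changes from one
data set to the next, while the true median is a fixed property of :math:`X`.
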
Computing Percentiles of a Continuous Random Variable
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Percentiles of a continuous random variable are found by **replacing the
general definition** :math:`F_X(x_p) = p` **with the specific expression of the CDF, then solving for** :math:`x_p`.
See the example below for further detail.
.. admonition:: Example💡: Finding a Percentile
:class: note
Find the 85th percentile of a random variable :math:`X` which has the
following CDF:
.. math::
F_X(x) = \begin{cases}
0 & \text{for } x < 0 \\
\frac{3x^2}{4} & \text{for } 0 \leq x < 1 \\
\frac{3}{4} & \text{for } 1 \leq x < 5 \\
\frac{x-2}{4} & \text{for } 5 \leq x \leq 6 \\
1 & \text{for } x > 6
\end{cases}
We must solve :math:`F_X(x_{0.85}) = 0.85` for :math:`x_{0.85}`.
Which piece of the CDF should we use to replace the general expression :math:`F_X(x_{0.85})`?
Since :math:`F_X(5) = \frac{3}{4} = 0.75 < 0.85 < 1 = F_X(6)`, the 85th percentile must lie
strictly between :math:`5` and :math:`6`. Using Region 4,
.. math::
&\frac{x_{0.85} - 2}{4} = 0.85\\
&x_{0.85} - 2 = 3.4\\
&x_{0.85} = 5.4
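
When the equation :math:`F_X(x_p) = p` is awkward to solve by hand, it can be solved
numerically. The sketch below uses bisection, which relies only on the CDF being
non-decreasing; the function name ``percentile``, the search interval, and the tolerance
are assumptions made for this illustration.

.. code-block:: python

   def cdf(x):
       """CDF of the example above."""
       if x < 0:
           return 0.0
       if x < 1:
           return 0.75 * x**2
       if x < 5:
           return 0.75
       if x <= 6:
           return (x - 2) / 4
       return 1.0

   def percentile(p, lo=0.0, hi=6.0, tol=1e-10):
       """Approximate x_p with F_X(x_p) = p by bisection on [lo, hi].

       Assumes 0 < p < 1 and F(lo) <= p <= F(hi).
       """
       while hi - lo > tol:
           mid = (lo + hi) / 2
           if cdf(mid) < p:
               lo = mid   # too little probability accumulated: move right
           else:
               hi = mid   # at least p accumulated: move left
       return (lo + hi) / 2

   print(round(percentile(0.85), 6))  # 5.4

On a flat stretch of the CDF the equation has many solutions. Here
:math:`F_X(x) = 0.75` for every :math:`x` in :math:`[1, 5]`, and this sketch returns the
smallest such value for :math:`p = 0.75`.
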
Bringing It All Together
---------------------------
.. admonition:: Key Takeaways 📝
:class: important
1. The **cumulative distribution function** :math:`F_X(x) = P(X \leq x)` provides a universal framework for
both discrete and continuous random variables. It **eliminates the need for repeated integration**
when computing probabilities for continuous random variables.
2. For continuous variables, **PDFs and CDFs are related** through differentiation and integration:
:math:`f_X(x) = F_X'(x)` and :math:`F_X(x) = \int_{-\infty}^x f_X(t) \, dt`.
3. **Valid CDFs** must be non-decreasing, approach 0 as :math:`x \to -\infty`, approach 1 as :math:`x \to +\infty`,
and be right-continuous.
4. **Probability calculations** become simpler with CDFs: :math:`P(X \leq a) = F_X(a)`,
:math:`P(X > a) = 1 - F_X(a)`, and :math:`P(a < X \leq b) = F_X(b) - F_X(a)`.
5. **Piecewise CDFs** require careful attention—each region must
include all probability accumulated from previous regions.
6. **Percentiles** are found by solving :math:`F_X(x_p) = p`.
In our next sections, we'll see how these CDF concepts apply to specific named distributions,
starting with the most important continuous distribution in all of statistics: the normal distribution.
Exercises
~~~~~~~~~~~~~
1. **CDF Construction**: Given the PDF :math:`f_X(x) = 2x` for :math:`0 \leq x \leq 1`, 0 elsewhere:
a) Find the CDF :math:`F_X(x)`
b) Verify that your CDF satisfies all required properties
c) Use the CDF to find :math:`P(0.2 < X \leq 0.8)`
2. **Piecewise CDF**: Consider the PDF:
.. math::
f_X(x) = \begin{cases}
kx & \text{for } 0 \leq x \leq 2 \\
k & \text{for } 2 < x \leq 4 \\
0 & \text{elsewhere}
\end{cases}
a) Find the value of :math:`k` that makes this a valid PDF.
b) Construct the complete CDF.
c) Find :math:`P(1 < X < 3)` using the CDF.
3. **CDF Validation**: Determine whether the following function could be a valid CDF:
.. math::
F(x) = \begin{cases}
0 & \text{for } x < 0 \\
x^2 & \text{for } 0 \leq x < 1 \\
1 & \text{for } x \geq 1
\end{cases}
If valid, find the corresponding PDF.
4. **Percentile Calculations**: For the CDF from Exercise 2,
a) Find the median.
b) Find the 75th percentile.
c) Find the interquartile range.
5. **PDF-CDF Relationship**: Given :math:`F_X(x) = 1 - e^{-x}` for :math:`x \geq 0`, 0 elsewhere,
a) Find the PDF :math:`f_X(x)`.
b) Calculate :math:`P(X > 2)`.
c) Find the value :math:`x` such that :math:`P(X \leq x) = 0.95`.