.. _worksheet7:

Worksheet 7: Continuous Random Variables
========================================

.. admonition:: Learning Objectives 🎯
   :class: info

   • Understand the fundamental differences between discrete and continuous random variables
   • Validate probability density functions using the two key properties
   • Calculate probabilities by integrating PDFs
   • Find constants that make functions valid PDFs
   • Compute expected values and variances for continuous distributions
   • Work with cumulative distribution functions (CDFs)

Introduction
------------

Throughout our exploration of probability, we have primarily focused on discrete random variables, whose possible outcomes can be listed and whose probabilities are assigned to those distinct values. However, many real-world phenomena, such as measurement data, time intervals, and physical quantities, are more naturally modeled by **continuous random variables**. Unlike their discrete counterparts, continuous random variables can take on any value within an interval, and probabilities are determined by integrating a probability density function (PDF) rather than by summing a probability mass function (PMF).

In this worksheet, we introduce the core concepts of continuous random variables in their general form, without restricting ourselves to well-known named distributions. We will see how to define valid PDFs, explore the utility of cumulative distribution functions (CDFs), and learn how to compute probabilities and expectations.

Part 1: Probability Density Functions
-------------------------------------

A **continuous random variable** is a random variable that can take on an infinite number of possible values within a given range. Unlike discrete random variables, which have a countable set of possible outcomes, continuous random variables are associated with probability distributions in which individual points have zero probability.
Instead, probabilities are determined by the probability density function (PDF), and the probability that the variable falls within a given interval is found by integrating the PDF over that interval. The total area under the PDF curve must equal one, ensuring that the variable adheres to the rules of probability.

A probability distribution for a continuous random variable :math:`X` is given by a smooth curve called a density curve, or **probability density function (pdf)**, defined as:

.. math::

   f_X(x) = \lim_{\Delta \to 0^+} \frac{P(x < X \leq x + \Delta)}{\Delta}

The **support** of a continuous random variable (r.v.) :math:`X` is the set of all values for which the probability density function is strictly positive:

.. math::

   \text{Supp}(X) = \{x \in \mathbb{R} \mid f_X(x) > 0\}

This definition formalizes the idea that the PDF describes how probability is distributed around a particular point, rather than assigning probability to any single outcome. It measures how densely probability accumulates near individual values of :math:`x`, much like a derivative in calculus captures the rate of change of a function. However, because a continuous random variable does not assign positive probability to individual points, i.e., :math:`P(X = x) = 0`, we compute actual probabilities by integrating over an interval:

.. math::

   P(a < X < b) = \int_a^b f_X(x)\,dx

**Properties of Valid PDFs**

A probability density function (PDF) must satisfy the following conditions to be valid:

1. **Non-Negativity:** The function must be non-negative for all possible values of :math:`x`, since probabilities cannot be negative.

   .. math::

      f_X(x) \geq 0, \quad \forall x \in \mathbb{R}

2. **Total Probability Equals One:** The total area under the probability density curve must be exactly 1, ensuring that the function represents a proper probability distribution.

   .. math::

      \int_{-\infty}^{+\infty} f_X(x)\,dx = 1

Both conditions must be checked before declaring a function a valid probability density function. A reasonable approach to checking the first condition (non-negativity) is to analyze the function algebraically, determining its roots and ensuring that :math:`f_X(x) > 0` within its support, since it is defined to be zero elsewhere. Alternatively, the recommended approach in this course is graphical verification: sketch a rough graph of the function and confirm that it does not dip below zero at any point within the support.

.. note::

   **Review Calculus:** Review basic integration rules with specific focus on:

   • Linearity, the power rule, and definite vs. indefinite integrals
   • The Fundamental Theorem of Calculus
   • U-substitution and integration by parts
   • Splitting integrals over multiple subintervals for piecewise functions
   • Integration of common functions: constants, polynomials, exponentials

   We will NOT use partial fraction decomposition or trigonometric substitution.

**Question 1:** For each of the following functions, determine whether it is a valid probability density function.

a) Is the following function :math:`f_X(x)` a valid pdf?

   .. math::

      f_X(x) = \begin{cases} x, & 0 < x < 1 \\ 2 - x, & 1 \leq x < 2 \\ 0, & \text{otherwise} \end{cases}

b) Is the following function :math:`f_Y(y)` a valid pdf?

   .. math::

      f_Y(y) = \begin{cases} \frac{5}{y^2}, & y \geq 5 \\ 0, & \text{otherwise} \end{cases}

c) Is the following function :math:`f_Z(z)` a valid pdf?

   .. math::

      f_Z(z) = \begin{cases} \frac{1}{88}(3z^2 - 9), & 1 \leq z \leq 5 \\ 0, & \text{otherwise} \end{cases}

d) Is the following function :math:`f_V(v)` a valid pdf? For this problem, first try to rewrite the function in piecewise form.

   .. math::

      f_V(v) = \frac{\lambda}{2} e^{-\lambda|v|}, \quad \forall v \in \mathbb{R} \text{ and } \lambda > 0

**R Code for Visualization:**

.. code-block:: r

   # Function to check if a PDF is valid
   check_pdf <- function(f, lower, upper, name) {
     # Check non-negativity by plotting
     x <- seq(lower, upper, length.out = 1000)
     y <- sapply(x, f)
     plot(x, y, type = "l", main = paste("PDF:", name),
          xlab = "x", ylab = "f(x)", lwd = 2)
     abline(h = 0, col = "red", lty = 2)

     # Check if the PDF integrates to 1
     integral <- integrate(f, lower, upper)$value
     cat(name, "integrates to:", integral, "\n")

     # Check non-negativity numerically
     if (all(y >= -1e-10)) {  # Allow for numerical error
       cat(name, "is non-negative\n")
     } else {
       cat(name, "has negative values!\n")
     }

     return(integral)
   }

   # Example for part (a)
   f_a <- function(x) {
     if (x > 0 && x < 1) return(x)
     else if (x >= 1 && x < 2) return(2 - x)
     else return(0)
   }
   f_a_vec <- Vectorize(f_a)
   check_pdf(f_a_vec, -0.5, 2.5, "Part (a)")

Part 2: Finding Constants for Valid PDFs
----------------------------------------

**Question 2:** For each of the following functions, determine the constant that makes it a valid probability density function.

a) Determine the constant :math:`c` that would make :math:`f_X(x)` a valid probability density function.

   .. math::

      f_X(x) = \begin{cases} \frac{1}{5}, & -1 \leq x \leq 0 \\ \frac{1}{5} + cx, & 0 < x \leq 1 \\ 0, & \text{otherwise} \end{cases}

b) Determine the constant :math:`k` that would make :math:`f_Y(y)` a valid probability density function.

   .. math::

      f_Y(y) = \begin{cases} k(y - 6), & 6 \leq y \leq 8 \\ 2y, & 8 < y \leq 12 \\ k(14 - y), & 12 < y \leq 14 \\ 0, & \text{otherwise} \end{cases}

Part 3: Expected Value and Variance
-----------------------------------

The **expected value** of a continuous random variable is the continuously weighted average of all values within the support of the random variable:

.. math::

   \mu_X = E[X] = \int_{-\infty}^{+\infty} x f_X(x)\,dx

The rules we learned in the discrete case, i.e., linearity of expectation, additivity, and LOTUS, also apply in the continuous case.
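As a quick numerical sanity check of the expectation integral, we can reuse the triangular density from Question 1(a) (this worked example is illustrative, not part of the worksheet questions); since that density is symmetric about 1, its mean should come out to exactly 1:

.. code-block:: r

   # Numerical check of E[X] for the triangular PDF from Question 1(a):
   #   f(x) = x on (0, 1), f(x) = 2 - x on [1, 2), and 0 otherwise.
   f_tri <- function(x) ifelse(x > 0 & x < 1, x,
                        ifelse(x >= 1 & x < 2, 2 - x, 0))

   # E[X] is the integral of x * f(x) over the support
   EX <- integrate(function(x) x * f_tri(x), 0, 2)$value
   EX  # the density is symmetric about x = 1, so E[X] = 1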
We also have an analogous version of variance in the continuous case:

.. math::

   \sigma_X^2 = \text{Var}(X) = E[(X - \mu_X)^2] = \int_{-\infty}^{+\infty} (x - \mu_X)^2 f_X(x)\,dx

which simplifies to:

.. math::

   \text{Var}(X) = E[X^2] - (E[X])^2

Similarly, the rules for variance hold in the continuous case: shifting by a constant does not change variance, while scaling a random variable by a constant scales its variance by the constant squared, :math:`\text{Var}(aX + b) = a^2\text{Var}(X)`; and variance is additive for independent random variables, :math:`\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)`.

**Question 3:** Determine the expected value and standard deviation for the random variable :math:`X` with the pdf given below.

.. math::

   f_X(x) = \begin{cases} 1/20, & -10 < x \leq -5 \\ 1/4, & -1 < x \leq 1 \\ 1/20, & 5 < x \leq 10 \\ 0, & \text{otherwise} \end{cases}

Part 4: Cumulative Distribution Functions
-----------------------------------------

The **Cumulative Distribution Function (CDF)** gives the probability that a random variable :math:`X` is less than or equal to a given value :math:`x`, making it a function of :math:`x`. As the name suggests, the CDF accumulates probability from the start of the support up to the value :math:`x`, rather than simply being an antiderivative of the probability density function (PDF). This distinction is especially important for piecewise PDFs, where we must ensure that the CDF correctly accumulates probability across the different regions, maintaining continuity and reaching 1 at the upper bound of the distribution.
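To make the accumulation idea concrete, here is a small sketch using the triangular density from Question 1(a) as an illustrative example (the closed-form antiderivatives below are worked out by hand; note how the probability :math:`1/2` from the first piece is carried into the second):

.. code-block:: r

   # CDF accumulation for the triangular PDF from Question 1(a):
   #   on (0, 1]:  F(x) = x^2 / 2
   #   on (1, 2):  F(x) = 1/2 + integral from 1 to x of (2 - t) dt
   #                    = 2x - x^2/2 - 1   (the 1/2 is carried over)
   F_tri <- function(x) {
     if (x <= 0) 0
     else if (x <= 1) x^2 / 2
     else if (x < 2) 2 * x - x^2 / 2 - 1
     else 1
   }

   F_tri(1)    # 0.5 -- half the probability lies left of the peak
   F_tri(1.5)  # 0.875
   F_tri(2)    # 1 -- the CDF reaches 1 at the upper end of the support

Forgetting the carried-over :math:`1/2` and antidifferentiating each piece in isolation would give a discontinuous "CDF" that never reaches 1, which is exactly the pitfall described above.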
We use :math:`F_X(x)` to denote the CDF for :math:`X`, and it must satisfy the following properties:

- The CDF is a **non-decreasing function:** for any :math:`a, b \in \mathbb{R}`, if :math:`a < b`, then :math:`F_X(a) \leq F_X(b)`
- **Limiting behavior:** :math:`\lim_{x \to -\infty} F_X(x) = 0` and :math:`\lim_{x \to +\infty} F_X(x) = 1`
- **Right Continuous Convention:** :math:`\lim_{x \to c^+} F_X(x) = F_X(c)`

**Question 4:** Answer the following questions for the probability density function :math:`f_X(x)` given below.

.. math::

   f_X(x) = \begin{cases} 1 - e^{-\frac{1}{4}}, & 0 \leq x \leq 1 \\ \frac{1}{16} e^{-\frac{x}{16}}, & x \geq 4 \\ 0, & \text{otherwise} \end{cases}

a) How many regions do we need to consider for the cumulative distribution function?

b) Find the cumulative distribution function for the random variable :math:`X` defined by the PDF :math:`f_X(x)`.

c) Use the CDF to determine the probability that a trial from this random variable would be less than ½.

d) Use the CDF to evaluate the probability :math:`P(X > 5)`.

e) Use the CDF to evaluate the probability :math:`P(X < 5 \mid X > 1)`.

**R Code for CDF Calculations:**

.. code-block:: r

   # Example: Building a CDF from a piecewise PDF
   # For Question 4

   # Define the PDF
   pdf_q4 <- function(x) {
     if (x >= 0 && x <= 1) {
       return(1 - exp(-1/4))
     } else if (x >= 4) {
       return((1/16) * exp(-x/16))
     } else {
       return(0)
     }
   }
   pdf_q4_vec <- Vectorize(pdf_q4)

   # Build the CDF by accumulating probability across the pieces
   cdf_q4 <- function(x) {
     if (x < 0) {
       return(0)
     } else if (x >= 0 && x <= 1) {
       return(integrate(pdf_q4_vec, 0, x)$value)
     } else if (x > 1 && x < 4) {
       return(integrate(pdf_q4_vec, 0, 1)$value)
     } else {  # x >= 4
       return(integrate(pdf_q4_vec, 0, 1)$value +
              integrate(pdf_q4_vec, 4, x)$value)
     }
   }
   cdf_q4_vec <- Vectorize(cdf_q4)

   # Plot the CDF
   x_vals <- c(seq(-1, 1, 0.01), seq(1.01, 3.99, 0.1), seq(4, 10, 0.1))
   y_vals <- cdf_q4_vec(x_vals)
   plot(x_vals, y_vals, type = "l", main = "CDF for Question 4",
        xlab = "x", ylab = "F(x)", lwd = 2, ylim = c(0, 1))
   abline(h = c(0, 1), col = "gray", lty = 2)

Key Takeaways
-------------

.. admonition:: Summary 📝
   :class: important

   • **Continuous random variables** can take any value in an interval; probabilities are areas under the PDF
   • **Valid PDFs** must be non-negative everywhere and integrate to 1
   • **Individual points** have probability zero: :math:`P(X = x) = 0`
   • **Expected value** and **variance** use integration instead of summation
   • **CDFs** accumulate probability from :math:`-\infty` to :math:`x` and are always non-decreasing
   • For **piecewise PDFs**, carefully handle boundaries when integrating