.. _hw1-distributional-relationships:

====================================================================
Homework 1: Distributional Relationships and Computational Foundations
====================================================================

:Due: [TBD]
:Submission: Upload your written derivations (PDF) and Python verification script (.py or .ipynb) to Gradescope

.. contents:: Assignment Contents
   :local:
   :depth: 2

Overview and Learning Objectives
================================

Probability distributions don't exist in isolation—they form an interconnected family linked through sums, limits, transformations, and special cases. Understanding these relationships is essential for:

* **Simulation**: Generating samples from complex distributions using simpler ones
* **Inference**: Deriving sampling distributions of test statistics  
* **Model selection**: Recognizing when a simpler distribution is appropriate
* **Derivation**: Proving properties of one distribution from another

This assignment develops your ability to **prove** distributional relationships rigorously and **verify** your work computationally. You'll use moment generating functions (MGFs) where they exist and transformation techniques where they don't—learning when each tool is appropriate.

.. admonition:: Learning Outcomes Addressed
   :class: important

   * **LO 1**: Apply simulation techniques, including transformation methods for random variable generation
   * **LO 2**: Compare mathematical frameworks for probabilistic reasoning and inference
   * **LO 3**: Implement computational methods for verifying theoretical results

Assignment Structure
--------------------

The assignment has three parts:

* **Part I: MGF and Analytic Derivations** (Problems 1–6): Use MGFs where they exist, plus related analytic techniques (sums, transforms, mixtures) to prove distributional identities
* **Part II: Transformation Methods** (Problems 7–9): Use Jacobian transformations for distributions lacking MGFs (Cauchy, t, F)
* **Part III: Computational Foundations** (Problems 10–11): Empirically verify the Central Limit Theorem and demonstrate proper use of random number generation

Each problem in Parts I and II requires **two components**:

1. **Mathematical Derivation** (handwritten or typeset): Show all steps, define notation, state assumptions, and justify each equality. This is the primary assessment—your derivation must be complete and correct.

2. **Computational Verification** (Python): Use the visualization utilities provided to verify your algebra and explore the result numerically. This catches errors and builds intuition.

Part III problems are primarily computational—you'll implement simulations and create visualizations that demonstrate key theoretical concepts from Chapter 1.

The computational component is not a shortcut—it verifies work you've already done by hand. Start each problem with pencil and paper before touching the keyboard.


Mathematical Preliminaries
==========================

This section provides reference material. You may cite these results without re-deriving them.

Moment Generating Functions
---------------------------

For a random variable :math:`X`, the **moment generating function** (MGF) is defined as:

.. math::

   M_X(t) = \mathbb{E}[e^{tX}]

for all :math:`t` in an open interval containing zero where the expectation exists.

**Key Property (Independent Sums)**: If :math:`X_1, \ldots, X_n` are independent and each :math:`M_{X_i}(t)` exists in a neighborhood of zero, then:

.. math::

   M_{\sum_{i=1}^n X_i}(t) = \prod_{i=1}^n M_{X_i}(t)

**Uniqueness**: Under standard regularity conditions, the MGF uniquely determines the distribution. If two random variables have the same MGF in a neighborhood of zero, they have the same distribution.

Reference: Common MGFs
~~~~~~~~~~~~~~~~~~~~~~

The following MGFs may be used without derivation:

.. list-table:: Common Moment Generating Functions
   :header-rows: 1
   :widths: 25 35 40

   * - Distribution
     - PMF/PDF
     - MGF :math:`M_X(t)`
   * - Geometric(:math:`p`)
     - :math:`P(X=k) = p(1-p)^{k-1}`, :math:`k=1,2,\ldots`
     - :math:`\dfrac{pe^t}{1-(1-p)e^t}`, :math:`t < -\ln(1-p)`
   * - NegBinom(:math:`r, p`)
     - :math:`P(Y=k) = \binom{k-1}{r-1}p^r(1-p)^{k-r}`, :math:`k \geq r`
     - :math:`\left(\dfrac{pe^t}{1-(1-p)e^t}\right)^r`
   * - Poisson(:math:`\lambda`)
     - :math:`P(X=k) = e^{-\lambda}\dfrac{\lambda^k}{k!}`, :math:`k=0,1,\ldots`
     - :math:`\exp(\lambda(e^t - 1))`
   * - Binomial(:math:`n,p`)
     - :math:`P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}`, :math:`k=0,\ldots,n`
     - :math:`(1 - p + pe^t)^n`
   * - Normal(:math:`\mu, \sigma^2`)
     - :math:`f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)`
     - :math:`\exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)`
   * - :math:`\chi^2(\nu)`
     - :math:`f(x) = \frac{x^{\nu/2-1}e^{-x/2}}{2^{\nu/2}\Gamma(\nu/2)}`, :math:`x > 0`
     - :math:`(1 - 2t)^{-\nu/2}`, :math:`t < 1/2`

**Important**: The Cauchy, Student's :math:`t`, and :math:`F` distributions do **not** have MGFs (the expectation diverges). For these, use transformation/Jacobian methods.

.. list-table:: Distributions Without MGFs (for Part II reference)
   :header-rows: 1
   :widths: 25 50 25

   * - Distribution
     - PDF
     - Support
   * - Cauchy (standard)
     - :math:`f(x) = \dfrac{1}{\pi(1 + x^2)}`
     - :math:`x \in \mathbb{R}`
   * - Student's :math:`t(\nu)`
     - :math:`f(x) = \dfrac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}`
     - :math:`x \in \mathbb{R}`
   * - :math:`F(\nu_1, \nu_2)`
     - :math:`f(x) = \dfrac{1}{B(\nu_1/2, \nu_2/2)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} x^{\nu_1/2-1} \left(1 + \frac{\nu_1 x}{\nu_2}\right)^{-(\nu_1+\nu_2)/2}`
     - :math:`x > 0`

These distributions arise from ratios and other transformations of normal and chi-square random variables, which is why transformation methods are required.


When MGFs Fail: Transformation Methods
--------------------------------------

When an MGF doesn't exist, we derive distributions using the **transformation (Jacobian) method**:

For :math:`Y = g(X)` where :math:`g` is monotonic with inverse :math:`g^{-1}`:

.. math::

   f_Y(y) = f_X(g^{-1}(y)) \cdot \left| \frac{d}{dy} g^{-1}(y) \right|

For multivariate transformations :math:`(Y_1, Y_2) = g(X_1, X_2)`:

.. math::

   f_{Y_1, Y_2}(y_1, y_2) = f_{X_1, X_2}(g^{-1}(y_1, y_2)) \cdot |J|

where :math:`|J|` is the absolute value of the Jacobian determinant.


Computational Setup
===================

Before starting, set up your Python environment. The code below provides tools for **verifying** your hand derivations—SymPy can check symbolic algebra, and numerical sampling confirms distributional results.

.. code-block:: python

   """
   Homework 1: Distributional Relationships
   Computational Verification Template
   
   Instructions:
   1. Complete the hand derivation FIRST (on paper or iPad)
   2. Use this code to verify your algebraic work
   3. Include output and figures in your submission
   """
   
   import numpy as np
   import sympy as sp
   from sympy import (symbols, exp, log, sqrt, pi, simplify, expand, 
                      integrate, diff, factorial, binomial, gamma, oo,
                      Rational, latex, init_printing)
   from sympy.stats import Normal, ChiSquared, density, E
   from scipy import stats
   import matplotlib.pyplot as plt
   
   init_printing()  # Pretty symbolic output
   
   # Common symbols (define once, reuse throughout)
   t, p, r, k, n, m = symbols('t p r k n m', positive=True, real=True)
   lam, lam1, lam2 = symbols('lambda lambda_1 lambda_2', positive=True)
   mu, sigma = symbols('mu sigma', real=True)
   nu, nu1, nu2 = symbols('nu nu_1 nu_2', positive=True, integer=True)
   z, u, x = symbols('z u x', real=True)


Visual Verification Tools
=========================

A key component of this assignment is **visually verifying** your theoretical results. For each distributional relationship, you will:

1. **Sample directly** from the target distribution (e.g., Negative Binomial)
2. **Construct samples** by sampling from the building-block distributions and applying the appropriate transformation (e.g., sum of Geometrics)
3. **Compare visually** using histograms, kernel density estimates, and Q-Q plots

If your derivation is correct, the two sampling methods should produce indistinguishable distributions.

Visualization Utilities
-----------------------

Use the following helper functions throughout the assignment:

.. code-block:: python

   import numpy as np
   from scipy import stats
   import matplotlib.pyplot as plt
   
   def compare_distributions(samples_constructed, samples_direct, 
                             title, label_constructed, label_direct,
                             discrete=False, theoretical_pmf=None, 
                             theoretical_pdf=None, x_range=None):
       """
       Visually compare two sets of samples that should follow the same distribution.
       
       Parameters
       ----------
       samples_constructed : array-like
           Samples obtained by transforming/combining simpler distributions
       samples_direct : array-like
           Samples drawn directly from the target distribution
       title : str
           Plot title
       label_constructed : str
           Label for constructed samples (e.g., "Sum of Geometrics")
       label_direct : str
           Label for direct samples (e.g., "NegBinom(r,p)")
       discrete : bool
           If True, use bar plots for PMF comparison
       theoretical_pmf : callable, optional
           Function computing theoretical PMF at integer values
       theoretical_pdf : callable, optional
           Function computing theoretical PDF
       x_range : tuple, optional
           (min, max) for x-axis; auto-determined if None
       """
       fig, axes = plt.subplots(1, 3, figsize=(14, 4))
       fig.suptitle(title, fontsize=14, fontweight='bold')
       
       # Panel 1: Histogram/PMF comparison
       ax1 = axes[0]
       if discrete:
           # For discrete distributions, use aligned bar plots
           all_vals = np.concatenate([samples_constructed, samples_direct])
           bins = np.arange(all_vals.min() - 0.5, all_vals.max() + 1.5, 1)
           
           ax1.hist(samples_constructed, bins=bins, density=True, alpha=0.5, 
                    label=label_constructed, color='steelblue', edgecolor='white')
           ax1.hist(samples_direct, bins=bins, density=True, alpha=0.5,
                    label=label_direct, color='coral', edgecolor='white')
           
           if theoretical_pmf is not None:
               x_vals = np.arange(int(all_vals.min()), int(all_vals.max()) + 1)
               ax1.plot(x_vals, [theoretical_pmf(k) for k in x_vals], 'k-', 
                        lw=2, label='Theoretical PMF')
       else:
           # For continuous distributions, use KDE
           ax1.hist(samples_constructed, bins=50, density=True, alpha=0.5,
                    label=label_constructed, color='steelblue', edgecolor='white')
           ax1.hist(samples_direct, bins=50, density=True, alpha=0.5,
                    label=label_direct, color='coral', edgecolor='white')
           
           if theoretical_pdf is not None:
               if x_range is None:
                   x_range = (min(samples_constructed.min(), samples_direct.min()),
                              max(samples_constructed.max(), samples_direct.max()))
               x_vals = np.linspace(x_range[0], x_range[1], 200)
               ax1.plot(x_vals, theoretical_pdf(x_vals), 'k-', lw=2, 
                        label='Theoretical PDF')
       
       ax1.set_xlabel('Value')
       ax1.set_ylabel('Density')
       ax1.set_title('Distribution Comparison')
       ax1.legend(fontsize=8)
       
       # Panel 2: Q-Q plot (constructed vs direct)
       ax2 = axes[1]
       quantiles = np.linspace(0.01, 0.99, 100)
       q_constructed = np.quantile(samples_constructed, quantiles)
       q_direct = np.quantile(samples_direct, quantiles)
       
       ax2.scatter(q_direct, q_constructed, alpha=0.6, s=20, color='purple')
       
       # Add reference line
       lims = [min(q_direct.min(), q_constructed.min()),
               max(q_direct.max(), q_constructed.max())]
       ax2.plot(lims, lims, 'k--', lw=1.5, label='y = x')
       
       ax2.set_xlabel(f'Quantiles ({label_direct})')
       ax2.set_ylabel(f'Quantiles ({label_constructed})')
       ax2.set_title('Q-Q Plot')
       ax2.legend()
       ax2.set_aspect('equal', adjustable='box')
       
       # Panel 3: Empirical CDF comparison
       ax3 = axes[2]
       
       # Sort samples for ECDF
       x_con = np.sort(samples_constructed)
       y_con = np.arange(1, len(x_con) + 1) / len(x_con)
       x_dir = np.sort(samples_direct)
       y_dir = np.arange(1, len(x_dir) + 1) / len(x_dir)
       
       ax3.step(x_con, y_con, where='post', label=label_constructed, 
                color='steelblue', alpha=0.7)
       ax3.step(x_dir, y_dir, where='post', label=label_direct,
                color='coral', alpha=0.7)
       
       ax3.set_xlabel('Value')
       ax3.set_ylabel('Cumulative Probability')
       ax3.set_title('Empirical CDF Comparison')
       ax3.legend(fontsize=8)
       
       plt.tight_layout()
       plt.show()
       
       # Print summary statistics
       print(f"\nSummary Statistics:")
       print(f"  {label_constructed:25s} - Mean: {np.mean(samples_constructed):8.4f}, "
             f"Var: {np.var(samples_constructed):8.4f}, "
             f"Median: {np.median(samples_constructed):8.4f}")
       print(f"  {label_direct:25s} - Mean: {np.mean(samples_direct):8.4f}, "
             f"Var: {np.var(samples_direct):8.4f}, "
             f"Median: {np.median(samples_direct):8.4f}")
       
       # Kolmogorov-Smirnov test
       ks_stat, ks_pval = stats.ks_2samp(samples_constructed, samples_direct)
       print(f"\n  Two-sample KS test: statistic = {ks_stat:.4f}, p-value = {ks_pval:.4f}")
       if ks_pval > 0.05:
           print("  → No significant evidence of difference (p > 0.05)")
       else:
           print("  → Evidence of difference detected (p ≤ 0.05)")
       print("  Note: KS p-values are approximate for discrete distributions")


   def qq_plot_theoretical(samples, dist, title, dist_name):
       """
       Q-Q plot comparing samples against a theoretical distribution.
       
       Parameters
       ----------
       samples : array-like
           Empirical samples
       dist : scipy.stats distribution
           Frozen scipy distribution (e.g., stats.norm(0, 1))
       title : str
           Plot title
       dist_name : str
           Name of theoretical distribution for labeling
       """
       fig, ax = plt.subplots(figsize=(6, 6))
       
       # Compute theoretical and sample quantiles
       n = len(samples)
       theoretical_quantiles = dist.ppf(np.linspace(1/(n+1), n/(n+1), n))
       sample_quantiles = np.sort(samples)
       
       ax.scatter(theoretical_quantiles, sample_quantiles, alpha=0.5, s=10)
       
       # Reference line
       lims = [min(theoretical_quantiles.min(), sample_quantiles.min()),
               max(theoretical_quantiles.max(), sample_quantiles.max())]
       ax.plot(lims, lims, 'r--', lw=2)
       
       ax.set_xlabel(f'Theoretical Quantiles ({dist_name})')
       ax.set_ylabel('Sample Quantiles')
       ax.set_title(title)
       ax.set_aspect('equal', adjustable='box')
       
       plt.tight_layout()
       plt.show()


Part I: MGF and Analytic Derivations
=====================================

These problems use moment generating functions and related analytic techniques to prove distributional identities. For MGF problems, compute the MGF of the relevant quantity, then identify the resulting distribution by matching to known MGF forms.

.. admonition:: Parameterization Conventions ⚠️
   :class: warning

   Pay careful attention to parameterization differences between mathematical conventions and SciPy:
   
   * **Geometric**: We use support :math:`\{1, 2, 3, \ldots\}` (trials until first success). SciPy's ``stats.geom(p)`` matches this convention.
   
   * **Negative Binomial**: SciPy's ``stats.nbinom(n=r, p=p)`` counts *failures before r successes* with support :math:`\{0, 1, 2, \ldots\}`. To get *trials until r successes*, add r to SciPy samples.
   
   * **Gamma**: We write :math:`\text{Gamma}(r, \beta)` with shape :math:`r` and **rate** :math:`\beta`. SciPy uses **scale** = :math:`1/\beta`, so use ``stats.gamma(a=r, scale=1/beta)``.
   
   * **Exponential**: We write :math:`\text{Exp}(\lambda)` with **rate** :math:`\lambda`. SciPy uses **scale** = :math:`1/\lambda`, so use ``stats.expon(scale=1/lambda)``.


Problem 1: Geometric to Negative Binomial
-----------------------------------------------------

Let :math:`X_1, \ldots, X_r` be independent and identically distributed :math:`\text{Geometric}(p)` random variables, where :math:`X_i` counts the number of trials until the first success:

.. math::

   P(X_i = k) = p(1-p)^{k-1}, \quad k = 1, 2, 3, \ldots

Let :math:`S_r = \sum_{i=1}^r X_i` be the total number of trials until :math:`r` successes.

**(a)** Using the MGF of the Geometric distribution and the multiplication property for independent sums, derive the MGF of :math:`S_r`:

.. math::

   M_{S_r}(t) = \left( \frac{pe^t}{1 - (1-p)e^t} \right)^r

**(b)** Conclude that :math:`S_r \sim \text{NegativeBinomial}(r, p)` in the "trials until :math:`r`-th success" parameterization, with PMF:

.. math::

   P(S_r = k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}, \quad k = r, r+1, r+2, \ldots

Explain the combinatorial interpretation of this PMF (why :math:`\binom{k-1}{r-1}`?).

**Computational Verification Tips:**

- Use ``stats.geom.rvs(p=p_val, size=(n_samples, r_val))`` to generate a matrix of Geometric samples, then sum across rows
- For direct Negative Binomial samples, note that SciPy's ``nbinom`` uses the "failures before r successes" parameterization — you'll need to add r to convert to "trials until r successes"
- After deriving the MGFs by hand, use SymPy to check that the product of r identical Geometric MGFs simplifies to the Negative Binomial MGF
- Use ``compare_distributions()`` with ``discrete=True`` to visualize the PMF comparison


Problem 2: Poisson Additivity
-----------------------------------------

Let :math:`X_1, \ldots, X_m` be independent Poisson random variables with possibly **different** rates :math:`\lambda_1, \ldots, \lambda_m`:

.. math::

   P(X_i = k) = e^{-\lambda_i} \frac{\lambda_i^k}{k!}, \quad k = 0, 1, 2, \ldots

Let :math:`S = \sum_{i=1}^m X_i`.

**(a)** Prove that :math:`S \sim \text{Poisson}(\Lambda)` where :math:`\Lambda = \sum_{i=1}^m \lambda_i`.

**(b)** The Poisson family is called **closed under convolution** (or "reproducing"). Explain in one sentence what this means and give a real-world example where this property is useful.

**Computational Verification Tips:**

- Define a lambda function for the Poisson MGF: ``M_pois = lambda lam: exp(lam * (exp(t) - 1))``
- Multiply MGFs with different rates and use ``simplify()`` to verify they combine as expected
- Generate samples from multiple Poisson distributions with different λ values and sum them
- Compare against direct samples from ``stats.poisson.rvs(mu=sum(lambda_vals), ...)``
- For Poisson, mean equals variance — check this holds for both constructed and direct samples


Problem 3: Binomial Additivity
------------------------------------------

Let :math:`X_1, \ldots, X_m` be independent Binomial random variables with **the same success probability** :math:`p` but possibly different numbers of trials :math:`n_1, \ldots, n_m`:

.. math::

   P(X_i = k) = \binom{n_i}{k} p^k (1-p)^{n_i - k}, \quad k = 0, 1, \ldots, n_i

Let :math:`S = \sum_{i=1}^m X_i`.

**(a)** Prove that :math:`S \sim \text{Binomial}(N, p)` where :math:`N = \sum_{i=1}^m n_i`.

**(b)** What happens if the :math:`X_i` have different success probabilities :math:`p_i`? Is the sum still Binomial? Justify your answer (a counterexample or brief argument suffices).

**Computational Verification Tips:**

- The Binomial MGF is :math:`(1 - p + pe^t)^n` — after your hand derivation, use SymPy to check the product simplifies correctly
- Taking logarithms before comparing can simplify the algebra: ``simplify(log(product) - log(combined))``
- Generate Binomial samples with varying n values but the **same** p, then sum
- For part (b), try sampling from Binomials with different p values and observe that the sum's distribution doesn't match any Binomial — plot the PMF to see the discrepancy


Problem 4: Normal Additivity
----------------------------------------

Let :math:`X_1, \ldots, X_n` be independent and identically distributed :math:`\mathcal{N}(\mu, \sigma^2)` random variables. Let :math:`S_n = \sum_{i=1}^n X_i`.

**(a)** Prove that :math:`S_n \sim \mathcal{N}(n\mu, n\sigma^2)`.

**(b)** As a corollary, prove that the sample mean :math:`\bar{X} = S_n/n` follows :math:`\mathcal{N}(\mu, \sigma^2/n)`. This is the exact (finite-sample) distribution, not an asymptotic approximation.

**Computational Verification Tips:**

- The Normal MGF is :math:`\exp(\mu t + \sigma^2 t^2/2)` — after your hand derivation, use SymPy to check that :math:`[M_X(t)]^n` simplifies to :math:`M_{S_n}(t)`
- Generate an (n_samples × n) matrix of Normal samples, then use ``np.sum(..., axis=1)`` for sums and ``np.mean(..., axis=1)`` for means
- Create two visualizations: one for the sum :math:`S_n \sim N(n\mu, n\sigma^2)` and one for the mean :math:`\bar{X} \sim N(\mu, \sigma^2/n)`
- The sample mean distribution is **exact** (not asymptotic) — verify that even small n gives perfect agreement


Problem 5: Chi-Square from Normal Squares
-----------------------------------------------------

This problem establishes one of the most important constructions in statistical inference.

**(a)** Let :math:`Z \sim \mathcal{N}(0, 1)` and define :math:`Y = Z^2`. **Derive from first principles** that:

.. math::

   M_Y(t) = \mathbb{E}[e^{tZ^2}] = (1 - 2t)^{-1/2}, \quad t < \frac{1}{2}

and conclude that :math:`Y \sim \chi^2(1)`.

**Hint**: Write the expectation as an integral over the standard normal density, combine exponents, and use the Gaussian integral identity:

.. math::

   \int_{-\infty}^{\infty} e^{-az^2} \, dz = \sqrt{\frac{\pi}{a}}, \quad a > 0

**(b)** Now let :math:`Z_1, \ldots, Z_n` be iid :math:`\mathcal{N}(0, 1)` and define :math:`Q = \sum_{i=1}^n Z_i^2`. Prove that :math:`Q \sim \chi^2(n)`.

**(c)** Explain why this result implies that for a random sample :math:`X_1, \ldots, X_n` from :math:`\mathcal{N}(\mu, \sigma^2)`:

.. math::

   \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)

where :math:`S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2` is the sample variance. (You may state without proof that :math:`\sum (X_i - \bar{X})^2/\sigma^2` involves :math:`n-1` independent squared standard normals.)

**Computational Verification Tips:**

- For part (a), after deriving the MGF by hand, verify your integral computation using SymPy: ``integrate(exp(t*z**2) * exp(-z**2/2) / sqrt(2*pi), (z, -oo, oo), conds='none')`` should simplify to :math:`(1-2t)^{-1/2}`
- For part (b), square standard normal samples and sum across columns, then compare to ``stats.chi2.rvs(df=n, ...)``
- The :math:`\chi^2` distribution is right-skewed with support :math:`[0, \infty)` — set appropriate x-axis limits for visualization
- Theoretical moments: :math:`E[\chi^2_n] = n`, :math:`\text{Var}(\chi^2_n) = 2n`


Problem 6: The Gamma-Poisson-Negative Binomial Triangle
-------------------------------------------------------

This problem explores the deep connections between three distributions that arise naturally in count data modeling. The relationships have both theoretical elegance and practical importance—the Negative Binomial emerges as an "overdispersed Poisson" when the Poisson rate is itself random.

**Background**: In many applications, count data exhibits more variability than a Poisson model predicts. One explanation: the rate parameter :math:`\lambda` varies across observations. If we model this heterogeneity by treating :math:`\lambda` as a Gamma-distributed random variable, the resulting marginal distribution is Negative Binomial.

**(a) Poisson-Gamma Mixture**: Let :math:`X \mid \Lambda = \lambda \sim \text{Poisson}(\lambda)` and let :math:`\Lambda \sim \text{Gamma}(r, \beta)` with PDF:

.. math::

   f_\Lambda(\lambda) = \frac{\beta^r}{\Gamma(r)} \lambda^{r-1} e^{-\beta\lambda}, \quad \lambda > 0

Show that the marginal distribution of :math:`X` (integrating out :math:`\Lambda`) is Negative Binomial. Specifically, prove:

.. math::

   P(X = k) = \binom{k + r - 1}{k} \left(\frac{\beta}{1 + \beta}\right)^r \left(\frac{1}{1 + \beta}\right)^k, \quad k = 0, 1, 2, \ldots

**Hint**: Write :math:`P(X = k) = \int_0^\infty P(X = k \mid \Lambda = \lambda) f_\Lambda(\lambda) \, d\lambda`, substitute the Poisson PMF and Gamma PDF, then recognize the resulting integral as a Gamma function.

**(b) Identifying the Parameterization**: The result in (a) gives a Negative Binomial in the "number of failures before :math:`r` successes" parameterization. Show that if we define :math:`p = \frac{\beta}{1 + \beta}`, then:

.. math::

   P(X = k) = \binom{k + r - 1}{k} p^r (1-p)^k

Verify that :math:`\mathbb{E}[X] = \frac{r(1-p)}{p} = \frac{r}{\beta}` and :math:`\text{Var}(X) = \frac{r(1-p)}{p^2} = \frac{r(1+\beta)}{\beta^2}`.

**(c) Overdispersion**: For a Poisson distribution, :math:`\text{Var}(X) = \mathbb{E}[X]`. Show that for the Negative Binomial from part (b):

.. math::

   \text{Var}(X) = \mathbb{E}[X] + \frac{(\mathbb{E}[X])^2}{r}

This demonstrates **overdispersion**: :math:`\text{Var}(X) > \mathbb{E}[X]`. Explain intuitively why mixing over the rate parameter increases variability.

**(d) Poisson Process Connection**: In a Poisson process with rate :math:`\mu`, let :math:`N(t)` be the number of events by time :math:`t`, and let :math:`T_r` be the time of the :math:`r`-th event. We have:

- :math:`N(t) \sim \text{Poisson}(\mu t)`
- :math:`T_r \sim \text{Gamma}(r, \mu)` (shape :math:`r`, rate :math:`\mu`)

Prove the fundamental equivalence:

.. math::

   P(N(t) \geq r) = P(T_r \leq t)

**Hint**: The event "at least :math:`r` arrivals by time :math:`t`" is equivalent to "the :math:`r`-th arrival occurred by time :math:`t`."

**(e) Exponential-Gamma Additivity**: As a parallel to Problem 1 (Geometric :math:`\to` Negative Binomial), prove using MGFs that if :math:`X_1, \ldots, X_r` are iid :math:`\text{Exponential}(\lambda)`, then:

.. math::

   S_r = \sum_{i=1}^r X_i \sim \text{Gamma}(r, \lambda)

This shows that Gamma is to Exponential as Negative Binomial is to Geometric—the waiting time for :math:`r` events in continuous vs. discrete time.

**Computational Verification Tips:**

- **Part (a/b)**: To construct the Poisson-Gamma mixture, first sample :math:`\Lambda \sim \text{Gamma}(r, \beta)` using ``stats.gamma.rvs(a=r, scale=1/beta, ...)``, then sample :math:`X \mid \Lambda` from ``stats.poisson.rvs(mu=lambda_sample)`` for each :math:`\Lambda` value
- SciPy's Negative Binomial uses the "failures before r successes" parameterization with :math:`p = \beta/(1+\beta)`
- **Part (c)**: Compute the theoretical variance using both the direct formula and the overdispersion formula :math:`\text{Var} = E[X] + E[X]^2/r` to verify they match
- **Part (d)**: Compare ``1 - stats.poisson.cdf(r-1, mu=mu*t)`` with ``stats.gamma.cdf(t, a=r, scale=1/mu)`` — they should be equal to machine precision
- **Part (e)**: Sum Exponential samples and compare to Gamma; note SciPy's ``expon`` uses scale = 1/rate, so ``stats.expon.rvs(scale=1/lambda, ...)``
- Create two visualizations: one for the Poisson-Gamma mixture (discrete) and one for Exponential-to-Gamma (continuous)


Part II: Transformation Methods
===============================

These distributions lack MGFs, requiring transformation (Jacobian) techniques. Derive the PDFs by hand using change-of-variables, then use numerical sampling to verify your results.


Problem 7: Ratio of Normals is Cauchy
-------------------------------------

Let :math:`Z_1, Z_2` be independent :math:`\mathcal{N}(0, 1)` random variables. Define :math:`V = Z_1 / Z_2`.

**(a)** Prove that :math:`V` has the standard Cauchy distribution:

.. math::

   f_V(v) = \frac{1}{\pi(1 + v^2)}, \quad v \in \mathbb{R}

**Suggested approach**: Use polar coordinates. Define :math:`R = \sqrt{Z_1^2 + Z_2^2}` and :math:`\Theta` as the angle, where :math:`\Theta \in (-\pi, \pi]`. Show that:

1. Under the isotropic bivariate standard normal, :math:`\Theta` is uniformly distributed (the angle is independent of radius for spherically symmetric distributions)
2. :math:`V = Z_1/Z_2 = \tan(\Theta)` for :math:`\Theta \neq \pm\pi/2`
3. Use the transformation method to find :math:`f_V(v)` from the uniform distribution on :math:`\Theta`

*Note*: The set where :math:`Z_2 = 0` has probability zero and can be ignored.

**(b)** Verify that the Cauchy PDF integrates to 1. Explain why the MGF :math:`\mathbb{E}[e^{tV}]` does not exist for any :math:`t \neq 0`.

**Computational Verification Tips:**

- After computing the Jacobian by hand for the polar transformation :math:`(z_1, z_2) = (r\cos\theta, r\sin\theta)`, you can check your result using SymPy's ``Matrix([...]).jacobian([r, theta])`` — it should give :math:`|J| = r`
- Verify that the Cauchy PDF integrates to 1 using ``integrate(1/(pi*(1+v**2)), (v, -oo, oo))``
- To construct samples: generate independent :math:`Z_1, Z_2 \sim N(0,1)` and compute :math:`V = Z_1/Z_2`. Do **not** filter samples for correctness—the probability of :math:`Z_2` being exactly zero is zero
- **Visualization challenge**: The Cauchy distribution has such heavy tails that histograms are dominated by extreme values. For readability, clip the histogram x-axis to a central range (e.g., ``[-10, 10]``), but keep all samples and use full quantile comparisons to verify tail behavior
- Compare quantiles at extreme probabilities (1%, 5%, 95%, 99%) to verify the heavy tails match theory
- The Cauchy has no mean or variance — sample moments will be unstable and meaningless


Problem 8: Student's t from Normal and Chi-Square
-------------------------------------------------

Let :math:`Z \sim \mathcal{N}(0, 1)` and :math:`U \sim \chi^2(\nu)` be **independent**. Define:

.. math::

   T = \frac{Z}{\sqrt{U/\nu}}

**(a)** Prove that :math:`T` has Student's :math:`t` distribution with :math:`\nu` degrees of freedom:

.. math::

   f_T(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi} \, \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \quad t \in \mathbb{R}

**Suggested approach**: Use conditioning. Note that :math:`T \mid U = u \sim \mathcal{N}(0, \nu/u)`. Then:

.. math::

   f_T(t) = \int_0^\infty f_{T|U}(t \mid u) \cdot f_U(u) \, du

Substitute the Normal and Chi-square densities, then evaluate the integral using the Gamma function identity.

**(b)** Verify that as :math:`\nu \to \infty`, the :math:`t_\nu` distribution approaches :math:`\mathcal{N}(0, 1)`. Based on your KS statistic plot, propose a data-driven threshold for :math:`\nu` at which the normal approximation becomes adequate (e.g., KS statistic drops below 0.02).

**Computational Verification Tips:**

- Construct :math:`T = Z / \sqrt{U/\nu}` where :math:`Z \sim N(0,1)` and :math:`U \sim \chi^2(\nu)` are independent
- Use ``stats.chi2.rvs(df=nu, ...)`` for chi-squared samples
- Compare against ``stats.t.rvs(df=nu, ...)`` for direct t-distribution samples
- Theoretical variance: :math:`\text{Var}(T) = \nu/(\nu-2)` for :math:`\nu > 2`; undefined for :math:`\nu \leq 2`
- **For part (b)**: Create a two-panel figure: (1) t-distribution PDFs for :math:`\nu \in \{1, 3, 5, 10, 30\}` overlaid with N(0,1), and (2) KS statistic vs. :math:`\nu` for :math:`\nu \in \{3, 5, 10, 15, 20, 25, 30, 40, 50\}`
- A common rule of thumb states :math:`t_{30} \approx N(0,1)` — does your analysis support this?


Problem 9: F Distribution from Chi-Squares
------------------------------------------

Let :math:`U \sim \chi^2(\nu_1)` and :math:`V \sim \chi^2(\nu_2)` be **independent**. Define:

.. math::

   F = \frac{U/\nu_1}{V/\nu_2}

**(a)** Prove that :math:`F` has the :math:`F(\nu_1, \nu_2)` distribution:

.. math::

   f_F(x) = \frac{1}{B(\nu_1/2, \nu_2/2)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} x^{\nu_1/2 - 1} \left(1 + \frac{\nu_1}{\nu_2}x\right)^{-(\nu_1+\nu_2)/2}, \quad x > 0

where :math:`B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}` is the Beta function.

**Hint**: Use the Gamma representation: :math:`\chi^2(\nu) = \text{Gamma}(\nu/2, 1/2)`. Transform :math:`(U, V) \mapsto (F, V)` and integrate out :math:`V`.

**(b)** Show that if :math:`T \sim t(\nu)`, then :math:`T^2 \sim F(1, \nu)`. This connects the :math:`t`-test to the :math:`F`-test.

**Computational Verification Tips:**

- Construct :math:`F = (U/\nu_1) / (V/\nu_2)` where :math:`U \sim \chi^2(\nu_1)` and :math:`V \sim \chi^2(\nu_2)` are independent
- Compare against ``stats.f.rvs(dfn=nu1, dfd=nu2, ...)`` for direct F-distribution samples
- Theoretical mean: :math:`E[F] = \nu_2/(\nu_2 - 2)` for :math:`\nu_2 > 2` (note: depends only on denominator df)
- The F distribution has support :math:`[0, \infty)` and is right-skewed — set x-axis limits appropriately
- **For part (b)**: Generate :math:`T \sim t(\nu)` samples, square them, and compare to :math:`F(1, \nu)` samples
- This demonstrates why a two-sided t-test at level :math:`\alpha` is equivalent to an F-test at level :math:`\alpha`: if :math:`|T| > t_{\alpha/2, \nu}`, then :math:`T^2 > F_{\alpha, 1, \nu}`
- Create two visualizations: one for the general F construction, one for the :math:`T^2 \sim F(1, \nu)` relationship


Part III: Computational Foundations
===================================

This section tests your understanding of the fundamental computational concepts from Chapter 1: the Central Limit Theorem, convergence in distribution, and proper use of Python's random generation ecosystem.


Problem 10: The Central Limit Theorem in Action
-----------------------------------------------

The **Central Limit Theorem (CLT)** states that for iid random variables :math:`X_1, \ldots, X_n` with mean :math:`\mu` and variance :math:`\sigma^2 < \infty`:

.. math::

   \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1) \quad \text{as } n \to \infty

This problem explores the CLT empirically across distributions with very different shapes.

**(a) CLT Universality**: For each of the following distributions, generate **5,000 samples** of size :math:`n` for :math:`n \in \{1, 2, 5, 10, 30, 100\}`. For each sample, compute the standardized sample mean:

.. math::

   Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}}

where :math:`\mu` and :math:`\sigma` are the **theoretical** mean and standard deviation.

Test with these distributions (use the theoretical moments given):

1. **Exponential(1)**: :math:`\mu = 1`, :math:`\sigma = 1` (right-skewed)
2. **Uniform(0, 1)**: :math:`\mu = 0.5`, :math:`\sigma = 1/\sqrt{12}` (symmetric, bounded)
3. **Bernoulli(0.1)**: :math:`\mu = 0.1`, :math:`\sigma = \sqrt{0.1 \times 0.9}` (discrete, highly skewed)
4. **Beta(0.5, 0.5)**: :math:`\mu = 0.5`, :math:`\sigma = 0.5/\sqrt{2}` (U-shaped)

Create a :math:`4 \times 6` grid of histograms showing the distribution of :math:`Z_n` for each (distribution, n) combination, with the :math:`N(0,1)` PDF overlaid. Title each panel with the distribution name and sample size.

**(b) Quantifying Convergence with the Kolmogorov-Smirnov Test**: The **Kolmogorov-Smirnov (KS) statistic** measures the maximum vertical distance between two cumulative distribution functions:

.. math::

   D_n = \sup_x |F_n(x) - F(x)|

where :math:`F_n(x)` is the empirical CDF and :math:`F(x)` is the reference CDF. Smaller values indicate better agreement. The one-sample KS test compares a sample against a theoretical distribution; ``stats.kstest(samples, 'norm')`` returns both the KS statistic and a p-value testing the null hypothesis that the sample came from :math:`N(0,1)`.

For each distribution from part (a), compute the KS statistic between your standardized means and :math:`N(0,1)` for each value of :math:`n`. Create a single plot showing KS statistic vs. :math:`n` (use log scale for n) with one line per distribution. Which distribution converges fastest? Slowest? Explain why in terms of the source distribution's shape (symmetry, boundedness, skewness).

**(c) Studentized Statistics**: In practice, we rarely know the true :math:`\mu` and :math:`\sigma`. The **studentized** version of the CLT uses sample estimates:

.. math::

   T_n = \frac{\bar{X}_n - \mu}{S_n / \sqrt{n}}

where :math:`S_n` is the sample standard deviation. For normal populations, :math:`T_n \sim t(n-1)` exactly. For non-normal populations, :math:`T_n \xrightarrow{d} N(0,1)` as :math:`n \to \infty`, but convergence may be slower.

Repeat part (a) using the studentized statistic :math:`T_n` instead of :math:`Z_n` (replace :math:`\sigma` with :math:`S_n` computed from each sample). Create another :math:`4 \times 6` grid and compare to your results from part (a). For which distributions and sample sizes does using estimated variance make a noticeable difference?

**Computational Verification Tips:**

- Use ``np.random.default_rng(seed)`` for reproducibility
- For each (distribution, n) combination, generate a matrix of shape ``(n_simulations, n)`` and compute row means using ``np.mean(..., axis=1)``
- For part (a), standardize using theoretical moments
- For part (c), compute sample standard deviations with ``np.std(..., axis=1, ddof=1)`` (use ``ddof=1`` for the unbiased estimator)
- Use ``stats.kstest(samples, 'norm')`` to get the KS statistic—the function returns a named tuple with ``.statistic`` and ``.pvalue`` attributes


Problem 11: Reproducibility and Parallel Streams
------------------------------------------------

Proper random number generation is critical for reproducible research. This problem tests your understanding of seeds, generator state, and parallel streams.

**(a) Seed Sensitivity**: Consider estimating :math:`E[X^2]` where :math:`X \sim N(0, 1)` (true value = 1) using :math:`n = 100` samples.

- Run this estimation with 10 different seeds (0 through 9)
- Report the 10 estimates and their sample standard deviation
- Now repeat with :math:`n = 10000` samples — how does seed sensitivity change?
- Explain the relationship between sample size and seed sensitivity

**(b) Generator State**: Demonstrate that you understand generator state by completing this task:

- Create a generator with seed 42
- Draw 100 uniform samples and compute their mean (call it ``mean_a``)
- Save the generator state using ``.bit_generator.state``
- Draw 100 more uniform samples and compute their mean (call it ``mean_b``)
- Restore the saved state
- Draw 100 uniform samples and verify their mean equals ``mean_b``

Report all three means and confirm the state restoration worked.

**(c) Parallel Streams**: When running simulations in parallel, each worker needs an independent random stream. Using ``SeedSequence.spawn()``:

- Create a parent ``SeedSequence`` with entropy 42
- Spawn 4 child sequences
- Create a generator from each child
- From each generator, draw 10000 samples from :math:`N(0, 1)` and compute the mean
- Verify the streams are independent by computing pairwise correlations between the raw samples (should be near 0)

Report the 4 means and the 6 pairwise correlations.

**(d) The Birthday Problem Revisited**: Use Monte Carlo simulation to estimate the probability that in a room of :math:`k` people, at least two share a birthday (assume 365 equally likely birthdays).

- Write a function ``birthday_collision(k, rng)`` that simulates one room and returns True if there's a collision
- Estimate the probability for :math:`k \in \{10, 20, 23, 30, 40, 50\}` using 100,000 simulations each
- Compare to the exact formula: :math:`P(\text{collision}) = 1 - \frac{365!}{365^k (365-k)!}`
- Your estimates should be within 0.01 of the exact values

**Computational Verification Tips:**

- Use ``rng.bit_generator.state`` to save/restore state (it's a dictionary)
- For ``SeedSequence``, use ``from numpy.random import SeedSequence, default_rng``
- For parallel streams: ``children = SeedSequence(42).spawn(4)``, then ``rng_i = default_rng(children[i])``
- For birthday collisions, ``rng.integers(0, 365, size=k)`` gives k random birthdays; check for duplicates with ``len(set(birthdays)) < k``
- The exact birthday formula can overflow; use ``scipy.special.perm(365, k)`` or compute in log space