Homework 1: Distributional Relationships and Computational Foundations

Due: [TBD]

Submission: Upload your written derivations (PDF) and Python verification script (.py or .ipynb) to Gradescope.

Overview and Learning Objectives

Probability distributions don’t exist in isolation—they form an interconnected family linked through sums, limits, transformations, and special cases. Understanding these relationships is essential for:

  • Simulation: Generating samples from complex distributions using simpler ones

  • Inference: Deriving sampling distributions of test statistics

  • Model selection: Recognizing when a simpler distribution is appropriate

  • Derivation: Proving properties of one distribution from another

This assignment develops your ability to prove distributional relationships rigorously and verify your work computationally. You’ll use moment generating functions (MGFs) where they exist and transformation techniques where they don’t—learning when each tool is appropriate.

Learning Outcomes Addressed

  • LO 1: Apply simulation techniques, including transformation methods for random variable generation

  • LO 2: Compare mathematical frameworks for probabilistic reasoning and inference

  • LO 3: Implement computational methods for verifying theoretical results

Assignment Structure

The assignment has three parts:

  • Part I: MGF and Analytic Derivations (Problems 1–6): Use MGFs where they exist, plus related analytic techniques (sums, transforms, mixtures) to prove distributional identities

  • Part II: Transformation Methods (Problems 7–9): Use Jacobian transformations for distributions lacking MGFs (Cauchy, t, F)

  • Part III: Computational Foundations (Problems 10–11): Empirically verify the Central Limit Theorem and demonstrate proper use of random number generation

Each problem in Parts I and II requires two components:

  1. Mathematical Derivation (handwritten or typeset): Show all steps, define notation, state assumptions, and justify each equality. This is the primary assessment—your derivation must be complete and correct.

  2. Computational Verification (Python): Use the visualization utilities provided to verify your algebra and explore the result numerically. This catches errors and builds intuition.

Part III problems are primarily computational—you’ll implement simulations and create visualizations that demonstrate key theoretical concepts from Chapter 1.

The computational component is not a shortcut—it verifies work you’ve already done by hand. Start each problem with pencil and paper before touching the keyboard.

Mathematical Preliminaries

This section provides reference material. You may cite these results without re-deriving them.

Moment Generating Functions

For a random variable \(X\), the moment generating function (MGF) is defined as:

\[M_X(t) = \mathbb{E}[e^{tX}]\]

for all \(t\) in an open interval containing zero where the expectation exists.

Key Property (Independent Sums): If \(X_1, \ldots, X_n\) are independent and each \(M_{X_i}(t)\) exists in a neighborhood of zero, then:

\[M_{\sum_{i=1}^n X_i}(t) = \prod_{i=1}^n M_{X_i}(t)\]

Uniqueness: Under standard regularity conditions, the MGF uniquely determines the distribution. If two random variables have the same MGF in a neighborhood of zero, they have the same distribution.

Reference: Common MGFs

The following MGFs may be used without derivation:

Table 1 Common Moment Generating Functions

| Distribution | PMF/PDF | MGF \(M_X(t)\) |
|---|---|---|
| Geometric(\(p\)) | \(P(X=k) = p(1-p)^{k-1}\), \(k=1,2,\ldots\) | \(\dfrac{pe^t}{1-(1-p)e^t}\), \(t < -\ln(1-p)\) |
| NegBinom(\(r, p\)) | \(P(Y=k) = \binom{k-1}{r-1}p^r(1-p)^{k-r}\), \(k \geq r\) | \(\left(\dfrac{pe^t}{1-(1-p)e^t}\right)^r\) |
| Poisson(\(\lambda\)) | \(P(X=k) = e^{-\lambda}\dfrac{\lambda^k}{k!}\), \(k=0,1,\ldots\) | \(\exp(\lambda(e^t - 1))\) |
| Binomial(\(n,p\)) | \(P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}\), \(k=0,\ldots,n\) | \((1 - p + pe^t)^n\) |
| Normal(\(\mu, \sigma^2\)) | \(f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\) | \(\exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)\) |
| \(\chi^2(\nu)\) | \(f(x) = \frac{x^{\nu/2-1}e^{-x/2}}{2^{\nu/2}\Gamma(\nu/2)}\), \(x > 0\) | \((1 - 2t)^{-\nu/2}\), \(t < 1/2\) |

Important: The Cauchy, Student’s \(t\), and \(F\) distributions do not have MGFs (the expectation \(\mathbb{E}[e^{tX}]\) is infinite for every \(t \neq 0\)). For these, use transformation/Jacobian methods.

Table 2 Distributions Without MGFs (for Part II reference)

| Distribution | PDF | Support |
|---|---|---|
| Cauchy (standard) | \(f(x) = \dfrac{1}{\pi(1 + x^2)}\) | \(x \in \mathbb{R}\) |
| Student’s \(t(\nu)\) | \(f(x) = \dfrac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}\) | \(x \in \mathbb{R}\) |
| \(F(\nu_1, \nu_2)\) | \(f(x) = \dfrac{1}{B(\nu_1/2, \nu_2/2)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} x^{\nu_1/2-1} \left(1 + \frac{\nu_1 x}{\nu_2}\right)^{-(\nu_1+\nu_2)/2}\) | \(x > 0\) |

These distributions arise from ratios and other transformations of normal and chi-square random variables, which is why transformation methods are required.

When MGFs Fail: Transformation Methods

When an MGF doesn’t exist, we derive distributions using the transformation (Jacobian) method:

For \(Y = g(X)\) where \(g\) is monotonic with inverse \(g^{-1}\):

\[f_Y(y) = f_X(g^{-1}(y)) \cdot \left| \frac{d}{dy} g^{-1}(y) \right|\]

For multivariate transformations \((Y_1, Y_2) = g(X_1, X_2)\):

\[f_{Y_1, Y_2}(y_1, y_2) = f_{X_1, X_2}(g^{-1}(y_1, y_2)) \cdot |J|\]

where \(|J|\) is the absolute value of the Jacobian determinant.
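
To make the univariate formula concrete, here is a short self-contained sketch (not one of the assigned problems): it applies the change-of-variables formula to \(Y = e^X\) with \(X \sim \mathcal{N}(0, 1)\) and spot-checks the resulting density against SciPy's lognormal PDF. The evaluation grid is an arbitrary illustrative choice.

import numpy as np
import sympy as sp
from scipy import stats

x = sp.symbols('x', real=True)
y = sp.symbols('y', positive=True)

# Change of variables for Y = exp(X), X ~ N(0, 1): the inverse map is g^{-1}(y) = log(y)
f_X = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)                    # standard normal density
f_Y = f_X.subs(x, sp.log(y)) * sp.Abs(sp.diff(sp.log(y), y))    # f_X(g^{-1}(y)) * |d g^{-1}/dy|

# Spot-check: SciPy's lognorm with shape s=1 is exactly the law of exp(N(0,1))
f_Y_numeric = sp.lambdify(y, f_Y, 'numpy')
grid = np.linspace(0.1, 5.0, 50)
print(np.allclose(f_Y_numeric(grid), stats.lognorm(s=1).pdf(grid)))   # expect True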

Computational Setup

Before starting, set up your Python environment. The code below provides tools for verifying your hand derivations—SymPy can check symbolic algebra, and numerical sampling confirms distributional results.

"""
Homework 1: Distributional Relationships
Computational Verification Template

Instructions:
1. Complete the hand derivation FIRST (on paper or iPad)
2. Use this code to verify your algebraic work
3. Include output and figures in your submission
"""

import numpy as np
import sympy as sp
from sympy import (symbols, exp, log, sqrt, pi, simplify, expand,
                   integrate, diff, factorial, binomial, gamma, oo,
                   Rational, latex, init_printing)
from sympy.stats import Normal, ChiSquared, density, E
from scipy import stats
import matplotlib.pyplot as plt

init_printing()  # Pretty symbolic output

# Common symbols (define once, reuse throughout)
t, p, r, k, n, m = symbols('t p r k n m', positive=True, real=True)
lam, lam1, lam2 = symbols('lambda lambda_1 lambda_2', positive=True)
mu, sigma = symbols('mu sigma', real=True)
nu, nu1, nu2 = symbols('nu nu_1 nu_2', positive=True, integer=True)
z, u, x = symbols('z u x', real=True)

Visual Verification Tools

A key component of this assignment is visually verifying your theoretical results. For each distributional relationship, you will:

  1. Sample directly from the target distribution (e.g., Negative Binomial)

  2. Construct samples by sampling from the building-block distributions and applying the appropriate transformation (e.g., sum of Geometrics)

  3. Compare visually using histograms, Q-Q plots, and empirical CDFs (the three panels produced by compare_distributions() below)

If your derivation is correct, the two sampling methods should produce indistinguishable distributions.

Visualization Utilities

Use the following helper functions throughout the assignment:

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def compare_distributions(samples_constructed, samples_direct,
                          title, label_constructed, label_direct,
                          discrete=False, theoretical_pmf=None,
                          theoretical_pdf=None, x_range=None):
    """
    Visually compare two sets of samples that should follow the same distribution.

    Parameters
    ----------
    samples_constructed : array-like
        Samples obtained by transforming/combining simpler distributions
    samples_direct : array-like
        Samples drawn directly from the target distribution
    title : str
        Plot title
    label_constructed : str
        Label for constructed samples (e.g., "Sum of Geometrics")
    label_direct : str
        Label for direct samples (e.g., "NegBinom(r,p)")
    discrete : bool
        If True, use bar plots for PMF comparison
    theoretical_pmf : callable, optional
        Function computing theoretical PMF at integer values
    theoretical_pdf : callable, optional
        Function computing theoretical PDF
    x_range : tuple, optional
        (min, max) for x-axis; auto-determined if None
    """
    fig, axes = plt.subplots(1, 3, figsize=(14, 4))
    fig.suptitle(title, fontsize=14, fontweight='bold')

    # Panel 1: Histogram/PMF comparison
    ax1 = axes[0]
    if discrete:
        # For discrete distributions, use aligned bar plots
        all_vals = np.concatenate([samples_constructed, samples_direct])
        bins = np.arange(all_vals.min() - 0.5, all_vals.max() + 1.5, 1)

        ax1.hist(samples_constructed, bins=bins, density=True, alpha=0.5,
                 label=label_constructed, color='steelblue', edgecolor='white')
        ax1.hist(samples_direct, bins=bins, density=True, alpha=0.5,
                 label=label_direct, color='coral', edgecolor='white')

        if theoretical_pmf is not None:
            x_vals = np.arange(int(all_vals.min()), int(all_vals.max()) + 1)
            ax1.plot(x_vals, [theoretical_pmf(k) for k in x_vals], 'k-',
                     lw=2, label='Theoretical PMF')
    else:
        # For continuous distributions, use density-scaled histograms
        ax1.hist(samples_constructed, bins=50, density=True, alpha=0.5,
                 label=label_constructed, color='steelblue', edgecolor='white')
        ax1.hist(samples_direct, bins=50, density=True, alpha=0.5,
                 label=label_direct, color='coral', edgecolor='white')

        if theoretical_pdf is not None:
            if x_range is None:
                x_range = (min(samples_constructed.min(), samples_direct.min()),
                           max(samples_constructed.max(), samples_direct.max()))
            x_vals = np.linspace(x_range[0], x_range[1], 200)
            ax1.plot(x_vals, theoretical_pdf(x_vals), 'k-', lw=2,
                     label='Theoretical PDF')

    ax1.set_xlabel('Value')
    ax1.set_ylabel('Density')
    ax1.set_title('Distribution Comparison')
    ax1.legend(fontsize=8)

    # Panel 2: Q-Q plot (constructed vs direct)
    ax2 = axes[1]
    quantiles = np.linspace(0.01, 0.99, 100)
    q_constructed = np.quantile(samples_constructed, quantiles)
    q_direct = np.quantile(samples_direct, quantiles)

    ax2.scatter(q_direct, q_constructed, alpha=0.6, s=20, color='purple')

    # Add reference line
    lims = [min(q_direct.min(), q_constructed.min()),
            max(q_direct.max(), q_constructed.max())]
    ax2.plot(lims, lims, 'k--', lw=1.5, label='y = x')

    ax2.set_xlabel(f'Quantiles ({label_direct})')
    ax2.set_ylabel(f'Quantiles ({label_constructed})')
    ax2.set_title('Q-Q Plot')
    ax2.legend()
    ax2.set_aspect('equal', adjustable='box')

    # Panel 3: Empirical CDF comparison
    ax3 = axes[2]

    # Sort samples for ECDF
    x_con = np.sort(samples_constructed)
    y_con = np.arange(1, len(x_con) + 1) / len(x_con)
    x_dir = np.sort(samples_direct)
    y_dir = np.arange(1, len(x_dir) + 1) / len(x_dir)

    ax3.step(x_con, y_con, where='post', label=label_constructed,
             color='steelblue', alpha=0.7)
    ax3.step(x_dir, y_dir, where='post', label=label_direct,
             color='coral', alpha=0.7)

    ax3.set_xlabel('Value')
    ax3.set_ylabel('Cumulative Probability')
    ax3.set_title('Empirical CDF Comparison')
    ax3.legend(fontsize=8)

    plt.tight_layout()
    plt.show()

    # Print summary statistics
    print(f"\nSummary Statistics:")
    print(f"  {label_constructed:25s} - Mean: {np.mean(samples_constructed):8.4f}, "
          f"Var: {np.var(samples_constructed):8.4f}, "
          f"Median: {np.median(samples_constructed):8.4f}")
    print(f"  {label_direct:25s} - Mean: {np.mean(samples_direct):8.4f}, "
          f"Var: {np.var(samples_direct):8.4f}, "
          f"Median: {np.median(samples_direct):8.4f}")

    # Kolmogorov-Smirnov test
    ks_stat, ks_pval = stats.ks_2samp(samples_constructed, samples_direct)
    print(f"\n  Two-sample KS test: statistic = {ks_stat:.4f}, p-value = {ks_pval:.4f}")
    if ks_pval > 0.05:
        print("  → No significant evidence of difference (p > 0.05)")
    else:
        print("  → Evidence of difference detected (p ≤ 0.05)")
    print("  Note: KS p-values are approximate for discrete distributions")


def qq_plot_theoretical(samples, dist, title, dist_name):
    """
    Q-Q plot comparing samples against a theoretical distribution.

    Parameters
    ----------
    samples : array-like
        Empirical samples
    dist : scipy.stats distribution
        Frozen scipy distribution (e.g., stats.norm(0, 1))
    title : str
        Plot title
    dist_name : str
        Name of theoretical distribution for labeling
    """
    fig, ax = plt.subplots(figsize=(6, 6))

    # Compute theoretical and sample quantiles
    n = len(samples)
    theoretical_quantiles = dist.ppf(np.linspace(1/(n+1), n/(n+1), n))
    sample_quantiles = np.sort(samples)

    ax.scatter(theoretical_quantiles, sample_quantiles, alpha=0.5, s=10)

    # Reference line
    lims = [min(theoretical_quantiles.min(), sample_quantiles.min()),
            max(theoretical_quantiles.max(), sample_quantiles.max())]
    ax.plot(lims, lims, 'r--', lw=2)

    ax.set_xlabel(f'Theoretical Quantiles ({dist_name})')
    ax.set_ylabel('Sample Quantiles')
    ax.set_title(title)
    ax.set_aspect('equal', adjustable='box')

    plt.tight_layout()
    plt.show()

Part I: MGF and Analytic Derivations

These problems use moment generating functions and related analytic techniques to prove distributional identities. For MGF problems, compute the MGF of the relevant quantity, then identify the resulting distribution by matching to known MGF forms.

Parameterization Conventions ⚠️

Pay careful attention to parameterization differences between mathematical conventions and SciPy:

  • Geometric: We use support \(\{1, 2, 3, \ldots\}\) (trials until first success). SciPy’s stats.geom(p) matches this convention.

  • Negative Binomial: SciPy’s stats.nbinom(n=r, p=p) counts failures before r successes with support \(\{0, 1, 2, \ldots\}\). To get trials until r successes, add r to SciPy samples.

  • Gamma: We write \(\text{Gamma}(r, \beta)\) with shape \(r\) and rate \(\beta\). SciPy uses scale = \(1/\beta\), so use stats.gamma(a=r, scale=1/beta).

  • Exponential: We write \(\text{Exp}(\lambda)\) with rate \(\lambda\). SciPy uses scale = \(1/\lambda\), so use stats.expon(scale=1/lambda).
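
The sketch below spot-checks each of these conversions numerically. The parameter values and seed are arbitrary illustrations, not values used in the problems.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p, r, beta, lam = 0.3, 5, 2.0, 1.5      # illustrative parameter values

# Geometric: SciPy already uses the "trials until first success" support {1, 2, ...}
print(stats.geom(p).pmf(1), p)                                   # both equal p

# Negative Binomial: add r to SciPy's "failures before r successes" samples
trials = stats.nbinom.rvs(n=r, p=p, size=100_000, random_state=rng) + r
print(trials.mean(), r / p)                                      # sample mean vs. r/p

# Gamma(r, beta) with rate beta corresponds to scale = 1/beta in SciPy
print(stats.gamma(a=r, scale=1/beta).mean(), r / beta)           # both equal r/beta

# Exponential(lambda) with rate lambda corresponds to scale = 1/lambda in SciPy
print(stats.expon(scale=1/lam).mean(), 1 / lam)                  # both equal 1/lambda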

Problem 1: Geometric to Negative Binomial

Let \(X_1, \ldots, X_r\) be independent and identically distributed \(\text{Geometric}(p)\) random variables, where \(X_i\) counts the number of trials until the first success:

\[P(X_i = k) = p(1-p)^{k-1}, \quad k = 1, 2, 3, \ldots\]

Let \(S_r = \sum_{i=1}^r X_i\) be the total number of trials until \(r\) successes.

(a) Using the MGF of the Geometric distribution and the multiplication property for independent sums, derive the MGF of \(S_r\):

\[M_{S_r}(t) = \left( \frac{pe^t}{1 - (1-p)e^t} \right)^r\]

(b) Conclude that \(S_r \sim \text{NegativeBinomial}(r, p)\) in the “trials until \(r\)-th success” parameterization, with PMF:

\[P(S_r = k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}, \quad k = r, r+1, r+2, \ldots\]

Explain the combinatorial interpretation of this PMF (why \(\binom{k-1}{r-1}\)?).

Computational Verification Tips:

  • Use stats.geom.rvs(p=p_val, size=(n_samples, r_val)) to generate a matrix of Geometric samples, then sum across rows

  • For direct Negative Binomial samples, note that SciPy’s nbinom uses the “failures before r successes” parameterization — you’ll need to add r to convert to “trials until r successes”

  • After deriving the MGFs by hand, use SymPy to check that the product of r identical Geometric MGFs simplifies to the Negative Binomial MGF

  • Use compare_distributions() with discrete=True to visualize the PMF comparison
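
Putting these tips together, a minimal verification sketch might look like the following. The values p = 0.3 and r = 4, the seed, and the sample count are illustrative choices, and the snippet assumes the visualization utilities above have already been defined.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p_val, r_val, n_samples = 0.3, 4, 50_000        # illustrative values

# Constructed: sum r independent Geometric(p) draws in each row
geom_matrix = stats.geom.rvs(p=p_val, size=(n_samples, r_val), random_state=rng)
constructed = geom_matrix.sum(axis=1)

# Direct: SciPy counts failures before the r-th success, so add r to get trials
direct = stats.nbinom.rvs(n=r_val, p=p_val, size=n_samples, random_state=rng) + r_val

compare_distributions(constructed, direct,
                      title=f'Sum of {r_val} Geometric({p_val}) vs NegBinom({r_val}, {p_val})',
                      label_constructed='Sum of Geometrics',
                      label_direct='NegBinom (trials)',
                      discrete=True,
                      theoretical_pmf=lambda k: stats.nbinom.pmf(k - r_val, n=r_val, p=p_val))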

Problem 2: Poisson Additivity

Let \(X_1, \ldots, X_m\) be independent Poisson random variables with possibly different rates \(\lambda_1, \ldots, \lambda_m\):

\[P(X_i = k) = e^{-\lambda_i} \frac{\lambda_i^k}{k!}, \quad k = 0, 1, 2, \ldots\]

Let \(S = \sum_{i=1}^m X_i\).

(a) Prove that \(S \sim \text{Poisson}(\Lambda)\) where \(\Lambda = \sum_{i=1}^m \lambda_i\).

(b) The Poisson family is called closed under convolution (or “reproducing”). Explain in one sentence what this means and give a real-world example where this property is useful.

Computational Verification Tips:

  • Define a lambda function for the Poisson MGF: M_pois = lambda lam: exp(lam * (exp(t) - 1))

  • Multiply MGFs with different rates and use simplify() to verify they combine as expected

  • Generate samples from multiple Poisson distributions with different λ values and sum them

  • Compare against direct samples from stats.poisson.rvs(mu=sum(lambda_vals), ...)

  • For Poisson, mean equals variance — check this holds for both constructed and direct samples
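
A compact sketch of both checks, symbolic and numerical, with arbitrary illustrative rates; the compare_distributions() call pattern from Problem 1 applies here as well with discrete=True.

import numpy as np
import sympy as sp
from scipy import stats

# Symbolic: the product of Poisson MGFs equals the MGF of Poisson(sum of rates)
t, lam1, lam2, lam3 = sp.symbols('t lambda_1 lambda_2 lambda_3', positive=True)
M_pois = lambda lam: sp.exp(lam * (sp.exp(t) - 1))
print(sp.simplify(M_pois(lam1) * M_pois(lam2) * M_pois(lam3)
                  - M_pois(lam1 + lam2 + lam3)))                 # expect 0

# Numerical: sum Poisson samples with different rates vs. one Poisson(sum of rates)
rng = np.random.default_rng(2)
rates, n_samples = [1.0, 2.5, 0.7], 50_000
constructed = sum(stats.poisson.rvs(mu=lam, size=n_samples, random_state=rng) for lam in rates)
direct = stats.poisson.rvs(mu=sum(rates), size=n_samples, random_state=rng)
print(constructed.mean(), constructed.var(), direct.mean(), direct.var())   # all near 4.2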

Problem 3: Binomial Additivity

Let \(X_1, \ldots, X_m\) be independent Binomial random variables with the same success probability \(p\) but possibly different numbers of trials \(n_1, \ldots, n_m\):

\[P(X_i = k) = \binom{n_i}{k} p^k (1-p)^{n_i - k}, \quad k = 0, 1, \ldots, n_i\]

Let \(S = \sum_{i=1}^m X_i\).

(a) Prove that \(S \sim \text{Binomial}(N, p)\) where \(N = \sum_{i=1}^m n_i\).

(b) What happens if the \(X_i\) have different success probabilities \(p_i\)? Is the sum still Binomial? Justify your answer (a counterexample or brief argument suffices).

Computational Verification Tips:

  • The Binomial MGF is \((1 - p + pe^t)^n\) — after your hand derivation, use SymPy to check the product simplifies correctly

  • Taking logarithms before comparing can simplify the algebra: simplify(log(product) - log(combined))

  • Generate Binomial samples with varying n values but the same p, then sum

  • For part (b), try sampling from Binomials with different p values and observe that the sum’s distribution doesn’t match any Binomial — plot the PMF to see the discrepancy
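
A minimal numerical sketch for both parts, with illustrative trial counts and probabilities:

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_vals, p_common, n_samples = [5, 10, 20], 0.4, 50_000     # illustrative values

# Part (a): sum Binomials sharing p, compare with Binomial(N, p) where N = 35
constructed = sum(stats.binom.rvs(n=n_i, p=p_common, size=n_samples, random_state=rng)
                  for n_i in n_vals)
direct = stats.binom.rvs(n=sum(n_vals), p=p_common, size=n_samples, random_state=rng)
print(constructed.mean(), direct.mean())     # both near N*p = 14
print(constructed.var(), direct.var())       # both near N*p*(1-p) = 8.4

# Part (b) intuition: with unequal p's the support forces N = 20 and the mean forces
# p = 0.5, but the observed variance is far below 20*0.5*0.5 = 5, so the sum is not Binomial
mixed = (stats.binom.rvs(n=10, p=0.1, size=n_samples, random_state=rng)
         + stats.binom.rvs(n=10, p=0.9, size=n_samples, random_state=rng))
print(mixed.mean(), mixed.var())             # near 10 and 1.8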

Problem 4: Normal Additivity

Let \(X_1, \ldots, X_n\) be independent and identically distributed \(\mathcal{N}(\mu, \sigma^2)\) random variables. Let \(S_n = \sum_{i=1}^n X_i\).

(a) Prove that \(S_n \sim \mathcal{N}(n\mu, n\sigma^2)\).

(b) As a corollary, prove that the sample mean \(\bar{X} = S_n/n\) follows \(\mathcal{N}(\mu, \sigma^2/n)\). This is the exact (finite-sample) distribution, not an asymptotic approximation.

Computational Verification Tips:

  • The Normal MGF is \(\exp(\mu t + \sigma^2 t^2/2)\) — after your hand derivation, use SymPy to check that \([M_X(t)]^n\) simplifies to \(M_{S_n}(t)\)

  • Generate an (n_samples × n) matrix of Normal samples, then use np.sum(..., axis=1) for sums and np.mean(..., axis=1) for means

  • Create two visualizations: one for the sum \(S_n \sim N(n\mu, n\sigma^2)\) and one for the mean \(\bar{X} \sim N(\mu, \sigma^2/n)\)

  • The sample mean distribution is exact (not asymptotic) — verify that even small n gives perfect agreement
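
A sketch of both checks with illustrative values \(\mu = 2\), \(\sigma = 1.5\), \(n = 5\); the Q-Q call assumes the qq_plot_theoretical() helper above has been defined.

import numpy as np
import sympy as sp
from scipy import stats

# Symbolic: [M_X(t)]^n equals the MGF of N(n*mu, n*sigma^2)
t, mu, sigma = sp.symbols('t mu sigma', real=True)
n = sp.symbols('n', positive=True, integer=True)
M_X = sp.exp(mu*t + sigma**2 * t**2 / 2)
print(sp.simplify(M_X**n - sp.exp(n*mu*t + n*sigma**2 * t**2 / 2)))   # expect 0

# Numerical: sums and means of an (n_samples x n) matrix of N(mu, sigma^2) draws
rng = np.random.default_rng(4)
mu_v, sigma_v, n_v, n_samples = 2.0, 1.5, 5, 50_000     # illustrative values
samples = rng.normal(mu_v, sigma_v, size=(n_samples, n_v))
sums, means = samples.sum(axis=1), samples.mean(axis=1)
print(sums.mean(), sums.var())      # near n*mu = 10 and n*sigma^2 = 11.25
print(means.mean(), means.var())    # near mu = 2 and sigma^2/n = 0.45

qq_plot_theoretical(means, stats.norm(mu_v, sigma_v / np.sqrt(n_v)),
                    title='Sample mean vs its exact normal law',
                    dist_name='N(mu, sigma^2/n)')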

Problem 5: Chi-Square from Normal Squares

This problem establishes one of the most important constructions in statistical inference.

(a) Let \(Z \sim \mathcal{N}(0, 1)\) and define \(Y = Z^2\). Derive from first principles that:

\[M_Y(t) = \mathbb{E}[e^{tZ^2}] = (1 - 2t)^{-1/2}, \quad t < \frac{1}{2}\]

and conclude that \(Y \sim \chi^2(1)\).

Hint: Write the expectation as an integral over the standard normal density, combine exponents, and use the Gaussian integral identity:

\[\int_{-\infty}^{\infty} e^{-az^2} \, dz = \sqrt{\frac{\pi}{a}}, \quad a > 0\]

(b) Now let \(Z_1, \ldots, Z_n\) be iid \(\mathcal{N}(0, 1)\) and define \(Q = \sum_{i=1}^n Z_i^2\). Prove that \(Q \sim \chi^2(n)\).

(c) Explain why this result implies that for a random sample \(X_1, \ldots, X_n\) from \(\mathcal{N}(\mu, \sigma^2)\):

\[\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)\]

where \(S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2\) is the sample variance. (You may state without proof that \(\sum (X_i - \bar{X})^2/\sigma^2\) involves \(n-1\) independent squared standard normals.)

Computational Verification Tips:

  • For part (a), after deriving the MGF by hand, verify your integral computation using SymPy: integrate(exp(t*z**2) * exp(-z**2/2) / sqrt(2*pi), (z, -oo, oo), conds='none') should simplify to \((1-2t)^{-1/2}\)

  • For part (b), square standard normal samples and sum across columns, then compare to stats.chi2.rvs(df=n, ...)

  • The \(\chi^2\) distribution is right-skewed with support \([0, \infty)\) — set appropriate x-axis limits for visualization

  • Theoretical moments: \(E[\chi^2_n] = n\), \(\text{Var}(\chi^2_n) = 2n\)
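
A sketch following these tips, using an illustrative choice of \(n = 6\) degrees of freedom; do the part (a) integral by hand before running the SymPy check, and note the symbolic result may display in an algebraically equivalent form.

import numpy as np
import sympy as sp
from scipy import stats

# Part (a): check the MGF integral symbolically after the hand derivation
z, t = sp.symbols('z t', real=True)
mgf = sp.integrate(sp.exp(t*z**2) * sp.exp(-z**2/2) / sp.sqrt(2*sp.pi),
                   (z, -sp.oo, sp.oo), conds='none')
print(sp.simplify(mgf))     # should be equivalent to (1 - 2t)**(-1/2)

# Part (b): sum of squared standard normals vs direct chi-square samples
rng = np.random.default_rng(5)
n_df, n_samples = 6, 50_000                 # illustrative degrees of freedom
constructed = (rng.standard_normal(size=(n_samples, n_df))**2).sum(axis=1)
direct = stats.chi2.rvs(df=n_df, size=n_samples, random_state=rng)
print(constructed.mean(), constructed.var())    # near n = 6 and 2n = 12

compare_distributions(constructed, direct,
                      title=f'Sum of {n_df} squared N(0,1) vs chi-square({n_df})',
                      label_constructed='Sum of Z_i^2',
                      label_direct='chi2 direct',
                      theoretical_pdf=stats.chi2(df=n_df).pdf)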

Problem 6: The Gamma-Poisson-Negative Binomial Triangle

This problem explores the deep connections between three distributions that arise naturally in count data modeling. The relationships have both theoretical elegance and practical importance—the Negative Binomial emerges as an “overdispersed Poisson” when the Poisson rate is itself random.

Background: In many applications, count data exhibits more variability than a Poisson model predicts. One explanation: the rate parameter \(\lambda\) varies across observations. If we model this heterogeneity by treating \(\lambda\) as a Gamma-distributed random variable, the resulting marginal distribution is Negative Binomial.

(a) Poisson-Gamma Mixture: Let \(X \mid \Lambda = \lambda \sim \text{Poisson}(\lambda)\) and let \(\Lambda \sim \text{Gamma}(r, \beta)\) with PDF:

\[f_\Lambda(\lambda) = \frac{\beta^r}{\Gamma(r)} \lambda^{r-1} e^{-\beta\lambda}, \quad \lambda > 0\]

Show that the marginal distribution of \(X\) (integrating out \(\Lambda\)) is Negative Binomial. Specifically, prove:

\[P(X = k) = \binom{k + r - 1}{k} \left(\frac{\beta}{1 + \beta}\right)^r \left(\frac{1}{1 + \beta}\right)^k, \quad k = 0, 1, 2, \ldots\]

Hint: Write \(P(X = k) = \int_0^\infty P(X = k \mid \Lambda = \lambda) f_\Lambda(\lambda) \, d\lambda\), substitute the Poisson PMF and Gamma PDF, then recognize the resulting integral as a Gamma function.

(b) Identifying the Parameterization: The result in (a) gives a Negative Binomial in the “number of failures before \(r\) successes” parameterization. Show that if we define \(p = \frac{\beta}{1 + \beta}\), then:

\[P(X = k) = \binom{k + r - 1}{k} p^r (1-p)^k\]

Verify that \(\mathbb{E}[X] = \frac{r(1-p)}{p} = \frac{r}{\beta}\) and \(\text{Var}(X) = \frac{r(1-p)}{p^2} = \frac{r(1+\beta)}{\beta^2}\).

(c) Overdispersion: For a Poisson distribution, \(\text{Var}(X) = \mathbb{E}[X]\). Show that for the Negative Binomial from part (b):

\[\text{Var}(X) = \mathbb{E}[X] + \frac{(\mathbb{E}[X])^2}{r}\]

This demonstrates overdispersion: \(\text{Var}(X) > \mathbb{E}[X]\). Explain intuitively why mixing over the rate parameter increases variability.

(d) Poisson Process Connection: In a Poisson process with rate \(\mu\), let \(N(t)\) be the number of events by time \(t\), and let \(T_r\) be the time of the \(r\)-th event. We have:

  • \(N(t) \sim \text{Poisson}(\mu t)\)

  • \(T_r \sim \text{Gamma}(r, \mu)\) (shape \(r\), rate \(\mu\))

Prove the fundamental equivalence:

\[P(N(t) \geq r) = P(T_r \leq t)\]

Hint: The event “at least \(r\) arrivals by time \(t\)” is equivalent to “the \(r\)-th arrival occurred by time \(t\).”

(e) Exponential-Gamma Additivity: As a parallel to Problem 1 (Geometric \(\to\) Negative Binomial), prove using MGFs that if \(X_1, \ldots, X_r\) are iid \(\text{Exponential}(\lambda)\), then:

\[S_r = \sum_{i=1}^r X_i \sim \text{Gamma}(r, \lambda)\]

This shows that Gamma is to Exponential as Negative Binomial is to Geometric—the waiting time for \(r\) events in continuous vs. discrete time.

Computational Verification Tips:

  • Part (a/b): To construct the Poisson-Gamma mixture, first sample \(\Lambda \sim \text{Gamma}(r, \beta)\) using stats.gamma.rvs(a=r, scale=1/beta, ...), then sample \(X \mid \Lambda\) from stats.poisson.rvs(mu=lambda_sample) for each \(\Lambda\) value

  • SciPy’s Negative Binomial uses the “failures before r successes” parameterization with \(p = \beta/(1+\beta)\)

  • Part (c): Compute the theoretical variance using both the direct formula and the overdispersion formula \(\text{Var} = E[X] + E[X]^2/r\) to verify they match

  • Part (d): Compare 1 - stats.poisson.cdf(r-1, mu=mu*t) with stats.gamma.cdf(t, a=r, scale=1/mu) — they should be equal to machine precision

  • Part (e): Sum Exponential samples and compare to Gamma; note SciPy’s expon uses scale = 1/rate, so stats.expon.rvs(scale=1/lambda, ...)

  • Create two visualizations: one for the Poisson-Gamma mixture (discrete) and one for Exponential-to-Gamma (continuous)
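
The sketch below covers the mixture construction, the Poisson process identity, and the Exponential-to-Gamma sum with arbitrary illustrative parameter values; the visualizations are left to you via compare_distributions().

import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
r_v, beta_v, n_samples = 3, 2.0, 50_000          # illustrative shape/rate values

# (a)/(b): Poisson-Gamma mixture vs Negative Binomial with p = beta/(1+beta)
lam_samples = stats.gamma.rvs(a=r_v, scale=1/beta_v, size=n_samples, random_state=rng)
mixture = stats.poisson.rvs(mu=lam_samples, random_state=rng)
direct = stats.nbinom.rvs(n=r_v, p=beta_v/(1+beta_v), size=n_samples, random_state=rng)
print(mixture.mean(), r_v/beta_v)                         # mixture mean vs r/beta = 1.5
print(mixture.var(), r_v*(1+beta_v)/beta_v**2)            # overdispersed variance, 2.25
print(direct.mean(), direct.var())                        # should match the mixture

# (d): P(N(t) >= r) = P(T_r <= t) at an arbitrary rate and time
mu_v, t_v = 1.7, 2.3
print(1 - stats.poisson.cdf(r_v - 1, mu=mu_v*t_v), stats.gamma.cdf(t_v, a=r_v, scale=1/mu_v))

# (e): sum of r Exponential(lambda) draws vs Gamma(r, lambda)
lam_v = 1.2
expo_sum = stats.expon.rvs(scale=1/lam_v, size=(n_samples, r_v), random_state=rng).sum(axis=1)
gamma_direct = stats.gamma.rvs(a=r_v, scale=1/lam_v, size=n_samples, random_state=rng)
print(expo_sum.mean(), gamma_direct.mean(), r_v/lam_v)    # all near 2.5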

Part II: Transformation Methods

These distributions lack MGFs, requiring transformation (Jacobian) techniques. Derive the PDFs by hand using change-of-variables, then use numerical sampling to verify your results.

Problem 7: Ratio of Normals is Cauchy

Let \(Z_1, Z_2\) be independent \(\mathcal{N}(0, 1)\) random variables. Define \(V = Z_1 / Z_2\).

(a) Prove that \(V\) has the standard Cauchy distribution:

\[f_V(v) = \frac{1}{\pi(1 + v^2)}, \quad v \in \mathbb{R}\]

Suggested approach: Use polar coordinates. Define \(R = \sqrt{Z_1^2 + Z_2^2}\) and let \(\Theta \in (-\pi, \pi]\) be the angle with \(Z_1 = R\sin\Theta\) and \(Z_2 = R\cos\Theta\). Show that:

  1. Under the isotropic bivariate standard normal, \(\Theta\) is uniformly distributed on \((-\pi, \pi]\) (the angle is independent of the radius for spherically symmetric distributions)

  2. \(V = Z_1/Z_2 = \tan(\Theta)\) for \(\Theta \neq \pm\pi/2\)

  3. Use the transformation method to find \(f_V(v)\) from the uniform distribution of \(\Theta\); note that \(\tan\) maps each of the two half-circles onto all of \(\mathbb{R}\), so the contributions of the two branches add

Note: The set where \(Z_2 = 0\) has probability zero and can be ignored.

(b) Verify that the Cauchy PDF integrates to 1. Explain why the MGF \(\mathbb{E}[e^{tV}]\) does not exist for any \(t \neq 0\).

Computational Verification Tips:

  • After computing the Jacobian by hand for the polar map \((z_1, z_2) = (r\sin\theta, r\cos\theta)\) used above, you can check your result using SymPy’s Matrix([...]).jacobian([r, theta]); the determinant should have absolute value \(|J| = r\)

  • Verify that the Cauchy PDF integrates to 1 using integrate(1/(pi*(1+v**2)), (v, -oo, oo))

  • To construct samples: generate independent \(Z_1, Z_2 \sim N(0,1)\) and compute \(V = Z_1/Z_2\). Do not filter samples for correctness—the probability of \(Z_2\) being exactly zero is zero

  • Visualization challenge: The Cauchy distribution has such heavy tails that histograms are dominated by extreme values. For readability, clip the histogram x-axis to a central range (e.g., [-10, 10]), but keep all samples and use full quantile comparisons to verify tail behavior

  • Compare quantiles at extreme probabilities (1%, 5%, 95%, 99%) to verify the heavy tails match theory

  • The Cauchy has no mean or variance — sample moments will be unstable and meaningless
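
A sketch of the symbolic checks and the quantile comparison, with an illustrative seed and sample size:

import numpy as np
import sympy as sp
from scipy import stats

# Jacobian of the polar map (z1, z2) = (r*sin(theta), r*cos(theta)) from the derivation
r_, theta = sp.symbols('r theta', positive=True)
J = sp.Matrix([r_*sp.sin(theta), r_*sp.cos(theta)]).jacobian([r_, theta])
print(sp.simplify(J.det()))       # -r; its absolute value |J| = r is what enters the density

# Normalization of the Cauchy PDF
v = sp.symbols('v', real=True)
print(sp.integrate(1/(sp.pi*(1 + v**2)), (v, -sp.oo, sp.oo)))    # 1

# Ratio of standard normals vs direct Cauchy samples: compare quantiles, not moments
rng = np.random.default_rng(7)
n_samples = 100_000
ratio = rng.standard_normal(n_samples) / rng.standard_normal(n_samples)
direct = stats.cauchy.rvs(size=n_samples, random_state=rng)
for q in [0.01, 0.05, 0.5, 0.95, 0.99]:
    print(q, np.quantile(ratio, q), np.quantile(direct, q), stats.cauchy.ppf(q))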

Problem 8: Student’s t from Normal and Chi-Square

Let \(Z \sim \mathcal{N}(0, 1)\) and \(U \sim \chi^2(\nu)\) be independent. Define:

\[T = \frac{Z}{\sqrt{U/\nu}}\]

(a) Prove that \(T\) has Student’s \(t\) distribution with \(\nu\) degrees of freedom:

\[f_T(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi} \, \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \quad t \in \mathbb{R}\]

Suggested approach: Use conditioning. Note that \(T \mid U = u \sim \mathcal{N}(0, \nu/u)\). Then:

\[f_T(t) = \int_0^\infty f_{T|U}(t \mid u) \cdot f_U(u) \, du\]

Substitute the Normal and Chi-square densities, then evaluate the integral using the Gamma function identity.

(b) Verify that as \(\nu \to \infty\), the \(t_\nu\) distribution approaches \(\mathcal{N}(0, 1)\). Based on your KS statistic plot, propose a data-driven threshold for \(\nu\) at which the normal approximation becomes adequate (e.g., KS statistic drops below 0.02).

Computational Verification Tips:

  • Construct \(T = Z / \sqrt{U/\nu}\) where \(Z \sim N(0,1)\) and \(U \sim \chi^2(\nu)\) are independent

  • Use stats.chi2.rvs(df=nu, ...) for chi-squared samples

  • Compare against stats.t.rvs(df=nu, ...) for direct t-distribution samples

  • Theoretical variance: \(\text{Var}(T) = \nu/(\nu-2)\) for \(\nu > 2\); the variance is infinite for \(1 < \nu \leq 2\) and undefined for \(\nu \leq 1\)

  • For part (b): Create a two-panel figure: (1) t-distribution PDFs for \(\nu \in \{1, 3, 5, 10, 30\}\) overlaid with N(0,1), and (2) KS statistic vs. \(\nu\) for \(\nu \in \{3, 5, 10, 15, 20, 25, 30, 40, 50\}\)

  • A common rule of thumb states \(t_{30} \approx N(0,1)\) — does your analysis support this?
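
A sketch of the construction and of the KS-versus-\(\nu\) computation (the two-panel figure is left to you); \(\nu = 5\), the seed, and the sample size are illustrative choices.

import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n_samples = 50_000

# Part (a): construct T = Z / sqrt(U/nu) and compare to direct t samples
nu = 5                                            # illustrative degrees of freedom
Z = rng.standard_normal(n_samples)
U = stats.chi2.rvs(df=nu, size=n_samples, random_state=rng)
T_constructed = Z / np.sqrt(U / nu)
T_direct = stats.t.rvs(df=nu, size=n_samples, random_state=rng)
print(stats.ks_2samp(T_constructed, T_direct).pvalue)        # expect a non-small p-value

# Part (b): KS distance between t(nu) samples and N(0, 1) as nu grows
for nu in [3, 5, 10, 15, 20, 25, 30, 40, 50]:
    t_samples = stats.t.rvs(df=nu, size=n_samples, random_state=rng)
    print(nu, round(stats.kstest(t_samples, 'norm').statistic, 4))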

Problem 9: F Distribution from Chi-Squares

Let \(U \sim \chi^2(\nu_1)\) and \(V \sim \chi^2(\nu_2)\) be independent. Define:

\[F = \frac{U/\nu_1}{V/\nu_2}\]

(a) Prove that \(F\) has the \(F(\nu_1, \nu_2)\) distribution:

\[f_F(x) = \frac{1}{B(\nu_1/2, \nu_2/2)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} x^{\nu_1/2 - 1} \left(1 + \frac{\nu_1}{\nu_2}x\right)^{-(\nu_1+\nu_2)/2}, \quad x > 0\]

where \(B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}\) is the Beta function.

Hint: Use the Gamma representation: \(\chi^2(\nu) = \text{Gamma}(\nu/2, 1/2)\). Transform \((U, V) \mapsto (F, V)\) and integrate out \(V\).

(b) Show that if \(T \sim t(\nu)\), then \(T^2 \sim F(1, \nu)\). This connects the \(t\)-test to the \(F\)-test.

Computational Verification Tips:

  • Construct \(F = (U/\nu_1) / (V/\nu_2)\) where \(U \sim \chi^2(\nu_1)\) and \(V \sim \chi^2(\nu_2)\) are independent

  • Compare against stats.f.rvs(dfn=nu1, dfd=nu2, ...) for direct F-distribution samples

  • Theoretical mean: \(E[F] = \nu_2/(\nu_2 - 2)\) for \(\nu_2 > 2\) (note: depends only on denominator df)

  • The F distribution has support \([0, \infty)\) and is right-skewed — set x-axis limits appropriately

  • For part (b): Generate \(T \sim t(\nu)\) samples, square them, and compare to \(F(1, \nu)\) samples

  • This demonstrates why a two-sided t-test at level \(\alpha\) is equivalent to an F-test at level \(\alpha\): if \(|T| > t_{\alpha/2, \nu}\), then \(T^2 > F_{\alpha, 1, \nu}\)

  • Create two visualizations: one for the general F construction, one for the \(T^2 \sim F(1, \nu)\) relationship
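
A sketch of both constructions with illustrative degrees of freedom; the two figures are left to you via compare_distributions().

import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
nu1, nu2, n_samples = 4, 12, 50_000             # illustrative degrees of freedom

# Part (a): construct F = (U/nu1) / (V/nu2) and compare to direct F samples
U = stats.chi2.rvs(df=nu1, size=n_samples, random_state=rng)
V = stats.chi2.rvs(df=nu2, size=n_samples, random_state=rng)
F_constructed = (U / nu1) / (V / nu2)
F_direct = stats.f.rvs(dfn=nu1, dfd=nu2, size=n_samples, random_state=rng)
print(F_constructed.mean(), F_direct.mean(), nu2 / (nu2 - 2))    # all near 1.2

# Part (b): T^2 for T ~ t(nu) vs F(1, nu)
nu = 7
T_squared = stats.t.rvs(df=nu, size=n_samples, random_state=rng)**2
F_1_nu = stats.f.rvs(dfn=1, dfd=nu, size=n_samples, random_state=rng)
print(stats.ks_2samp(T_squared, F_1_nu).pvalue)                  # expect a non-small p-value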

Part III: Computational Foundations

This section tests your understanding of the fundamental computational concepts from Chapter 1: the Central Limit Theorem, convergence in distribution, and proper use of Python’s random generation ecosystem.

Problem 10: The Central Limit Theorem in Action

The Central Limit Theorem (CLT) states that for iid random variables \(X_1, \ldots, X_n\) with mean \(\mu\) and variance \(\sigma^2 < \infty\):

\[\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1) \quad \text{as } n \to \infty\]

This problem explores the CLT empirically across distributions with very different shapes.

(a) CLT Universality: For each of the following distributions, generate 5,000 samples of size \(n\) for \(n \in \{1, 2, 5, 10, 30, 100\}\). For each sample, compute the standardized sample mean:

\[Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}}\]

where \(\mu\) and \(\sigma\) are the theoretical mean and standard deviation.

Test with these distributions (use the theoretical moments given):

  1. Exponential(1): \(\mu = 1\), \(\sigma = 1\) (right-skewed)

  2. Uniform(0, 1): \(\mu = 0.5\), \(\sigma = 1/\sqrt{12}\) (symmetric, bounded)

  3. Bernoulli(0.1): \(\mu = 0.1\), \(\sigma = \sqrt{0.1 \times 0.9}\) (discrete, highly skewed)

  4. Beta(0.5, 0.5): \(\mu = 0.5\), \(\sigma = 0.5/\sqrt{2}\) (U-shaped)

Create a \(4 \times 6\) grid of histograms showing the distribution of \(Z_n\) for each (distribution, n) combination, with the \(N(0,1)\) PDF overlaid. Title each panel with the distribution name and sample size.

(b) Quantifying Convergence with the Kolmogorov-Smirnov Test: The Kolmogorov-Smirnov (KS) statistic measures the maximum vertical distance between two cumulative distribution functions:

\[D_n = \sup_x |F_n(x) - F(x)|\]

where \(F_n(x)\) is the empirical CDF and \(F(x)\) is the reference CDF. Smaller values indicate better agreement. The one-sample KS test compares a sample against a theoretical distribution; stats.kstest(samples, 'norm') returns both the KS statistic and a p-value testing the null hypothesis that the sample came from \(N(0,1)\).

For each distribution from part (a), compute the KS statistic between your standardized means and \(N(0,1)\) for each value of \(n\). Create a single plot showing KS statistic vs. \(n\) (use log scale for n) with one line per distribution. Which distribution converges fastest? Slowest? Explain why in terms of the source distribution’s shape (symmetry, boundedness, skewness).

(c) Studentized Statistics: In practice, we rarely know the true \(\mu\) and \(\sigma\). The studentized version of the CLT uses sample estimates:

\[T_n = \frac{\bar{X}_n - \mu}{S_n / \sqrt{n}}\]

where \(S_n\) is the sample standard deviation. For normal populations, \(T_n \sim t(n-1)\) exactly. For non-normal populations, \(T_n \xrightarrow{d} N(0,1)\) as \(n \to \infty\), but convergence may be slower.

Repeat part (a) using the studentized statistic \(T_n\) instead of \(Z_n\) (replace \(\sigma\) with \(S_n\) computed from each sample). Create another \(4 \times 6\) grid and compare to your results from part (a). For which distributions and sample sizes does using estimated variance make a noticeable difference?

Computational Verification Tips:

  • Use np.random.default_rng(seed) for reproducibility

  • For each (distribution, n) combination, generate a matrix of shape (n_simulations, n) and compute row means using np.mean(..., axis=1)

  • For part (a), standardize using theoretical moments

  • For part (c), compute sample standard deviations with np.std(..., axis=1, ddof=1) (ddof=1 gives the \(n-1\) denominator used in the sample variance \(S_n^2\))

  • Use stats.kstest(samples, 'norm') to get the KS statistic—the function returns a named tuple with .statistic and .pvalue attributes
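
The sketch below tabulates the KS statistic for every (distribution, n) cell; the histogram grid and the studentized repeat in part (c) are left to you. The seed and the use of 5,000 replications per cell follow the problem statement; the frozen-distribution dictionary is just one convenient way to organize the loop.

import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
n_sims = 5_000

# (frozen distribution, theoretical mu, theoretical sigma) for each source distribution
sources = {
    'Exponential(1)': (stats.expon(),        1.0, 1.0),
    'Uniform(0,1)':   (stats.uniform(),      0.5, 1/np.sqrt(12)),
    'Bernoulli(0.1)': (stats.bernoulli(0.1), 0.1, np.sqrt(0.1*0.9)),
    'Beta(0.5,0.5)':  (stats.beta(0.5, 0.5), 0.5, 0.5/np.sqrt(2)),
}

for name, (dist, mu, sigma) in sources.items():
    for n in [1, 2, 5, 10, 30, 100]:
        X = dist.rvs(size=(n_sims, n), random_state=rng)
        Z_n = (X.mean(axis=1) - mu) / (sigma / np.sqrt(n))       # standardized sample means
        ks = stats.kstest(Z_n, 'norm').statistic
        print(f'{name:15s}  n = {n:3d}  KS = {ks:.4f}')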

Problem 11: Reproducibility and Parallel Streams

Proper random number generation is critical for reproducible research. This problem tests your understanding of seeds, generator state, and parallel streams.

(a) Seed Sensitivity: Consider estimating \(E[X^2]\) where \(X \sim N(0, 1)\) (true value = 1) using \(n = 100\) samples.

  • Run this estimation with 10 different seeds (0 through 9)

  • Report the 10 estimates and their sample standard deviation

  • Now repeat with \(n = 10000\) samples — how does seed sensitivity change?

  • Explain the relationship between sample size and seed sensitivity

(b) Generator State: Demonstrate that you understand generator state by completing this task:

  • Create a generator with seed 42

  • Draw 100 uniform samples and compute their mean (call it mean_a)

  • Save the generator state using .bit_generator.state

  • Draw 100 more uniform samples and compute their mean (call it mean_b)

  • Restore the saved state

  • Draw 100 uniform samples and verify their mean equals mean_b

Report all three means and confirm the state restoration worked.

(c) Parallel Streams: When running simulations in parallel, each worker needs an independent random stream. Using SeedSequence.spawn():

  • Create a parent SeedSequence with entropy 42

  • Spawn 4 child sequences

  • Create a generator from each child

  • From each generator, draw 10000 samples from \(N(0, 1)\) and compute the mean

  • Verify the streams are independent by computing pairwise correlations between the raw samples (should be near 0)

Report the 4 means and the 6 pairwise correlations.

(d) The Birthday Problem Revisited: Use Monte Carlo simulation to estimate the probability that in a room of \(k\) people, at least two share a birthday (assume 365 equally likely birthdays).

  • Write a function birthday_collision(k, rng) that simulates one room and returns True if there’s a collision

  • Estimate the probability for \(k \in \{10, 20, 23, 30, 40, 50\}\) using 100,000 simulations each

  • Compare to the exact formula: \(P(\text{collision}) = 1 - \frac{365!}{365^k (365-k)!}\)

  • Your estimates should be within 0.01 of the exact values

Computational Verification Tips:

  • Use rng.bit_generator.state to save/restore state (it’s a dictionary)

  • For SeedSequence, use from numpy.random import SeedSequence, default_rng

  • For parallel streams: children = SeedSequence(42).spawn(4), then rng_i = default_rng(children[i])

  • For birthday collisions, rng.integers(0, 365, size=k) gives k random birthdays; check for duplicates with len(set(birthdays)) < k

  • The exact birthday formula can overflow; use scipy.special.perm(365, k) or compute in log space
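
A sketch of parts (b)–(d); part (a) is a straightforward loop over seeds. To keep the sketch fast it uses 20,000 birthday simulations and only three room sizes, whereas the assignment asks for 100,000 simulations over the full set of \(k\) values.

import numpy as np
from numpy.random import SeedSequence, default_rng
from scipy.special import perm

# (b) Save and restore generator state
rng = default_rng(42)
mean_a = rng.uniform(size=100).mean()
saved_state = rng.bit_generator.state                 # a plain dictionary
mean_b = rng.uniform(size=100).mean()
rng.bit_generator.state = saved_state                 # rewind to the saved point
print(mean_a, mean_b, rng.uniform(size=100).mean())   # third value should equal mean_b

# (c) Independent parallel streams via SeedSequence.spawn
children = SeedSequence(42).spawn(4)
streams = [default_rng(child) for child in children]
draws = np.array([s.normal(size=10_000) for s in streams])
print(draws.mean(axis=1))                             # four means, all near 0
print(np.round(np.corrcoef(draws), 3))                # off-diagonal entries near 0

# (d) Birthday collisions: Monte Carlo estimate vs the exact formula
def birthday_collision(k, rng):
    birthdays = rng.integers(0, 365, size=k)
    return len(set(birthdays)) < k

rng = default_rng(0)
for k in [10, 23, 50]:
    est = np.mean([birthday_collision(k, rng) for _ in range(20_000)])
    exact = 1 - perm(365, k, exact=False) / 365.0**k
    print(k, round(est, 4), round(exact, 4))          # estimates should be close to exact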