Homework 1: Distributional Relationships and Computational Foundations
- Due: [TBD]
- Submission: Upload your written derivations (PDF) and Python verification script (.py or .ipynb) to Gradescope
Overview and Learning Objectives
Probability distributions don’t exist in isolation—they form an interconnected family linked through sums, limits, transformations, and special cases. Understanding these relationships is essential for:
Simulation: Generating samples from complex distributions using simpler ones
Inference: Deriving sampling distributions of test statistics
Model selection: Recognizing when a simpler distribution is appropriate
Derivation: Proving properties of one distribution from another
This assignment develops your ability to prove distributional relationships rigorously and verify your work computationally. You’ll use moment generating functions (MGFs) where they exist and transformation techniques where they don’t—learning when each tool is appropriate.
Learning Outcomes Addressed
LO 1: Apply simulation techniques, including transformation methods for random variable generation
LO 2: Compare mathematical frameworks for probabilistic reasoning and inference
LO 3: Implement computational methods for verifying theoretical results
Assignment Structure
The assignment has three parts:
Part I: MGF and Analytic Derivations (Problems 1–6): Use MGFs where they exist, plus related analytic techniques (sums, transforms, mixtures) to prove distributional identities
Part II: Transformation Methods (Problems 7–9): Use Jacobian transformations for distributions lacking MGFs (Cauchy, t, F)
Part III: Computational Foundations (Problems 10–11): Empirically verify the Central Limit Theorem and demonstrate proper use of random number generation
Each problem in Parts I and II requires two components:
Mathematical Derivation (handwritten or typeset): Show all steps, define notation, state assumptions, and justify each equality. This is the primary assessment—your derivation must be complete and correct.
Computational Verification (Python): Use the visualization utilities provided to verify your algebra and explore the result numerically. This catches errors and builds intuition.
Part III problems are primarily computational—you’ll implement simulations and create visualizations that demonstrate key theoretical concepts from Chapter 1.
The computational component is not a shortcut—it verifies work you’ve already done by hand. Start each problem with pencil and paper before touching the keyboard.
Mathematical Preliminaries
This section provides reference material. You may cite these results without re-deriving them.
Moment Generating Functions
For a random variable \(X\), the moment generating function (MGF) is defined as:
\[
M_X(t) = \mathbb{E}\left[e^{tX}\right]
\]
for all \(t\) in an open interval containing zero where the expectation exists.
Key Property (Independent Sums): If \(X_1, \ldots, X_n\) are independent and each \(M_{X_i}(t)\) exists in a neighborhood of zero, then:
\[
M_{X_1 + \cdots + X_n}(t) = \prod_{i=1}^n M_{X_i}(t).
\]
Uniqueness: Under standard regularity conditions, the MGF uniquely determines the distribution. If two random variables have the same MGF in a neighborhood of zero, they have the same distribution.
Reference: Common MGFs
The following MGFs may be used without derivation:
| Distribution | PMF/PDF | MGF \(M_X(t)\) |
|---|---|---|
| Geometric(\(p\)) | \(P(X=k) = p(1-p)^{k-1}\), \(k=1,2,\ldots\) | \(\dfrac{pe^t}{1-(1-p)e^t}\), \(t < -\ln(1-p)\) |
| NegBinom(\(r, p\)) | \(P(Y=k) = \binom{k-1}{r-1}p^r(1-p)^{k-r}\), \(k \geq r\) | \(\left(\dfrac{pe^t}{1-(1-p)e^t}\right)^r\) |
| Poisson(\(\lambda\)) | \(P(X=k) = e^{-\lambda}\dfrac{\lambda^k}{k!}\), \(k=0,1,\ldots\) | \(\exp(\lambda(e^t - 1))\) |
| Binomial(\(n,p\)) | \(P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}\), \(k=0,\ldots,n\) | \((1 - p + pe^t)^n\) |
| Normal(\(\mu, \sigma^2\)) | \(f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\) | \(\exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)\) |
| \(\chi^2(\nu)\) | \(f(x) = \frac{x^{\nu/2-1}e^{-x/2}}{2^{\nu/2}\Gamma(\nu/2)}\), \(x > 0\) | \((1 - 2t)^{-\nu/2}\), \(t < 1/2\) |
Important: The Cauchy, Student’s \(t\), and \(F\) distributions do not have MGFs (the expectation diverges). For these, use transformation/Jacobian methods.
| Distribution | PDF | Support |
|---|---|---|
| Cauchy (standard) | \(f(x) = \dfrac{1}{\pi(1 + x^2)}\) | \(x \in \mathbb{R}\) |
| Student’s \(t(\nu)\) | \(f(x) = \dfrac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}\) | \(x \in \mathbb{R}\) |
| \(F(\nu_1, \nu_2)\) | \(f(x) = \dfrac{1}{B(\nu_1/2, \nu_2/2)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} x^{\nu_1/2-1} \left(1 + \frac{\nu_1 x}{\nu_2}\right)^{-(\nu_1+\nu_2)/2}\) | \(x > 0\) |
These distributions arise from ratios and other transformations of normal and chi-square random variables, which is why transformation methods are required.
When MGFs Fail: Transformation Methods
When an MGF doesn’t exist, we derive distributions using the transformation (Jacobian) method:
For \(Y = g(X)\) where \(g\) is monotonic with inverse \(g^{-1}\):
\[
f_Y(y) = f_X\!\left(g^{-1}(y)\right)\left|\frac{d}{dy}\, g^{-1}(y)\right|.
\]
For multivariate transformations \((Y_1, Y_2) = g(X_1, X_2)\) with inverse \((X_1, X_2) = g^{-1}(Y_1, Y_2)\):
\[
f_{Y_1, Y_2}(y_1, y_2) = f_{X_1, X_2}\!\left(g^{-1}(y_1, y_2)\right)\,|J|,
\]
where \(|J|\) is the absolute value of the Jacobian determinant of the inverse transformation, \(\left|\det \dfrac{\partial(x_1, x_2)}{\partial(y_1, y_2)}\right|\).
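As a small, ungraded illustration of the univariate formula, the sketch below (assuming SymPy is available, with an example transformation chosen only for demonstration) applies the change-of-variables recipe to \(Y = X^2\) where \(X \sim \text{Exponential}(1)\) and checks that the resulting density integrates to 1:

```python
# Minimal illustration of the univariate change-of-variables formula.
# Example chosen for illustration only: X ~ Exponential(1), Y = X^2 on (0, inf).
import sympy as sp

x, y = sp.symbols('x y', positive=True)

f_X = sp.exp(-x)                 # Exponential(1) density on (0, inf)
g_inv = sp.sqrt(y)               # inverse transformation x = g^{-1}(y)
jac = sp.diff(g_inv, y)          # d/dy g^{-1}(y) = 1/(2*sqrt(y)), positive here

f_Y = f_X.subs(x, g_inv) * jac   # f_Y(y) = f_X(g^{-1}(y)) |d g^{-1}/dy|
print(sp.simplify(f_Y))          # exp(-sqrt(y)) / (2*sqrt(y))

# Sanity check: the transformed density integrates to 1 over (0, inf)
print(sp.integrate(f_Y, (y, 0, sp.oo)))   # 1
```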
Computational Setup
Before starting, set up your Python environment. The code below provides tools for verifying your hand derivations—SymPy can check symbolic algebra, and numerical sampling confirms distributional results.
"""
Homework 1: Distributional Relationships
Computational Verification Template
Instructions:
1. Complete the hand derivation FIRST (on paper or iPad)
2. Use this code to verify your algebraic work
3. Include output and figures in your submission
"""
import numpy as np
import sympy as sp
from sympy import (symbols, exp, log, sqrt, pi, simplify, expand,
integrate, diff, factorial, binomial, gamma, oo,
Rational, latex, init_printing)
from sympy.stats import Normal, ChiSquared, density, E
from scipy import stats
import matplotlib.pyplot as plt
init_printing() # Pretty symbolic output
# Common symbols (define once, reuse throughout)
t, p, r, k, n, m = symbols('t p r k n m', positive=True, real=True)
lam, lam1, lam2 = symbols('lambda lambda_1 lambda_2', positive=True)
mu, sigma = symbols('mu sigma', real=True)
nu, nu1, nu2 = symbols('nu nu_1 nu_2', positive=True, integer=True)
z, u, x = symbols('z u x', real=True)
Visual Verification Tools
A key component of this assignment is visually verifying your theoretical results. For each distributional relationship, you will:
Sample directly from the target distribution (e.g., Negative Binomial)
Construct samples by sampling from the building-block distributions and applying the appropriate transformation (e.g., sum of Geometrics)
Compare visually using histograms, kernel density estimates, and Q-Q plots
If your derivation is correct, the two sampling methods should produce indistinguishable distributions.
Visualization Utilities
Use the following helper functions throughout the assignment:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
def compare_distributions(samples_constructed, samples_direct,
title, label_constructed, label_direct,
discrete=False, theoretical_pmf=None,
theoretical_pdf=None, x_range=None):
"""
Visually compare two sets of samples that should follow the same distribution.
Parameters
----------
samples_constructed : array-like
Samples obtained by transforming/combining simpler distributions
samples_direct : array-like
Samples drawn directly from the target distribution
title : str
Plot title
label_constructed : str
Label for constructed samples (e.g., "Sum of Geometrics")
label_direct : str
Label for direct samples (e.g., "NegBinom(r,p)")
discrete : bool
If True, use bar plots for PMF comparison
theoretical_pmf : callable, optional
Function computing theoretical PMF at integer values
theoretical_pdf : callable, optional
Function computing theoretical PDF
x_range : tuple, optional
(min, max) for x-axis; auto-determined if None
"""
fig, axes = plt.subplots(1, 3, figsize=(14, 4))
fig.suptitle(title, fontsize=14, fontweight='bold')
# Panel 1: Histogram/PMF comparison
ax1 = axes[0]
if discrete:
# For discrete distributions, use aligned bar plots
all_vals = np.concatenate([samples_constructed, samples_direct])
bins = np.arange(all_vals.min() - 0.5, all_vals.max() + 1.5, 1)
ax1.hist(samples_constructed, bins=bins, density=True, alpha=0.5,
label=label_constructed, color='steelblue', edgecolor='white')
ax1.hist(samples_direct, bins=bins, density=True, alpha=0.5,
label=label_direct, color='coral', edgecolor='white')
if theoretical_pmf is not None:
x_vals = np.arange(int(all_vals.min()), int(all_vals.max()) + 1)
ax1.plot(x_vals, [theoretical_pmf(k) for k in x_vals], 'k-',
lw=2, label='Theoretical PMF')
else:
# For continuous distributions, use KDE
ax1.hist(samples_constructed, bins=50, density=True, alpha=0.5,
label=label_constructed, color='steelblue', edgecolor='white')
ax1.hist(samples_direct, bins=50, density=True, alpha=0.5,
label=label_direct, color='coral', edgecolor='white')
if theoretical_pdf is not None:
if x_range is None:
x_range = (min(samples_constructed.min(), samples_direct.min()),
max(samples_constructed.max(), samples_direct.max()))
x_vals = np.linspace(x_range[0], x_range[1], 200)
ax1.plot(x_vals, theoretical_pdf(x_vals), 'k-', lw=2,
label='Theoretical PDF')
ax1.set_xlabel('Value')
ax1.set_ylabel('Density')
ax1.set_title('Distribution Comparison')
ax1.legend(fontsize=8)
# Panel 2: Q-Q plot (constructed vs direct)
ax2 = axes[1]
quantiles = np.linspace(0.01, 0.99, 100)
q_constructed = np.quantile(samples_constructed, quantiles)
q_direct = np.quantile(samples_direct, quantiles)
ax2.scatter(q_direct, q_constructed, alpha=0.6, s=20, color='purple')
# Add reference line
lims = [min(q_direct.min(), q_constructed.min()),
max(q_direct.max(), q_constructed.max())]
ax2.plot(lims, lims, 'k--', lw=1.5, label='y = x')
ax2.set_xlabel(f'Quantiles ({label_direct})')
ax2.set_ylabel(f'Quantiles ({label_constructed})')
ax2.set_title('Q-Q Plot')
ax2.legend()
ax2.set_aspect('equal', adjustable='box')
# Panel 3: Empirical CDF comparison
ax3 = axes[2]
# Sort samples for ECDF
x_con = np.sort(samples_constructed)
y_con = np.arange(1, len(x_con) + 1) / len(x_con)
x_dir = np.sort(samples_direct)
y_dir = np.arange(1, len(x_dir) + 1) / len(x_dir)
ax3.step(x_con, y_con, where='post', label=label_constructed,
color='steelblue', alpha=0.7)
ax3.step(x_dir, y_dir, where='post', label=label_direct,
color='coral', alpha=0.7)
ax3.set_xlabel('Value')
ax3.set_ylabel('Cumulative Probability')
ax3.set_title('Empirical CDF Comparison')
ax3.legend(fontsize=8)
plt.tight_layout()
plt.show()
# Print summary statistics
print(f"\nSummary Statistics:")
print(f" {label_constructed:25s} - Mean: {np.mean(samples_constructed):8.4f}, "
f"Var: {np.var(samples_constructed):8.4f}, "
f"Median: {np.median(samples_constructed):8.4f}")
print(f" {label_direct:25s} - Mean: {np.mean(samples_direct):8.4f}, "
f"Var: {np.var(samples_direct):8.4f}, "
f"Median: {np.median(samples_direct):8.4f}")
# Kolmogorov-Smirnov test
ks_stat, ks_pval = stats.ks_2samp(samples_constructed, samples_direct)
print(f"\n Two-sample KS test: statistic = {ks_stat:.4f}, p-value = {ks_pval:.4f}")
if ks_pval > 0.05:
print(" → No significant evidence of difference (p > 0.05)")
else:
print(" → Evidence of difference detected (p ≤ 0.05)")
print(" Note: KS p-values are approximate for discrete distributions")
def qq_plot_theoretical(samples, dist, title, dist_name):
"""
Q-Q plot comparing samples against a theoretical distribution.
Parameters
----------
samples : array-like
Empirical samples
dist : scipy.stats distribution
Frozen scipy distribution (e.g., stats.norm(0, 1))
title : str
Plot title
dist_name : str
Name of theoretical distribution for labeling
"""
fig, ax = plt.subplots(figsize=(6, 6))
# Compute theoretical and sample quantiles
n = len(samples)
theoretical_quantiles = dist.ppf(np.linspace(1/(n+1), n/(n+1), n))
sample_quantiles = np.sort(samples)
ax.scatter(theoretical_quantiles, sample_quantiles, alpha=0.5, s=10)
# Reference line
lims = [min(theoretical_quantiles.min(), sample_quantiles.min()),
max(theoretical_quantiles.max(), sample_quantiles.max())]
ax.plot(lims, lims, 'r--', lw=2)
ax.set_xlabel(f'Theoretical Quantiles ({dist_name})')
ax.set_ylabel('Sample Quantiles')
ax.set_title(title)
ax.set_aspect('equal', adjustable='box')
plt.tight_layout()
plt.show()
Part I: MGF and Analytic Derivations
These problems use moment generating functions and related analytic techniques to prove distributional identities. For MGF problems, compute the MGF of the relevant quantity, then identify the resulting distribution by matching to known MGF forms.
Parameterization Conventions ⚠️
Pay careful attention to parameterization differences between mathematical conventions and SciPy:
- Geometric: We use support \(\{1, 2, 3, \ldots\}\) (trials until first success). SciPy’s `stats.geom(p)` matches this convention.
- Negative Binomial: SciPy’s `stats.nbinom(n=r, p=p)` counts failures before \(r\) successes with support \(\{0, 1, 2, \ldots\}\). To get trials until \(r\) successes, add \(r\) to SciPy samples.
- Gamma: We write \(\text{Gamma}(r, \beta)\) with shape \(r\) and rate \(\beta\). SciPy uses scale = \(1/\beta\), so use `stats.gamma(a=r, scale=1/beta)`.
- Exponential: We write \(\text{Exp}(\lambda)\) with rate \(\lambda\). SciPy uses scale = \(1/\lambda\), so use `stats.expon(scale=1/lambda)`.
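As a quick sanity check of these conventions before you start, the sketch below (with arbitrary illustrative parameter values) compares sample supports and means against the values implied by our parameterization:

```python
# Sketch: sanity-check the SciPy parameterization conventions listed above.
# Parameter values here are arbitrary illustrations.
from scipy import stats

p, r, beta, lam = 0.3, 4, 2.0, 1.5
seed = 0

# Geometric: scipy's geom already uses support {1, 2, ...}; mean is 1/p
geo = stats.geom.rvs(p, size=100_000, random_state=seed)
print(geo.min(), geo.mean(), 1 / p)

# Negative Binomial: scipy counts failures; add r for "trials until r successes"
nb_trials = stats.nbinom.rvs(r, p, size=100_000, random_state=seed) + r
print(nb_trials.min(), nb_trials.mean(), r / p)

# Gamma(shape=r, rate=beta): pass scale = 1/beta; mean is r/beta
gam = stats.gamma.rvs(a=r, scale=1/beta, size=100_000, random_state=seed)
print(gam.mean(), r / beta)

# Exponential(rate=lam): pass scale = 1/lam; mean is 1/lam
expo = stats.expon.rvs(scale=1/lam, size=100_000, random_state=seed)
print(expo.mean(), 1 / lam)
```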
Problem 1: Geometric to Negative Binomial
Let \(X_1, \ldots, X_r\) be independent and identically distributed \(\text{Geometric}(p)\) random variables, where \(X_i\) counts the number of trials until the first success:
\[
P(X_i = k) = p(1-p)^{k-1}, \qquad k = 1, 2, \ldots
\]
Let \(S_r = \sum_{i=1}^r X_i\) be the total number of trials until \(r\) successes.
(a) Using the MGF of the Geometric distribution and the multiplication property for independent sums, derive the MGF of \(S_r\):
\[
M_{S_r}(t) = \left(\frac{pe^t}{1-(1-p)e^t}\right)^r, \qquad t < -\ln(1-p).
\]
(b) Conclude that \(S_r \sim \text{NegativeBinomial}(r, p)\) in the “trials until \(r\)-th success” parameterization, with PMF:
\[
P(S_r = k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}, \qquad k = r, r+1, \ldots
\]
Explain the combinatorial interpretation of this PMF (why \(\binom{k-1}{r-1}\)?).
Computational Verification Tips:
- Use `stats.geom.rvs(p=p_val, size=(n_samples, r_val))` to generate a matrix of Geometric samples, then sum along each row (`axis=1`)
- For direct Negative Binomial samples, note that SciPy’s `nbinom` uses the “failures before r successes” parameterization — you’ll need to add \(r\) to convert to “trials until r successes”
- After deriving the MGFs by hand, use SymPy to check that the product of \(r\) identical Geometric MGFs simplifies to the Negative Binomial MGF
- Use `compare_distributions()` with `discrete=True` to visualize the PMF comparison
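Following these tips, a minimal verification sketch might look like the following (parameter values are illustrative, and `compare_distributions()` is the helper defined in the Visualization Utilities section):

```python
# Sketch: sum of r Geometrics vs. Negative Binomial (trials until r successes).
# Assumes compare_distributions() from the Visualization Utilities section.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p_val, r_val, n_samples = 0.3, 5, 50_000

# Constructed: each row holds r iid Geometric(p) draws; sum along the row
geo_matrix = stats.geom.rvs(p=p_val, size=(n_samples, r_val), random_state=rng)
constructed = geo_matrix.sum(axis=1)

# Direct: SciPy counts failures before r successes, so add r to get trials
direct = stats.nbinom.rvs(r_val, p_val, size=n_samples, random_state=rng) + r_val

compare_distributions(constructed, direct,
                      title=f"Sum of {r_val} Geometric({p_val}) vs NegBinom",
                      label_constructed="Sum of Geometrics",
                      label_direct="NegBinom + r",
                      discrete=True)
```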
Problem 2: Poisson Additivity
Let \(X_1, \ldots, X_m\) be independent Poisson random variables with possibly different rates \(\lambda_1, \ldots, \lambda_m\):
\[
X_i \sim \text{Poisson}(\lambda_i), \qquad i = 1, \ldots, m.
\]
Let \(S = \sum_{i=1}^m X_i\).
(a) Prove that \(S \sim \text{Poisson}(\Lambda)\) where \(\Lambda = \sum_{i=1}^m \lambda_i\).
(b) The Poisson family is called closed under convolution (or “reproducing”). Explain in one sentence what this means and give a real-world example where this property is useful.
Computational Verification Tips:
- Define a lambda function for the Poisson MGF: `M_pois = lambda lam: exp(lam * (exp(t) - 1))`
- Multiply MGFs with different rates and use `simplify()` to verify they combine as expected
- Generate samples from multiple Poisson distributions with different \(\lambda\) values and sum them
- Compare against direct samples from `stats.poisson.rvs(mu=sum(lambda_vals), ...)`
- For Poisson, mean equals variance — check this holds for both constructed and direct samples
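The symbolic portion of that check might look like the sketch below (a minimal SymPy sketch with two rates; the sampling comparison follows the same pattern as Problem 1):

```python
# Sketch: product of Poisson MGFs should match the MGF of Poisson(lam1 + lam2).
from sympy import exp, simplify, symbols

t = symbols('t', real=True)
lam1, lam2 = symbols('lambda_1 lambda_2', positive=True)

M_pois = lambda lam: exp(lam * (exp(t) - 1))

product = M_pois(lam1) * M_pois(lam2)    # MGF of X1 + X2 under independence
combined = M_pois(lam1 + lam2)           # candidate: Poisson(lam1 + lam2)

print(simplify(product / combined))      # expect 1 if the identity holds
```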
Problem 3: Binomial Additivity
Let \(X_1, \ldots, X_m\) be independent Binomial random variables with the same success probability \(p\) but possibly different numbers of trials \(n_1, \ldots, n_m\):
\[
X_i \sim \text{Binomial}(n_i, p), \qquad i = 1, \ldots, m.
\]
Let \(S = \sum_{i=1}^m X_i\).
(a) Prove that \(S \sim \text{Binomial}(N, p)\) where \(N = \sum_{i=1}^m n_i\).
(b) What happens if the \(X_i\) have different success probabilities \(p_i\)? Is the sum still Binomial? Justify your answer (a counterexample or brief argument suffices).
Computational Verification Tips:
- The Binomial MGF is \((1 - p + pe^t)^n\) — after your hand derivation, use SymPy to check the product simplifies correctly
- Taking logarithms before comparing can simplify the algebra: `simplify(log(product) - log(combined))`
- Generate Binomial samples with varying \(n\) values but the same \(p\), then sum
- For part (b), try sampling from Binomials with different \(p\) values and observe that the sum’s distribution doesn’t match any Binomial — plot the PMF to see the discrepancy
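For the part (b) exploration, a variance comparison already exposes the mismatch; the sketch below (illustrative parameter values only) compares the sum of Binomials with unequal \(p\) to a Binomial matched to the same mean:

```python
# Sketch for part (b): a sum of Binomials with unequal p is generally not Binomial.
# Parameter values are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_vals = [10, 10]
p_vals = [0.1, 0.9]
n_sim = 200_000

# Constructed sum with different success probabilities
total = sum(stats.binom.rvs(n, p, size=n_sim, random_state=rng)
            for n, p in zip(n_vals, p_vals))

# A Binomial(N, p_bar) matched to the same mean has a different variance
N = sum(n_vals)
p_bar = total.mean() / N
print("variance of the sum:        ", total.var())
print("Binomial(N, p_bar) variance:", N * p_bar * (1 - p_bar))
```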
Problem 4: Normal Additivity
Let \(X_1, \ldots, X_n\) be independent and identically distributed \(\mathcal{N}(\mu, \sigma^2)\) random variables. Let \(S_n = \sum_{i=1}^n X_i\).
(a) Prove that \(S_n \sim \mathcal{N}(n\mu, n\sigma^2)\).
(b) As a corollary, prove that the sample mean \(\bar{X} = S_n/n\) follows \(\mathcal{N}(\mu, \sigma^2/n)\). This is the exact (finite-sample) distribution, not an asymptotic approximation.
Computational Verification Tips:
- The Normal MGF is \(\exp(\mu t + \sigma^2 t^2/2)\) — after your hand derivation, use SymPy to check that \([M_X(t)]^n\) simplifies to \(M_{S_n}(t)\)
- Generate an (n_samples × n) matrix of Normal samples, then use `np.sum(..., axis=1)` for sums and `np.mean(..., axis=1)` for means
- Create two visualizations: one for the sum \(S_n \sim N(n\mu, n\sigma^2)\) and one for the mean \(\bar{X} \sim N(\mu, \sigma^2/n)\)
- The sample mean distribution is exact (not asymptotic) — verify that agreement holds even for small \(n\), up to sampling noise
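A minimal sketch of those two visualizations, assuming the `compare_distributions()` helper from above and illustrative parameter values:

```python
# Sketch: Normal additivity for the sum and for the sample mean.
# Assumes compare_distributions() from the Visualization Utilities section.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
mu, sigma, n, n_samples = 2.0, 3.0, 5, 50_000

X = rng.normal(mu, sigma, size=(n_samples, n))
sums = X.sum(axis=1)
means = X.mean(axis=1)

compare_distributions(
    sums, rng.normal(n * mu, np.sqrt(n) * sigma, size=n_samples),
    title=f"Sum of {n} Normal draws",
    label_constructed="Sum of Normals", label_direct="N(n*mu, n*sigma^2)",
    theoretical_pdf=stats.norm(n * mu, np.sqrt(n) * sigma).pdf)

compare_distributions(
    means, rng.normal(mu, sigma / np.sqrt(n), size=n_samples),
    title=f"Mean of {n} Normal draws",
    label_constructed="Sample mean", label_direct="N(mu, sigma^2/n)",
    theoretical_pdf=stats.norm(mu, sigma / np.sqrt(n)).pdf)
```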
Problem 5: Chi-Square from Normal Squares
This problem establishes one of the most important constructions in statistical inference.
(a) Let \(Z \sim \mathcal{N}(0, 1)\) and define \(Y = Z^2\). Derive from first principles that:
\[
M_Y(t) = \mathbb{E}\left[e^{tZ^2}\right] = (1 - 2t)^{-1/2}, \qquad t < 1/2,
\]
and conclude that \(Y \sim \chi^2(1)\).
Hint: Write the expectation as an integral over the standard normal density, combine exponents, and use the Gaussian integral identity:
\[
\int_{-\infty}^{\infty} e^{-a x^2}\,dx = \sqrt{\frac{\pi}{a}}, \qquad a > 0.
\]
(b) Now let \(Z_1, \ldots, Z_n\) be iid \(\mathcal{N}(0, 1)\) and define \(Q = \sum_{i=1}^n Z_i^2\). Prove that \(Q \sim \chi^2(n)\).
(c) Explain why this result implies that for a random sample \(X_1, \ldots, X_n\) from \(\mathcal{N}(\mu, \sigma^2)\):
\[
\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1),
\]
where \(S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2\) is the sample variance. (You may state without proof that \(\sum_i (X_i - \bar{X})^2/\sigma^2\) involves \(n-1\) independent squared standard normals.)
Computational Verification Tips:
- For part (a), after deriving the MGF by hand, verify your integral computation using SymPy: `integrate(exp(t*z**2) * exp(-z**2/2) / sqrt(2*pi), (z, -oo, oo), conds='none')` should simplify to \((1-2t)^{-1/2}\)
- For part (b), square standard normal samples and sum within each replication (`axis=1` for an (n_samples × n) array), then compare to `stats.chi2.rvs(df=n, ...)`
- The \(\chi^2\) distribution is right-skewed with support \([0, \infty)\) — set appropriate x-axis limits for visualization
- Theoretical moments: \(E[\chi^2_n] = n\), \(\text{Var}(\chi^2_n) = 2n\)
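The sampling portion of the check might look like the sketch below (illustrative parameter values; `compare_distributions()` is the helper defined earlier):

```python
# Sketch: sums of squared standard normals vs. chi-square(n).
# Assumes compare_distributions() from the Visualization Utilities section.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, n_samples = 4, 50_000

Z = rng.standard_normal(size=(n_samples, n))
constructed = (Z ** 2).sum(axis=1)                 # Q = sum of n squared N(0,1)
direct = stats.chi2.rvs(df=n, size=n_samples, random_state=rng)

compare_distributions(constructed, direct,
                      title=f"Sum of {n} squared N(0,1) vs chi-square({n})",
                      label_constructed="Sum of Z_i^2",
                      label_direct=f"chi2({n})",
                      theoretical_pdf=stats.chi2(df=n).pdf,
                      x_range=(0, 20))

# Moment check: expect mean ~ n and variance ~ 2n
print(constructed.mean(), constructed.var())
```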
Problem 6: The Gamma-Poisson-Negative Binomial Triangle
This problem explores the deep connections between three distributions that arise naturally in count data modeling. The relationships have both theoretical elegance and practical importance—the Negative Binomial emerges as an “overdispersed Poisson” when the Poisson rate is itself random.
Background: In many applications, count data exhibits more variability than a Poisson model predicts. One explanation: the rate parameter \(\lambda\) varies across observations. If we model this heterogeneity by treating \(\lambda\) as a Gamma-distributed random variable, the resulting marginal distribution is Negative Binomial.
(a) Poisson-Gamma Mixture: Let \(X \mid \Lambda = \lambda \sim \text{Poisson}(\lambda)\) and let \(\Lambda \sim \text{Gamma}(r, \beta)\) with PDF:
\[
f_\Lambda(\lambda) = \frac{\beta^r}{\Gamma(r)}\,\lambda^{r-1} e^{-\beta\lambda}, \qquad \lambda > 0.
\]
Show that the marginal distribution of \(X\) (integrating out \(\Lambda\)) is Negative Binomial. Specifically, prove:
\[
P(X = k) = \frac{\Gamma(k + r)}{k!\,\Gamma(r)} \left(\frac{\beta}{1+\beta}\right)^{r} \left(\frac{1}{1+\beta}\right)^{k}, \qquad k = 0, 1, 2, \ldots
\]
Hint: Write \(P(X = k) = \int_0^\infty P(X = k \mid \Lambda = \lambda) f_\Lambda(\lambda) \, d\lambda\), substitute the Poisson PMF and Gamma PDF, then recognize the resulting integral as a Gamma function.
(b) Identifying the Parameterization: The result in (a) gives a Negative Binomial in the “number of failures before \(r\) successes” parameterization. Show that if we define \(p = \frac{\beta}{1 + \beta}\), then:
\[
P(X = k) = \binom{k + r - 1}{k}\, p^r (1-p)^k, \qquad k = 0, 1, 2, \ldots
\]
Verify that \(\mathbb{E}[X] = \frac{r(1-p)}{p} = \frac{r}{\beta}\) and \(\text{Var}(X) = \frac{r(1-p)}{p^2} = \frac{r(1+\beta)}{\beta^2}\).
(c) Overdispersion: For a Poisson distribution, \(\text{Var}(X) = \mathbb{E}[X]\). Show that for the Negative Binomial from part (b):
\[
\text{Var}(X) = \mathbb{E}[X] + \frac{\mathbb{E}[X]^2}{r}.
\]
This demonstrates overdispersion: \(\text{Var}(X) > \mathbb{E}[X]\). Explain intuitively why mixing over the rate parameter increases variability.
(d) Poisson Process Connection: In a Poisson process with rate \(\mu\), let \(N(t)\) be the number of events by time \(t\), and let \(T_r\) be the time of the \(r\)-th event. We have:
\(N(t) \sim \text{Poisson}(\mu t)\)
\(T_r \sim \text{Gamma}(r, \mu)\) (shape \(r\), rate \(\mu\))
Prove the fundamental equivalence:
\[
P(N(t) \geq r) = P(T_r \leq t).
\]
Hint: The event “at least \(r\) arrivals by time \(t\)” is equivalent to “the \(r\)-th arrival occurred by time \(t\).”
(e) Exponential-Gamma Additivity: As a parallel to Problem 1 (Geometric \(\to\) Negative Binomial), prove using MGFs that if \(X_1, \ldots, X_r\) are iid \(\text{Exponential}(\lambda)\), then:
\[
\sum_{i=1}^r X_i \sim \text{Gamma}(r, \lambda).
\]
This shows that Gamma is to Exponential as Negative Binomial is to Geometric—the waiting time for \(r\) events in continuous vs. discrete time.
Computational Verification Tips:
- Part (a/b): To construct the Poisson-Gamma mixture, first sample \(\Lambda \sim \text{Gamma}(r, \beta)\) using `stats.gamma.rvs(a=r, scale=1/beta, ...)`, then sample \(X \mid \Lambda\) from `stats.poisson.rvs(mu=lambda_sample)` for each \(\Lambda\) value
- SciPy’s Negative Binomial uses the “failures before r successes” parameterization with \(p = \beta/(1+\beta)\)
- Part (c): Compute the theoretical variance using both the direct formula and the overdispersion formula \(\text{Var}(X) = E[X] + E[X]^2/r\) to verify they match
- Part (d): Compare `1 - stats.poisson.cdf(r-1, mu=mu*t)` with `stats.gamma.cdf(t, a=r, scale=1/mu)` — they should be equal to machine precision
- Part (e): Sum Exponential samples and compare to Gamma; note SciPy’s `expon` uses scale = 1/rate, so use `stats.expon.rvs(scale=1/lambda, ...)`
- Create two visualizations: one for the Poisson-Gamma mixture (discrete) and one for Exponential-to-Gamma (continuous)
Part II: Transformation Methods
These distributions lack MGFs, requiring transformation (Jacobian) techniques. Derive the PDFs by hand using change-of-variables, then use numerical sampling to verify your results.
Problem 7: Ratio of Normals is Cauchy
Let \(Z_1, Z_2\) be independent \(\mathcal{N}(0, 1)\) random variables. Define \(V = Z_1 / Z_2\).
(a) Prove that \(V\) has the standard Cauchy distribution:
\[
f_V(v) = \frac{1}{\pi(1 + v^2)}, \qquad v \in \mathbb{R}.
\]
Suggested approach: Use polar coordinates. Define \(R = \sqrt{Z_1^2 + Z_2^2}\) and \(\Theta\) as the angle, where \(\Theta \in (-\pi, \pi]\). Show that:
Under the isotropic bivariate standard normal, \(\Theta\) is uniformly distributed (the angle is independent of radius for spherically symmetric distributions)
\(V = Z_1/Z_2 = \tan(\Theta)\) for \(\Theta \neq \pm\pi/2\)
Use the transformation method to find \(f_V(v)\) from the uniform distribution on \(\Theta\)
Note: The set where \(Z_2 = 0\) has probability zero and can be ignored.
(b) Verify that the Cauchy PDF integrates to 1. Explain why the MGF \(\mathbb{E}[e^{tV}]\) does not exist for any \(t \neq 0\).
Computational Verification Tips:
- After computing the Jacobian by hand for the polar transformation \((z_1, z_2) = (r\cos\theta, r\sin\theta)\), you can check your result using SymPy’s `Matrix([...]).jacobian([r, theta])` — it should give \(|J| = r\)
- Verify that the Cauchy PDF integrates to 1 using `integrate(1/(pi*(1+v**2)), (v, -oo, oo))`
- To construct samples: generate independent \(Z_1, Z_2 \sim N(0,1)\) and compute \(V = Z_1/Z_2\). Do not filter the samples — the probability of \(Z_2\) being exactly zero is zero
- Visualization challenge: the Cauchy distribution has such heavy tails that histograms are dominated by extreme values. For readability, clip the histogram x-axis to a central range (e.g., \([-10, 10]\)), but keep all samples and use full quantile comparisons to verify tail behavior
- Compare quantiles at extreme probabilities (1%, 5%, 95%, 99%) to verify the heavy tails match theory
- The Cauchy has no mean or variance — sample moments will be unstable and meaningless
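Because moments are uninformative here, a quantile-based check is the natural verification; a minimal sketch (illustrative sample size) follows:

```python
# Sketch: ratio of independent standard normals vs. direct Cauchy samples.
# Heavy tails make sample moments meaningless, so compare quantiles instead.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_samples = 100_000

z1 = rng.standard_normal(n_samples)
z2 = rng.standard_normal(n_samples)
constructed = z1 / z2                                     # V = Z1 / Z2
direct = stats.cauchy.rvs(size=n_samples, random_state=rng)

probs = [0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99]
print("constructed:", np.quantile(constructed, probs).round(2))
print("direct:     ", np.quantile(direct, probs).round(2))
print("theoretical:", stats.cauchy.ppf(probs).round(2))
```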
Problem 8: Student’s t from Normal and Chi-Square
Let \(Z \sim \mathcal{N}(0, 1)\) and \(U \sim \chi^2(\nu)\) be independent. Define:
\[
T = \frac{Z}{\sqrt{U/\nu}}.
\]
(a) Prove that \(T\) has Student’s \(t\) distribution with \(\nu\) degrees of freedom:
\[
f_T(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \qquad t \in \mathbb{R}.
\]
Suggested approach: Use conditioning. Note that \(T \mid U = u \sim \mathcal{N}(0, \nu/u)\). Then:
\[
f_T(t) = \int_0^\infty f_{T \mid U}(t \mid u)\, f_U(u)\, du.
\]
Substitute the Normal and Chi-square densities, then evaluate the integral using the Gamma function identity.
(b) Verify that as \(\nu \to \infty\), the \(t_\nu\) distribution approaches \(\mathcal{N}(0, 1)\). Based on your KS statistic plot, propose a data-driven threshold for \(\nu\) at which the normal approximation becomes adequate (e.g., KS statistic drops below 0.02).
Computational Verification Tips:
- Construct \(T = Z / \sqrt{U/\nu}\) where \(Z \sim N(0,1)\) and \(U \sim \chi^2(\nu)\) are independent
- Use `stats.chi2.rvs(df=nu, ...)` for chi-square samples
- Compare against `stats.t.rvs(df=nu, ...)` for direct t-distribution samples
- Theoretical variance: \(\text{Var}(T) = \nu/(\nu-2)\) for \(\nu > 2\); infinite for \(1 < \nu \leq 2\) and undefined for \(\nu \leq 1\)
- For part (b): create a two-panel figure: (1) t-distribution PDFs for \(\nu \in \{1, 3, 5, 10, 30\}\) overlaid with \(N(0,1)\), and (2) KS statistic vs. \(\nu\) for \(\nu \in \{3, 5, 10, 15, 20, 25, 30, 40, 50\}\)
- A common rule of thumb states \(t_{30} \approx N(0,1)\) — does your analysis support this?
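A minimal sketch of the construction and a coarse KS-versus-\(\nu\) pass (illustrative values; the full plotting is left to you, and `compare_distributions()` is the helper defined earlier):

```python
# Sketch: T = Z / sqrt(U/nu) vs. direct t(nu) samples, plus a KS-vs-nu check.
# Assumes compare_distributions() from the Visualization Utilities section.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
nu, n_samples = 5, 50_000

Z = rng.standard_normal(n_samples)
U = stats.chi2.rvs(df=nu, size=n_samples, random_state=rng)
constructed = Z / np.sqrt(U / nu)
direct = stats.t.rvs(df=nu, size=n_samples, random_state=rng)

compare_distributions(constructed, direct,
                      title=f"Z / sqrt(U/{nu}) vs t({nu})",
                      label_constructed="Constructed T",
                      label_direct=f"t({nu})",
                      theoretical_pdf=stats.t(df=nu).pdf,
                      x_range=(-6, 6))

# Part (b): KS distance between t(nu) samples and N(0,1) as nu grows
for df in [3, 5, 10, 30, 50]:
    ks = stats.kstest(stats.t.rvs(df=df, size=20_000, random_state=rng), 'norm')
    print(df, round(ks.statistic, 4))
```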
Problem 9: F Distribution from Chi-Squares
Let \(U \sim \chi^2(\nu_1)\) and \(V \sim \chi^2(\nu_2)\) be independent. Define:
\[
F = \frac{U/\nu_1}{V/\nu_2}.
\]
(a) Prove that \(F\) has the \(F(\nu_1, \nu_2)\) distribution:
\[
f_F(x) = \frac{1}{B(\nu_1/2, \nu_2/2)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} x^{\nu_1/2-1} \left(1 + \frac{\nu_1 x}{\nu_2}\right)^{-(\nu_1+\nu_2)/2}, \qquad x > 0,
\]
where \(B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}\) is the Beta function.
Hint: Use the Gamma representation: \(\chi^2(\nu) = \text{Gamma}(\nu/2, 1/2)\). Transform \((U, V) \mapsto (F, V)\) and integrate out \(V\).
(b) Show that if \(T \sim t(\nu)\), then \(T^2 \sim F(1, \nu)\). This connects the \(t\)-test to the \(F\)-test.
Computational Verification Tips:
- Construct \(F = (U/\nu_1) / (V/\nu_2)\) where \(U \sim \chi^2(\nu_1)\) and \(V \sim \chi^2(\nu_2)\) are independent
- Compare against `stats.f.rvs(dfn=nu1, dfd=nu2, ...)` for direct F-distribution samples
- Theoretical mean: \(E[F] = \nu_2/(\nu_2 - 2)\) for \(\nu_2 > 2\) (note: it depends only on the denominator degrees of freedom)
- The F distribution has support \([0, \infty)\) and is right-skewed — set x-axis limits appropriately
- For part (b): generate \(T \sim t(\nu)\) samples, square them, and compare to \(F(1, \nu)\) samples
- This demonstrates why a two-sided t-test at level \(\alpha\) is equivalent to an F-test at level \(\alpha\): \(|T| > t_{\alpha/2, \nu}\) if and only if \(T^2 > F_{\alpha, 1, \nu}\)
- Create two visualizations: one for the general F construction, one for the \(T^2 \sim F(1, \nu)\) relationship
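A minimal sketch of both checks (illustrative degrees of freedom; `compare_distributions()` is the helper defined earlier):

```python
# Sketch: F from scaled chi-squares, and T^2 ~ F(1, nu) for part (b).
# Assumes compare_distributions() from the Visualization Utilities section.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
nu1, nu2, n_samples = 4, 12, 50_000

U = stats.chi2.rvs(df=nu1, size=n_samples, random_state=rng)
V = stats.chi2.rvs(df=nu2, size=n_samples, random_state=rng)
constructed = (U / nu1) / (V / nu2)
direct = stats.f.rvs(dfn=nu1, dfd=nu2, size=n_samples, random_state=rng)

compare_distributions(constructed, direct,
                      title=f"(U/{nu1}) / (V/{nu2}) vs F({nu1}, {nu2})",
                      label_constructed="Constructed F",
                      label_direct=f"F({nu1}, {nu2})",
                      theoretical_pdf=stats.f(dfn=nu1, dfd=nu2).pdf,
                      x_range=(0, 8))

# Part (b): squared t(nu) samples vs. direct F(1, nu) samples
nu = 7
t_sq = stats.t.rvs(df=nu, size=n_samples, random_state=rng) ** 2
f_direct = stats.f.rvs(dfn=1, dfd=nu, size=n_samples, random_state=rng)
print(stats.ks_2samp(t_sq, f_direct))
```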
Part III: Computational Foundations
This section tests your understanding of the fundamental computational concepts from Chapter 1: the Central Limit Theorem, convergence in distribution, and proper use of Python’s random generation ecosystem.
Problem 10: The Central Limit Theorem in Action
The Central Limit Theorem (CLT) states that for iid random variables \(X_1, \ldots, X_n\) with mean \(\mu\) and variance \(\sigma^2 < \infty\):
\[
\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1) \quad \text{as } n \to \infty.
\]
This problem explores the CLT empirically across distributions with very different shapes.
(a) CLT Universality: For each of the following distributions, generate 5,000 samples of size \(n\) for \(n \in \{1, 2, 5, 10, 30, 100\}\). For each sample, compute the standardized sample mean:
\[
Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}},
\]
where \(\mu\) and \(\sigma\) are the theoretical mean and standard deviation.
Test with these distributions (use the theoretical moments given):
Exponential(1): \(\mu = 1\), \(\sigma = 1\) (right-skewed)
Uniform(0, 1): \(\mu = 0.5\), \(\sigma = 1/\sqrt{12}\) (symmetric, bounded)
Bernoulli(0.1): \(\mu = 0.1\), \(\sigma = \sqrt{0.1 \times 0.9}\) (discrete, highly skewed)
Beta(0.5, 0.5): \(\mu = 0.5\), \(\sigma = 0.5/\sqrt{2}\) (U-shaped)
Create a \(4 \times 6\) grid of histograms showing the distribution of \(Z_n\) for each (distribution, n) combination, with the \(N(0,1)\) PDF overlaid. Title each panel with the distribution name and sample size.
(b) Quantifying Convergence with the Kolmogorov-Smirnov Test: The Kolmogorov-Smirnov (KS) statistic measures the maximum vertical distance between two cumulative distribution functions:
\[
D_n = \sup_x \left| F_n(x) - F(x) \right|,
\]
where \(F_n(x)\) is the empirical CDF and \(F(x)\) is the reference CDF. Smaller values indicate better agreement. The one-sample KS test compares a sample against a theoretical distribution; `stats.kstest(samples, 'norm')` returns both the KS statistic and a p-value testing the null hypothesis that the sample came from \(N(0,1)\).
For each distribution from part (a), compute the KS statistic between your standardized means and \(N(0,1)\) for each value of \(n\). Create a single plot showing KS statistic vs. \(n\) (use log scale for n) with one line per distribution. Which distribution converges fastest? Slowest? Explain why in terms of the source distribution’s shape (symmetry, boundedness, skewness).
(c) Studentized Statistics: In practice, we rarely know the true \(\mu\) and \(\sigma\). The studentized version of the CLT uses sample estimates:
\[
T_n = \frac{\bar{X}_n - \mu}{S_n/\sqrt{n}},
\]
where \(S_n\) is the sample standard deviation. For normal populations, \(T_n \sim t(n-1)\) exactly. For non-normal populations, \(T_n \xrightarrow{d} N(0,1)\) as \(n \to \infty\), but convergence may be slower.
Repeat part (a) using the studentized statistic \(T_n\) instead of \(Z_n\) (replace \(\sigma\) with \(S_n\) computed from each sample). Create another \(4 \times 6\) grid and compare to your results from part (a). For which distributions and sample sizes does using estimated variance make a noticeable difference?
Computational Verification Tips:
- Use `np.random.default_rng(seed)` for reproducibility
- For each (distribution, n) combination, generate a matrix of shape `(n_simulations, n)` and compute row means using `np.mean(..., axis=1)`
- For part (a), standardize using the theoretical moments
- For part (c), compute sample standard deviations with `np.std(..., axis=1, ddof=1)` (use `ddof=1` for the unbiased variance estimator)
- Use `stats.kstest(samples, 'norm')` to get the KS statistic — the function returns a named tuple with `.statistic` and `.pvalue` attributes
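A minimal sketch of the simulation loop for one source distribution (Exponential(1) here; extending to the other distributions and adding the histogram grid is left to you):

```python
# Sketch of the Problem 10 simulation loop for a single source distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
n_sim = 5_000
sample_sizes = [1, 2, 5, 10, 30, 100]
mu, sigma = 1.0, 1.0                      # theoretical moments of Exponential(1)

for n in sample_sizes:
    X = rng.exponential(scale=1.0, size=(n_sim, n))
    xbar = X.mean(axis=1)

    # Part (a): standardize with the theoretical moments
    Z_n = (xbar - mu) / (sigma / np.sqrt(n))
    ks_z = stats.kstest(Z_n, 'norm').statistic
    line = f"n = {n:3d}  KS(Z_n) = {ks_z:.4f}"

    # Part (c): studentize with the sample standard deviation (needs n >= 2)
    if n >= 2:
        S_n = X.std(axis=1, ddof=1)
        T_n = (xbar - mu) / (S_n / np.sqrt(n))
        line += f"  KS(T_n) = {stats.kstest(T_n, 'norm').statistic:.4f}"

    print(line)
```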
Problem 11: Reproducibility and Parallel Streams
Proper random number generation is critical for reproducible research. This problem tests your understanding of seeds, generator state, and parallel streams.
(a) Seed Sensitivity: Consider estimating \(E[X^2]\) where \(X \sim N(0, 1)\) (true value = 1) using \(n = 100\) samples.
Run this estimation with 10 different seeds (0 through 9)
Report the 10 estimates and their sample standard deviation
Now repeat with \(n = 10000\) samples — how does seed sensitivity change?
Explain the relationship between sample size and seed sensitivity
(b) Generator State: Demonstrate that you understand generator state by completing this task:
1. Create a generator with seed 42
2. Draw 100 uniform samples and compute their mean (call it `mean_a`)
3. Save the generator state using `.bit_generator.state`
4. Draw 100 more uniform samples and compute their mean (call it `mean_b`)
5. Restore the saved state
6. Draw 100 uniform samples and verify their mean equals `mean_b`
Report all three means and confirm the state restoration worked.
(c) Parallel Streams: When running simulations in parallel, each worker needs an independent random stream. Using SeedSequence.spawn():
1. Create a parent `SeedSequence` with entropy 42
2. Spawn 4 child sequences
3. Create a generator from each child
4. From each generator, draw 10000 samples from \(N(0, 1)\) and compute the mean
5. Verify the streams are independent by computing pairwise correlations between the raw samples (they should be near 0)
Report the 4 means and the 6 pairwise correlations.
(d) The Birthday Problem Revisited: Use Monte Carlo simulation to estimate the probability that in a room of \(k\) people, at least two share a birthday (assume 365 equally likely birthdays).
- Write a function `birthday_collision(k, rng)` that simulates one room and returns `True` if there is a collision
- Estimate the probability for \(k \in \{10, 20, 23, 30, 40, 50\}\) using 100,000 simulations each
- Compare to the exact formula: \(P(\text{collision}) = 1 - \frac{365!}{365^k (365-k)!}\)
- Your estimates should be within 0.01 of the exact values
Computational Verification Tips:
- Use `rng.bit_generator.state` to save/restore state (it’s a dictionary)
- For `SeedSequence`, use `from numpy.random import SeedSequence, default_rng`
- For parallel streams: `children = SeedSequence(42).spawn(4)`, then `rng_i = default_rng(children[i])`
- For birthday collisions, `rng.integers(0, 365, size=k)` gives \(k\) random birthdays; check for duplicates with `len(set(birthdays)) < k`
- The exact birthday formula can overflow; use `scipy.special.perm(365, k)` or compute in log space
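A minimal sketch covering the mechanics of parts (b) through (d) (reduced simulation counts and arbitrary seeds, just to illustrate the API usage):

```python
# Sketch of the generator-state, spawned-streams, and birthday-simulation mechanics.
import numpy as np
from numpy.random import SeedSequence, default_rng
from scipy import special

# (b) Save and restore generator state
rng = default_rng(42)
mean_a = rng.uniform(size=100).mean()
saved_state = rng.bit_generator.state          # a dict describing the state
mean_b = rng.uniform(size=100).mean()
rng.bit_generator.state = saved_state          # rewind to the saved state
mean_b_again = rng.uniform(size=100).mean()
print(mean_a, mean_b, mean_b_again, mean_b == mean_b_again)

# (c) Independent parallel streams via SeedSequence.spawn()
children = SeedSequence(42).spawn(4)
streams = [default_rng(child) for child in children]
samples = [s.normal(size=10_000) for s in streams]
print([round(x.mean(), 4) for x in samples])
print(round(np.corrcoef(samples)[0, 1], 4))    # one of the pairwise correlations

# (d) Birthday collision probability: Monte Carlo vs. the exact formula
def birthday_collision(k, rng):
    """Simulate one room of k people; True if at least two share a birthday."""
    birthdays = rng.integers(0, 365, size=k)
    return len(set(birthdays)) < k

sim_rng = default_rng(0)
for k in [10, 23, 50]:
    est = np.mean([birthday_collision(k, sim_rng) for _ in range(20_000)])
    exact = 1 - special.perm(365, k, exact=True) / 365**k
    print(k, round(est, 4), round(exact, 4))
```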