STAT 418 · Computational Data Science · runnable companion to the Chapter 2 webbook.
Monte Carlo simulation turns intractable problems into experiments we can run on a computer. When an integral, a probability, or a distribution has no closed form, we replace mathematics with sampling: the Law of Large Numbers guarantees that sample averages converge to expectations, and the Central Limit Theorem tells us how fast (the famous $O(n^{-1/2})$ rate). This notebook builds the full pipeline end-to-end — generate uniforms, transform them into any distribution, average to estimate, then accelerate with variance reduction — and grounds two stages in real data so you can run every cell without the repo.
By the end you can:
| § | Topic | Status | Real-data / worked example |
|---|---|---|---|
| 2.1 | Monte Carlo fundamentals | core | $\pi$, Gaussian integral, rare event, curse of dimensionality + Michelson speed-of-light bootstrap preview |
| 2.2 | Uniform random variates | optional | LCG period & the RANDU 15-plane disaster |
| 2.3 | Inverse-CDF method | core | Exponential (+KS test), discrete sampling + alias method |
| 2.4 | Transformation methods | optional | Box–Muller, multivariate normal via Cholesky |
| 2.5 | Rejection sampling | core | Beta(2.5, 6) envelope + free-throw Bayesian posterior |
| 2.6 | Variance reduction | core | importance sampling, control / antithetic / stratified |
| 2.7 | Summary & connections | core | formula table $\to$ Chapter 4 bootstrap |
Sections 2.2 and 2.4 are marked optional / self-study: they are the "plumbing" (how
rng.random()andrng.standard_normal()actually work). They are covered in full but can be skimmed on a first pass — the core estimation story is §2.1, §2.3, §2.5, §2.6.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
%matplotlib inline
np.random.seed(418) # reproducibility (STYLE.md contract)
plt.rcParams.update({
'figure.figsize': (7.2, 4.2), 'figure.dpi': 110,
'axes.grid': True, 'grid.alpha': 0.25,
'axes.spines.top': False, 'axes.spines.right': False,
'font.size': 11,
})
BLUE, RED, GREEN, ORANGE, PURPLE = '#4C72B0', '#C44E52', '#55A868', '#DD8452', '#8172B3'
# Course data bucket. Chapter 2 has no data of its own, so the two real-data
# touchpoints borrow datasets that live under other chapters' folders.
SUPA = ('https://pqyjaywwccbnqpwgeiuv.supabase.co/storage/v1/object/public/'
'STAT%20418%20Images/Data/')
def data_url(chapter, name):
return f'{SUPA}Chapter{chapter}/{name}'
print('environment ready · numpy', np.__version__, '· pandas', pd.__version__)
environment ready · numpy 1.26.4 · pandas 2.3.3
The single idea behind the whole chapter: any integral is an expectation, and any expectation is a sample average. For a density $f$ on $\mathcal X$,
$$ I = \int_{\mathcal X} h(x)\,f(x)\,dx = \mathbb E_f[h(X)] \;\approx\; \hat I_n = \frac1n\sum_{i=1}^n h(X_i), \qquad X_i \stackrel{\text{iid}}{\sim} f. $$The Law of Large Numbers makes $\hat I_n \to I$ almost surely; the Central Limit Theorem makes $\hat I_n \stackrel{\cdot}{\sim} \mathcal N(I,\,\sigma^2/n)$, so the standard error is $\widehat{\mathrm{SE}} = s/\sqrt n$ and an approximate 95% interval is $\hat I_n \pm 1.96\,\widehat{\mathrm{SE}}$. The error shrinks as $O(n^{-1/2})$: 100× the samples buys only 10× the accuracy — but that rate holds in every dimension, which is Monte Carlo's superpower.
$\pi$ is the area of the unit disk, so $\pi = 4\,\mathbb E[\mathbf 1_{X^2+Y^2\le 1}]$ for $(X,Y)\sim\mathrm{Uniform}([-1,1]^2)$. Drop $n$ random points in the square, count the fraction inside the circle, multiply by 4. Because the integrand is an indicator (0/1), the variance is the Bernoulli form $p(1-p)$ with $p=\pi/4$.
# Monte Carlo estimate of pi (seed 42 reproduces the webbook's 3.143080).
rng = np.random.default_rng(42)
n = 100_000
x = rng.uniform(-1, 1, n)
y = rng.uniform(-1, 1, n)
inside = (x**2 + y**2) <= 1.0
p_hat = inside.mean()
pi_hat = 4 * p_hat
se_pi = 4 * np.sqrt(p_hat * (1 - p_hat) / n) # Bernoulli SE, scaled by 4
ci = (pi_hat - 1.96 * se_pi, pi_hat + 1.96 * se_pi)
print(f'pi estimate : {pi_hat:.6f} (n = {n:,})')
print(f'true pi : {np.pi:.6f}')
print(f'error : {abs(pi_hat - np.pi):.6f}')
print(f'std error : {se_pi:.6f}')
print(f'95% CI : ({ci[0]:.6f}, {ci[1]:.6f}) -> contains pi: {ci[0] < np.pi < ci[1]}')
pi estimate : 3.143080 (n = 100,000) true pi : 3.141593 error : 0.001487 std error : 0.005190 95% CI : (3.132908, 3.153252) -> contains pi: True
# Figure: (left) the dartboard, (right) the noisy O(1/sqrt n) convergence.
fig, (axL, axR) = plt.subplots(1, 2, figsize=(11.5, 4.4))
sub = slice(0, 4000) # plot a subsample for clarity
axL.scatter(x[sub][inside[sub]], y[sub][inside[sub]], s=4, color=BLUE, alpha=0.5, label='inside')
axL.scatter(x[sub][~inside[sub]], y[sub][~inside[sub]], s=4, color=ORANGE, alpha=0.5, label='outside')
th = np.linspace(0, 2*np.pi, 200)
axL.plot(np.cos(th), np.sin(th), color='k', lw=1.5)
axL.set_aspect('equal'); axL.set_title(f'{4000} darts: blue/total approx pi/4')
axL.set_xlabel('x'); axL.set_ylabel('y'); axL.legend(loc='upper right', fontsize=8)
run_p = np.cumsum(inside) / np.arange(1, n + 1) # running estimate of p
run_pi = 4 * run_p
ns = np.arange(1, n + 1)
band = 4 * 1.96 * np.sqrt(run_p * (1 - run_p) / ns)
axR.plot(ns, run_pi, color=BLUE, lw=1.2, label='running estimate')
axR.fill_between(ns, run_pi - band, run_pi + band, color=BLUE, alpha=0.15, label='95% band')
axR.axhline(np.pi, color=RED, ls='--', lw=1.4, label=f'true pi = {np.pi:.4f}')
axR.set_xscale('log'); axR.set_xlim(10, n); axR.set_ylim(2.9, 3.4)
axR.set_xlabel('samples n (log scale)'); axR.set_ylabel('estimate of pi')
axR.set_title('convergence: band shrinks like 1/sqrt(n)'); axR.legend(loc='upper right', fontsize=8)
plt.tight_layout(); plt.show()
Each below is a one-line average. We report the estimate, its SE, and the truth.
# Example 1 -- Gaussian integral via importance reweighting (seed 42 -> 1.7708).
rng = np.random.default_rng(42)
X = rng.standard_normal(100_000)
h = np.sqrt(2*np.pi) * np.exp(-X**2 / 2) # exp(-x^2)/phi(x)
gauss, gauss_se = h.mean(), h.std(ddof=1)/np.sqrt(len(X))
print(f'1. Gaussian integral: {gauss:.6f} +/- {gauss_se:.6f} true sqrt(pi) = {np.sqrt(np.pi):.6f}')
# Example 2 -- rare event P(Z > 4): naive MC is unreliable.
rng = np.random.default_rng(42)
Z = rng.standard_normal(100_000)
p_naive = (Z > 4).mean()
p_true = 1 - stats.norm.cdf(4)
n_exceed = int((Z > 4).sum())
print(f'2. Rare event P(Z>4): naive = {p_naive:.3e} ({n_exceed} exceedances in 100k) '
f'true = {p_true:.3e} -> off by {p_naive/p_true:.1f}x (see 2.6)')
# Example 3 -- Phi(1.96), no closed form, as an indicator average.
rng = np.random.default_rng(42)
Z = rng.standard_normal(1_000_000)
phi_hat = (Z <= 1.96).mean()
print(f'3. Normal CDF Phi(1.96): {phi_hat:.6f} true = {stats.norm.cdf(1.96):.6f}')
# Example 4 -- a d=100 product integral; true value -> sqrt(e) as d grows.
rng = np.random.default_rng(42)
d = 100
Xd = rng.random((100_000, d))
vals = np.prod(1 + Xd / d, axis=1)
truth = (1 + 0.5/d)**d
print(f'4. High-dim (d={d}): {vals.mean():.6f} +/- {vals.std(ddof=1)/np.sqrt(len(vals)):.6f} '
f'true = {truth:.6f} (sqrt(e) = {np.sqrt(np.e):.6f})')
1. Gaussian integral: 1.770751 +/- 0.002214 true sqrt(pi) = 1.772454 2. Rare event P(Z>4): naive = 9.000e-05 (9 exceedances in 100k) true = 3.167e-05 -> off by 2.8x (see 2.6) 3. Normal CDF Phi(1.96): 0.974928 true = 0.975002 4. High-dim (d=100): 1.646530 +/- 0.000149 true = 1.646668 (sqrt(e) = 1.648721)
A deterministic grid with $m$ points per axis costs $m^d$ evaluations — hopeless for large $d$. Worse, a function concentrated near the origin (a Gaussian, a posterior) lives in a vanishing sliver of the cube: the inscribed ball's share of $[-1,1]^d$ collapses to essentially zero. Monte Carlo's $O(n^{-1/2})$ rate, by contrast, does not care about $d$.
# Fraction of the hypercube [-1,1]^d that lies inside the inscribed unit ball.
from scipy.special import gamma
print(f"{'d':>4} {'cube vol':>12} {'ball vol':>12} {'ratio (% in ball)':>18}")
print('-' * 52)
for d in [1, 2, 3, 5, 10, 20, 50, 100]:
cube = 2.0**d
ball = np.pi**(d/2) / gamma(d/2 + 1)
print(f'{d:>4} {cube:>12.3e} {ball:>12.3e} {100*ball/cube:>17.3e}%')
print('-> by d=10 only ~0.25% of the cube is in the ball; by d=100 it is ~1e-68%.')
print(' Uniform darts almost never hit where a peaked integrand lives.')
d cube vol ball vol ratio (% in ball) ---------------------------------------------------- 1 2.000e+00 2.000e+00 1.000e+02% 2 4.000e+00 3.142e+00 7.854e+01% 3 8.000e+00 4.189e+00 5.236e+01% 5 3.200e+01 5.264e+00 1.645e+01% 10 1.024e+03 2.550e+00 2.490e-01% 20 1.049e+06 2.581e-02 2.461e-06% 50 1.126e+15 1.730e-13 1.537e-26% 100 1.268e+30 2.368e-40 1.868e-68% -> by d=10 only ~0.25% of the cube is in the ball; by d=100 it is ~1e-68%. Uniform darts almost never hit where a peaked integrand lives.
Here is the first place Monte Carlo meets real data — and a preview of Chapter 4. Michelson's 1879 experiment recorded 100 measurements of the speed of light (stored as km/s $-\;299{,}000$). We want the uncertainty of the sample mean. Classical theory gives $\widehat{\mathrm{SE}} = s/\sqrt n$. The bootstrap gets the same answer by brute-force simulation: resample the 100 numbers with replacement $B$ times, recompute the mean each time, and read off the spread of those $B$ means. No formula — just Monte Carlo over the data itself.
# Real dataset: Michelson speed-of-light measurements (km/s - 299000).
morley = pd.read_csv(data_url(4, 'morley.csv'))
speed = morley['Speed'].to_numpy(dtype=float)
n = speed.size
xbar, s = speed.mean(), speed.std(ddof=1)
se_clt = s / np.sqrt(n)
true_excess = 299792.458 - 299000 # modern accepted value, same units
print(f'Michelson: n = {n} runs, mean = {xbar:.1f} km/s (i.e. {299000+xbar:.1f}), sd = {s:.1f}')
print(f' modern truth in these units = {true_excess:.1f} -> Michelson was high by '
f'{xbar - true_excess:.1f} km/s')
# Bootstrap = Monte Carlo over the data: B resamples, mean of each.
rng = np.random.default_rng(418)
B = 10_000
idx = rng.integers(0, n, size=(B, n)) # B x n resample indices
boot_means = speed[idx].mean(axis=1)
se_boot = boot_means.std(ddof=1)
ci_pct = np.percentile(boot_means, [2.5, 97.5])
print(f'\nBootstrap (B = {B:,} resamples):')
print(f' bootstrap SE of the mean = {se_boot:.3f} vs classical s/sqrt(n) = {se_clt:.3f} '
f'(agree within {100*abs(se_boot-se_clt)/se_clt:.1f}%)')
print(f' percentile 95% CI for the mean = ({ci_pct[0]:.1f}, {ci_pct[1]:.1f})')
print(f' -> the modern value {true_excess:.1f} is OUTSIDE this CI: a real, famous bias.')
Michelson: n = 100 runs, mean = 852.4 km/s (i.e. 299852.4), sd = 79.0
modern truth in these units = 792.5 -> Michelson was high by 59.9 km/s
Bootstrap (B = 10,000 resamples):
bootstrap SE of the mean = 8.023 vs classical s/sqrt(n) = 7.901 (agree within 1.5%)
percentile 95% CI for the mean = (836.5, 867.7)
-> the modern value 792.5 is OUTSIDE this CI: a real, famous bias.
# Figure: the Monte Carlo sampling distribution of the mean, with the CLT overlay.
fig, ax = plt.subplots()
ax.hist(boot_means, bins=45, density=True, color=BLUE, alpha=0.55, edgecolor='white',
label=f'{B:,} bootstrap means')
grid = np.linspace(boot_means.min(), boot_means.max(), 300)
ax.plot(grid, stats.norm.pdf(grid, xbar, se_clt), color=RED, lw=2.2,
label=r'CLT $\mathcal{N}(\bar x,\, s^2/n)$')
ax.axvline(xbar, color='k', lw=1.4, ls=':', label=f'mean = {xbar:.1f}')
ax.axvline(true_excess, color=GREEN, lw=2.0, ls='--', label=f'modern truth = {true_excess:.1f}')
ax.set_xlabel('resampled mean speed (km/s - 299000)'); ax.set_ylabel('density')
ax.set_title('Bootstrap sampling distribution of the mean = Monte Carlo over data')
ax.legend(fontsize=8); plt.tight_layout(); plt.show()
Read-out. The Monte Carlo $\pi$ estimate landed at $3.1431$ with the true value comfortably inside its 95% interval; the four worked integrals reproduced $\sqrt\pi$, $\Phi(1.96)$, and a 100-dimensional integral to several digits — while the rare event $P(Z>4)$ exposed naive Monte Carlo's Achilles heel. The bootstrap then showed Monte Carlo running directly on data: resampling Michelson's 100 numbers gave a standard error of the mean that matches the textbook $s/\sqrt n$ almost exactly, and a 95% CI that excludes the modern speed of light — a genuine historical bias, recovered with no formula. Chapter 4 develops this resampling idea in full.
Optional / self-study. This section is the "plumbing": how a deterministic computer fakes randomness. You can skim it and trust
np.random.default_rng()on a first pass — but the RANDU cautionary tale below is worth seeing once.
Every variate we generate starts as $U\sim\mathrm{Uniform}(0,1)$. A linear congruential generator (LCG) produces a stream from $X_{n+1} = (aX_n + c) \bmod m$. It is fast and reproducible, but poorly chosen constants leave fingerprints: consecutive tuples fall on a small number of parallel hyperplanes. The infamous RANDU (IBM, 1960s: $a=65539$, $c=0$, $m=2^{31}$) satisfies the exact algebraic relation $X_{n+2} = 6X_{n+1} - 9X_n \pmod{2^{31}}$, so every consecutive triple lies on just 15 planes in the unit cube — disastrous for 3-D simulation.
# A minimal LCG, plus a numerical proof of RANDU's hidden plane structure.
def lcg(a, c, m, seed, n):
x = np.empty(n, dtype=np.int64)
x[0] = seed
for i in range(1, n):
x[i] = (a * x[i-1] + c) % m
return x
# RANDU: a = 65539, c = 0, m = 2**31, odd seed.
M = 2**31
X = lcg(65539, 0, M, seed=1, n=100_000)
U = X / M
# The algebra: X[n+2] - 6 X[n+1] + 9 X[n] == 0 (mod 2**31) for EVERY triple.
residual = (X[2:] - 6*X[1:-1] + 9*X[:-2]) % M
# residual is either 0 or 2**31 (== 0 mod m); fold both to 0:
folded = np.minimum(residual, M - residual)
print(f'RANDU triples obeying X[n+2] = 6 X[n+1] - 9 X[n] (mod 2^31): '
f'{(folded == 0).mean()*100:.1f}% of {len(folded):,}')
print(f'max |residual| (mod m) across all triples = {folded.max()} -> exact linear relation')
# Sanity on the uniform marginal (the 1-D test it PASSES, hiding the 3-D failure):
print(f'mean(U) = {U.mean():.4f} (target 0.5), var(U) = {U.var():.4f} (target {1/12:.4f})')
RANDU triples obeying X[n+2] = 6 X[n+1] - 9 X[n] (mod 2^31): 100.0% of 99,998 max |residual| (mod m) across all triples = 0 -> exact linear relation mean(U) = 0.5019 (target 0.5), var(U) = 0.0833 (target 0.0833)
# Figure: a small-modulus LCG shows the parallel-line lattice in 2-D directly.
Xs = lcg(a=1229, c=1, m=2048, seed=1, n=2048) / 2048 # tiny m to expose lines
fig, (a1, a2) = plt.subplots(1, 2, figsize=(11.5, 4.4))
a1.scatter(Xs[:-1], Xs[1:], s=6, color=ORANGE, alpha=0.6)
a1.set_title('weak LCG (m=2048): pairs fall on few parallel lines')
a1.set_xlabel('U[n]'); a1.set_ylabel('U[n+1]'); a1.set_aspect('equal')
good = np.random.default_rng(418).random(2048)
a2.scatter(good[:-1], good[1:], s=6, color=GREEN, alpha=0.6)
a2.set_title("modern PCG64 (default_rng): no visible structure")
a2.set_xlabel('U[n]'); a2.set_ylabel('U[n+1]'); a2.set_aspect('equal')
plt.tight_layout(); plt.show()
# Modern best practice: default_rng (PCG64) + independent parallel streams.
master = np.random.default_rng(418)
print('reproducible draws (seed 418):', np.round(master.random(4), 4))
# Independent, non-overlapping streams for parallel jobs via SeedSequence.spawn:
children = np.random.SeedSequence(418).spawn(3)
streams = [np.random.default_rng(c) for c in children]
for k, g in enumerate(streams):
print(f' worker {k} first draws: {np.round(g.random(3), 4)}')
print('Each worker gets a statistically independent stream -- never reseed in a loop.')
reproducible draws (seed 418): [0.3192 0.9897 0.2241 0.2963] worker 0 first draws: [0.1288 0.3528 0.6102] worker 1 first draws: [0.239 0.7851 0.3758] worker 2 first draws: [0.6725 0.8036 0.876 ] Each worker gets a statistically independent stream -- never reseed in a loop.
Read-out. RANDU passes the simplest 1-D check (its uniforms have mean
$\approx 0.5$, variance $\approx 1/12$) yet 100% of its consecutive triples
satisfy an exact linear relation — they live on 15 planes, which silently biased
3-D Monte Carlo studies for years. The lesson is permanent: use a vetted modern
generator (np.random.default_rng(), PCG64), seed it once, and spawn independent
streams for parallel work rather than reseeding inside a loop.
The probability integral transform is the workhorse for turning uniforms into any distribution: if $U\sim\mathrm{Uniform}(0,1)$ and $F$ is a CDF, then $X = F^{-1}(U)$ has distribution $F$. When $F^{-1}$ is available in closed form, sampling is a one-liner.
For the Exponential$(\lambda)$, $F(x)=1-e^{-\lambda x}$ inverts to $F^{-1}(u) = -\ln(1-u)/\lambda$; since $U$ and $1-U$ are both uniform we use the tail-stable $X = -\ln(U)/\lambda$. We verify with the moments and a Kolmogorov–Smirnov goodness-of-fit test (large $p$-value = the samples are indistinguishable from the target).
# Exponential via inverse-CDF (seed 42 reproduces the webbook's 0.5031/0.2486/0.3483).
rng = np.random.default_rng(42)
rate = 2.0
U = rng.random(10_000)
U = np.maximum(U, np.finfo(float).tiny) # guard log(0)
Xe = -np.log(U) / rate
ks = stats.kstest(Xe, 'expon', args=(0, 1/rate)) # loc=0, scale=1/rate
print(f'Exponential(rate={rate}) via X = -ln(U)/rate, n = {len(Xe):,}')
print(f' sample mean = {Xe.mean():.4f} theory 1/rate = {1/rate:.4f}')
print(f' sample var = {Xe.var(ddof=1):.4f} theory 1/rate^2 = {1/rate**2:.4f}')
print(f' sample median = {np.median(Xe):.4f} theory ln2/rate = {np.log(2)/rate:.4f}')
print(f' KS test vs Exponential: statistic = {ks.statistic:.4f}, p-value = {ks.pvalue:.3f} '
f'(p large -> good fit)')
Exponential(rate=2.0) via X = -ln(U)/rate, n = 10,000 sample mean = 0.5031 theory 1/rate = 0.5000 sample var = 0.2486 theory 1/rate^2 = 0.2500 sample median = 0.3483 theory ln2/rate = 0.3466 KS test vs Exponential: statistic = 0.0074, p-value = 0.634 (p large -> good fit)
# Figure: inverse-CDF samples vs the true density, and the geometric construction.
fig, (a1, a2) = plt.subplots(1, 2, figsize=(11.5, 4.4))
a1.hist(Xe, bins=50, density=True, color=BLUE, alpha=0.55, edgecolor='white', label='inverse-CDF samples')
xx = np.linspace(0, Xe.max(), 300)
a1.plot(xx, rate*np.exp(-rate*xx), color=RED, lw=2.2, label=f'true Exp({rate}) pdf')
a1.set_xlabel('x'); a1.set_ylabel('density'); a1.set_title('inverse-CDF matches the target'); a1.legend()
# Geometric picture: pick u on the y-axis of F, read x = F^{-1}(u) on the x-axis.
xg = np.linspace(0, 3, 300); Fg = 1 - np.exp(-rate*xg)
a2.plot(xg, Fg, color=BLUE, lw=2.2, label=r'CDF $F(x)=1-e^{-\lambda x}$')
for u in [0.25, 0.5, 0.8]:
xinv = -np.log(1-u)/rate
a2.plot([0, xinv], [u, u], color=ORANGE, lw=1.2)
a2.plot([xinv, xinv], [0, u], color=ORANGE, lw=1.2)
a2.plot(xinv, u, 'o', color=RED, ms=6)
a2.set_xlabel('x'); a2.set_ylabel('u = F(x)'); a2.set_title('U on y-axis -> X on x-axis'); a2.legend()
plt.tight_layout(); plt.show()
For a categorical distribution, inverse-CDF is a search: build the cumulative
probabilities and find the first one exceeding $U$ (np.searchsorted, an
$O(\log K)$ binary search). Walker's alias method does $O(K)$ preprocessing so
that each draw is $O(1)$ — two array lookups and one comparison — ideal when you
need millions of samples from a fixed large support. Both must reproduce the
target frequencies exactly.
# Target categorical distribution.
values = np.array([0, 1, 2, 3, 4])
probs = np.array([0.10, 0.15, 0.20, 0.25, 0.30])
N = 200_000
rng = np.random.default_rng(418)
# (A) Inverse-CDF via binary search on the cumulative probabilities.
cum = np.cumsum(probs)
draws_inv = np.searchsorted(cum, rng.random(N))
# (B) Walker's alias method: O(K) build, O(1) per draw.
def build_alias(p):
K = len(p); q = p * K
prob = np.zeros(K); alias = np.zeros(K, dtype=int)
small = [i for i in range(K) if q[i] < 1.0]
large = [i for i in range(K) if q[i] >= 1.0]
while small and large:
s = small.pop(); l = large.pop()
prob[s] = q[s]; alias[s] = l
q[l] -= (1.0 - q[s])
(small if q[l] < 1.0 else large).append(l)
for i in large + small:
prob[i] = 1.0
return prob, alias
aprob, aalias = build_alias(probs.copy())
col = rng.integers(0, len(probs), N)
draws_alias = np.where(rng.random(N) < aprob[col], col, aalias[col])
emp_inv = np.bincount(draws_inv, minlength=5) / N
emp_ali = np.bincount(draws_alias, minlength=5) / N
print('category :', ' '.join(f'{v:6d}' for v in values))
print('target prob :', ' '.join(f'{p:6.3f}' for p in probs))
print('inverse-CDF freq:', ' '.join(f'{p:6.3f}' for p in emp_inv))
print('alias-method freq:', ' '.join(f'{p:6.3f}' for p in emp_ali))
print(f'max |empirical - target|: inverse = {np.abs(emp_inv-probs).max():.4f}, '
f'alias = {np.abs(emp_ali-probs).max():.4f} (both exact up to MC noise)')
category : 0 1 2 3 4 target prob : 0.100 0.150 0.200 0.250 0.300 inverse-CDF freq: 0.099 0.151 0.200 0.251 0.299 alias-method freq: 0.100 0.152 0.199 0.249 0.300 max |empirical - target|: inverse = 0.0010, alias = 0.0017 (both exact up to MC noise)
Read-out. The inverse-CDF exponential sampler matched its theoretical mean, variance, and median, and the Kolmogorov–Smirnov test could not distinguish the samples from a true Exponential (large $p$-value). For discrete data, both the $O(\log K)$ binary-search inversion and the $O(1)$ alias method reproduced the target category frequencies to Monte Carlo precision — the alias method trading a one-time $O(K)$ build for constant-time draws, the right choice at scale.
Optional / self-study. How
rng.standard_normal()is built. Skim on a first pass — in practice always call the library — but the Box–Muller geometry and the Cholesky trick for correlated normals are genuinely useful to have seen.
The Gaussian has no elementary inverse CDF, so we transform instead. The Box–Muller transform turns two independent uniforms into two independent standard normals via polar coordinates:
$$ Z_1 = \sqrt{-2\ln U_1}\,\cos(2\pi U_2), \qquad Z_2 = \sqrt{-2\ln U_1}\,\sin(2\pi U_2). $$And once we have independent standard normals, a Cholesky factor $\Sigma = LL^\top$ lets us build a correlated multivariate normal: $X = \mu + LZ$ has mean $\mu$ and covariance $\Sigma$.
# Box-Muller: two uniforms -> two independent standard normals.
rng = np.random.default_rng(418)
m = 100_000
U1 = np.maximum(rng.random(m), np.finfo(float).tiny)
U2 = rng.random(m)
R = np.sqrt(-2*np.log(U1))
Z1 = R * np.cos(2*np.pi*U2)
Z2 = R * np.sin(2*np.pi*U2)
Z = np.concatenate([Z1, Z2])
ks = stats.kstest(Z, 'norm')
print(f'Box-Muller standard normals (n = {Z.size:,}):')
print(f' mean = {Z.mean():+.4f} (target 0), var = {Z.var():.4f} (target 1)')
print(f' corr(Z1, Z2) = {np.corrcoef(Z1, Z2)[0,1]:+.4f} (target 0 -> the pair is independent)')
print(f' KS vs N(0,1): statistic = {ks.statistic:.4f}, p-value = {ks.pvalue:.3f}')
Box-Muller standard normals (n = 200,000): mean = +0.0003 (target 0), var = 0.9997 (target 1) corr(Z1, Z2) = -0.0035 (target 0 -> the pair is independent) KS vs N(0,1): statistic = 0.0014, p-value = 0.818
# Cholesky: turn independent normals into a target-correlated multivariate normal.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
[0.8, 1.0]])
L = np.linalg.cholesky(Sigma)
rng = np.random.default_rng(418)
Zmat = rng.standard_normal((50_000, 2))
Xmv = mu + Zmat @ L.T
emp_mean = Xmv.mean(axis=0)
emp_cov = np.cov(Xmv, rowvar=False)
print('multivariate normal via X = mu + L Z:')
print(f' empirical mean = [{emp_mean[0]:+.3f}, {emp_mean[1]:+.3f}] target = [{mu[0]:+.1f}, {mu[1]:+.1f}]')
print(' empirical cov =\n', np.round(emp_cov, 3))
print(' target cov =\n', Sigma)
fig, ax = plt.subplots(figsize=(5.6, 5.0))
ax.scatter(Xmv[:3000, 0], Xmv[:3000, 1], s=5, alpha=0.25, color=PURPLE)
# overlay the 95% covariance ellipse from the eigen-decomposition of Sigma
vals, vecs = np.linalg.eigh(Sigma)
ang = np.degrees(np.arctan2(vecs[1, 1], vecs[0, 1]))
from matplotlib.patches import Ellipse
w, h = 2*np.sqrt(vals[::-1]*stats.chi2.ppf(0.95, 2))
ax.add_patch(Ellipse(mu, w, h, angle=ang, fill=False, color=RED, lw=2.2, label='95% ellipse'))
ax.plot(*mu, 'X', color='k', ms=10)
ax.set_xlabel('X1'); ax.set_ylabel('X2'); ax.set_aspect('equal')
ax.set_title('Cholesky-correlated normals'); ax.legend()
plt.tight_layout(); plt.show()
multivariate normal via X = mu + L Z: empirical mean = [+1.002, -2.002] target = [+1.0, -2.0] empirical cov = [[1.997 0.795] [0.795 0.997]] target cov = [[2. 0.8] [0.8 1. ]]
Read-out. Box–Muller produced standard normals indistinguishable from the target (mean $\approx 0$, variance $\approx 1$, KS $p$-value large) with the two output streams genuinely uncorrelated. The Cholesky factor then bent independent normals into a cloud whose empirical mean and covariance reproduced the requested $\mu$ and $\Sigma$, the 95% ellipse hugging the data — exactly the mechanism behind correlated-Gaussian simulation in finance, spatial statistics, and Bayesian priors.
When no inverse CDF or transform exists, rejection sampling still works for any density — even one known only up to a constant. Enclose the target $\tilde f(x)$ under an envelope $M\,g(x)$ from an easy-to-sample proposal $g$ (with $\tilde f(x)\le M g(x)$ everywhere), draw $X\sim g$ and $U\sim\mathrm{Uniform}(0,1)$, and accept $X$ when $U \le \tilde f(X)/[M\,g(X)]$. Accepted draws are exact samples from $f$. For a normalized target the acceptance probability is $1/M$, so a tight envelope (small $M$) is everything.
Worked example — Beta(2.5, 6) with a Uniform proposal. The mode is at $x^\* = (\alpha-1)/(\alpha+\beta-2) = 1.5/6.5 \approx 0.231$, so the smallest valid envelope constant is $M^\* = f(x^\*) \approx 2.61$, predicting an acceptance rate of $1/M^\* \approx 38.3\%$.
# Rejection sampling from Beta(2.5, 6) with a Uniform(0,1) proposal.
a, b = 2.5, 6.0
mode = (a - 1) / (a + b - 2)
M = stats.beta.pdf(mode, a, b) # smallest valid envelope constant
print(f'Beta({a}, {b}): mode x* = {mode:.4f}, M* = f(x*) = {M:.4f}, '
f'predicted accept rate 1/M = {1/M:.4f}')
rng = np.random.default_rng(418)
n_prop = 300_000
Xp = rng.random(n_prop) # proposal draws ~ Uniform(0,1)
Uacc = rng.random(n_prop)
accept = Uacc <= stats.beta.pdf(Xp, a, b) / M # g(x)=1, so M*g(x)=M
samples = Xp[accept]
print(f' empirical accept rate = {accept.mean():.4f} ({samples.size:,} accepted of {n_prop:,})')
print(f' sample mean = {samples.mean():.4f} theory a/(a+b) = {a/(a+b):.4f}')
ks = stats.kstest(samples, 'beta', args=(a, b))
print(f' KS vs Beta({a},{b}): statistic = {ks.statistic:.4f}, p-value = {ks.pvalue:.3f}')
Beta(2.5, 6.0): mode x* = 0.2308, M* = f(x*) = 2.6268, predicted accept rate 1/M = 0.3807 empirical accept rate = 0.3807 (114,205 accepted of 300,000) sample mean = 0.2939 theory a/(a+b) = 0.2941 KS vs Beta(2.5,6.0): statistic = 0.0023, p-value = 0.579
# Figure: the envelope, plus accepted (blue) vs rejected (orange) proposal points.
xx = np.linspace(0, 1, 400)
fig, ax = plt.subplots()
ax.plot(xx, stats.beta.pdf(xx, a, b), color=RED, lw=2.4, label=f'target Beta({a},{b})')
ax.axhline(M, color='k', ls='--', lw=1.4, label=f'envelope M*g(x) = {M:.2f}')
sub = slice(0, 4000)
yvis = Uacc[sub] * M # height of each dart under the envelope
ax.scatter(Xp[sub][accept[sub]], yvis[accept[sub]], s=5, color=BLUE, alpha=0.35, label='accepted')
ax.scatter(Xp[sub][~accept[sub]], yvis[~accept[sub]], s=5, color=ORANGE, alpha=0.25, label='rejected')
ax.set_xlabel('x'); ax.set_ylabel('density / dart height')
ax.set_title(f'Rejection sampling: accept fraction = area ratio = 1/M = {1/M:.3f}')
ax.legend(loc='upper right', fontsize=8); plt.tight_layout(); plt.show()
Rejection sampling shines when the target is unnormalized — the typical Bayesian situation. Take Stephen Curry's 2023-24 free throws: $k$ makes in $m$ attempts. With a flat $\mathrm{Beta}(1,1)$ prior on the make-probability $\theta$, the posterior kernel is $\tilde f(\theta) = \theta^{k}(1-\theta)^{m-k}$ (unnormalized). We sample it with a Uniform proposal, capping the envelope at the peak $\tilde f(\hat\theta)$ where $\hat\theta=k/m$, and check the rejection-sample mean against the exact $\mathrm{Beta}(1+k,\,1+m-k)$ posterior mean. (Conjugacy gives the closed form here; rejection is what you reach for when it does not.)
# Real dataset: Curry's cumulative free throws -> a Beta-Binomial posterior.
ft = pd.read_csv(data_url(3, 'nba_freethrows_2023_24.csv'))
curry = ft[ft['player'] == 'Stephen Curry']
k = int(curry['cum_ft'].iloc[-1]) # makes
m = int(curry['cum_fta'].iloc[-1]) # attempts
mle = k / m
print(f'Curry 2023-24: {k} makes / {m} attempts -> MLE theta_hat = {mle:.4f}')
# Unnormalized posterior kernel (log scale for stability); envelope peak at the MLE.
def log_kernel(t):
return k*np.log(t) + (m-k)*np.log1p(-t)
log_M = log_kernel(mle) # log of the peak height
rng = np.random.default_rng(418)
n_prop = 600_000
Tp = rng.random(n_prop)
with np.errstate(divide='ignore'):
log_ratio = log_kernel(Tp) - log_M # log[ f(t) / (M*g(t)) ], g=1
accept = np.log(rng.random(n_prop)) <= log_ratio
post = Tp[accept]
exact_mean = (1 + k) / (2 + m) # Beta(1+k, 1+m-k) mean
exact_sd = np.sqrt((1+k)*(1+m-k) / ((2+m)**2 * (3+m)))
print(f' accept rate = {accept.mean():.4f} ({post.size:,} draws) '
f'-> a Uniform proposal is wasteful for a peaked posterior (a Normal/Beta proposal is better)')
print(f' rejection-sample posterior mean = {post.mean():.4f} +/- {post.std(ddof=1):.4f}')
print(f' exact Beta({1+k}, {1+m-k}) mean = {exact_mean:.4f}, sd = {exact_sd:.4f} (match)')
print(f' rejection 95% credible interval = ({np.percentile(post,2.5):.4f}, {np.percentile(post,97.5):.4f})')
Curry 2023-24: 299 makes / 324 attempts -> MLE theta_hat = 0.9228 accept rate = 0.0374 (22,427 draws) -> a Uniform proposal is wasteful for a peaked posterior (a Normal/Beta proposal is better) rejection-sample posterior mean = 0.9203 +/- 0.0149 exact Beta(300, 26) mean = 0.9202, sd = 0.0150 (match) rejection 95% credible interval = (0.8892, 0.9470)
Read-out. With a Uniform proposal, Beta(2.5, 6) was sampled at the predicted $\approx 38\%$ acceptance rate, its sample mean matching $a/(a+b)$ and passing the KS test. The real-data free-throw posterior then made the unnormalized case concrete: rejection drew from $\theta^{k}(1-\theta)^{m-k}$ directly and recovered the exact $\mathrm{Beta}(1+k,1+m-k)$ posterior mean — but at a low acceptance rate, because a flat proposal wastes most draws on a sharply peaked posterior. That inefficiency is exactly what importance sampling (next) and, later, MCMC are built to fix.
Monte Carlo error is $\sigma/\sqrt n$. We cannot beat the $\sqrt n$, but we can shrink the constant $\sigma$ — often by orders of magnitude — with smarter sampling. Four techniques, each with a formula for the gain:
# Importance sampling for the rare event P(Z>4): exponential tilting to N(4,1).
# Seed 42 + the webbook's draw order reproduce VRF ~ 7000x.
rng = np.random.default_rng(42)
n = 10_000
thr = 4.0
p_true = 1 - stats.norm.cdf(thr)
Z_naive = rng.standard_normal(n) # naive MC (drawn first, as in webbook)
p_naive = (Z_naive > thr).mean()
se_naive = np.sqrt(p_true*(1-p_true)/n)
X_is = rng.normal(loc=thr, size=n) # proposal g = N(4,1)
log_w = -thr*X_is + thr**2/2 # log(f/g) = log[phi(x)/phi(x-4)]
w = np.exp(log_w)
ind = (X_is > thr).astype(float)
wh = ind * w
p_is = wh.mean()
se_is = np.sqrt(wh.var()/n)
norm_w = w / w.sum()
ess = 1.0 / np.sum(norm_w**2)
vrf = (p_true*(1-p_true)) / wh.var()
print(f'True P(Z>4) = {p_true:.3e}')
print(f'Naive MC (n={n:,}): est = {p_naive:.3e}, SE = {se_naive:.3e} '
f'({int((Z_naive>thr).sum())} exceedance(s))')
print(f'Importance sampling: est = {p_is:.3e}, SE = {se_is:.2e}, ESS = {ess:.1f}')
print(f' -> variance reduction factor ~ {vrf:.0f}x '
f'(IS reaches with 10k samples what naive MC needs ~{vrf:.0f}x more to match)')
True P(Z>4) = 3.167e-05 Naive MC (n=10,000): est = 1.000e-04, SE = 5.628e-05 (1 exceedance(s)) Importance sampling: est = 3.142e-05, SE = 6.71e-07, ESS = 7.5 -> variance reduction factor ~ 7024x (IS reaches with 10k samples what naive MC needs ~7024x more to match)
# Figure: why tilting works -- move the proposal onto the rare region.
xx = np.linspace(-4, 8, 400)
fig, ax = plt.subplots()
ax.plot(xx, stats.norm.pdf(xx), color=BLUE, lw=2.2, label=r'target $f=\phi(x)$')
ax.plot(xx, stats.norm.pdf(xx, loc=thr), color=GREEN, lw=2.2, label=r'proposal $g=\phi(x-4)$')
ax.axvline(thr, color='k', ls=':', lw=1.4)
tail = xx[xx >= thr]
ax.fill_between(tail, 0, stats.norm.pdf(tail), color=RED, alpha=0.5, label=r'rare region $x>4$')
ax.set_xlabel('x'); ax.set_ylabel('density')
ax.set_title('Importance sampling: tilt the proposal onto the tail'); ax.legend()
plt.tight_layout(); plt.show()
# Control variates and antithetic variates for I = E[e^U], U ~ Uniform(0,1).
# True value e - 1; both methods exploit the strong monotone structure of e^u.
I_true = np.e - 1
rng = np.random.default_rng(418)
n = 100_000
# --- Control variate: use C = U (known mean 1/2), beta* = Cov(H,C)/Var(C). ---
U = rng.random(n)
H = np.exp(U)
C = U
beta = np.cov(H, C, ddof=1)[0, 1] / np.var(C, ddof=1)
H_cv = H - beta * (C - 0.5)
rho_cv = np.corrcoef(H, C)[0, 1]
vrf_cv = 1.0 / (1.0 - rho_cv**2)
print(f'I = E[e^U], true = {I_true:.5f}')
print(f'Control variate (C=U): beta* = {beta:.4f}, corr(H,C) = {rho_cv:.4f}')
print(f' naive est = {H.mean():.5f}, SE = {H.std(ddof=1)/np.sqrt(n):.5f}')
print(f' CV est = {H_cv.mean():.5f}, SE = {H_cv.std(ddof=1)/np.sqrt(n):.5f} '
f'(predicted VRF 1/(1-rho^2) = {vrf_cv:.0f}x)')
# --- Antithetic variate: pair U with 1-U (monotone h -> negative correlation). ---
rng = np.random.default_rng(418)
npairs = 50_000
Ua = rng.random(npairs)
H1, H2 = np.exp(Ua), np.exp(1 - Ua)
anti = (H1 + H2) / 2
rho_a = np.corrcoef(H1, H2)[0, 1]
vrf_a = 1.0 / (1.0 + rho_a)
Hind = np.exp(rng.random(2*npairs)) # equal-cost independent baseline
print(f'\nAntithetic (U, 1-U): corr = {rho_a:.4f} (negative -> helps)')
print(f' antithetic est = {anti.mean():.5f}, SE = {anti.std(ddof=1)/np.sqrt(npairs):.6f}')
print(f' independent est = {Hind.mean():.5f}, SE = {Hind.std(ddof=1)/np.sqrt(2*npairs):.6f} '
f'(predicted per-eval VRF 1/(1+rho) = {vrf_a:.0f}x)')
# --- Counterexample: a symmetric h makes antithetic variates BACKFIRE. ---
g1, g2 = (Ua - 0.5)**2, ((1 - Ua) - 0.5)**2 # h(u)=(u-0.5)^2 is symmetric
print(f'\nCounterexample h(u)=(u-0.5)^2: corr(h(U),h(1-U)) = {np.corrcoef(g1, g2)[0,1]:+.3f} '
f'(= +1 -> antithetic DOUBLES variance; only use it for monotone h).')
I = E[e^U], true = 1.71828 Control variate (C=U): beta* = 1.6897, corr(H,C) = 0.9918 naive est = 1.71672, SE = 0.00155 CV est = 1.71795, SE = 0.00020 (predicted VRF 1/(1-rho^2) = 62x) Antithetic (U, 1-U): corr = -0.9678 (negative -> helps) antithetic est = 1.71773, SE = 0.000278 independent est = 1.71883, SE = 0.001558 (predicted per-eval VRF 1/(1+rho) = 31x) Counterexample h(u)=(u-0.5)^2: corr(h(U),h(1-U)) = +1.000 (= +1 -> antithetic DOUBLES variance; only use it for monotone h).
# Stratified sampling for the same I = E[e^U]: split [0,1] into K equal strata.
rng = np.random.default_rng(418)
n_total = 100_000
K = 10
per = n_total // K
p_k = 1.0 / K # equal stratum probabilities
# Naive baseline.
U = rng.random(n_total)
naive_mean = np.exp(U).mean()
naive_var = np.exp(U).var(ddof=1) / n_total
# Proportional allocation: 'per' uniforms inside each [k/K, (k+1)/K].
strat_means = np.empty(K); strat_vars = np.empty(K)
for kk in range(K):
Uk = (kk + rng.random(per)) / K # Uniform within stratum kk
hk = np.exp(Uk)
strat_means[kk] = hk.mean()
strat_vars[kk] = hk.var(ddof=1)
strat_est = np.sum(p_k * strat_means)
strat_var = np.sum(p_k**2 * strat_vars / per)
print(f'I = E[e^U], true = {np.e-1:.5f}')
print(f' naive est = {naive_mean:.5f}, variance = {naive_var:.3e}')
print(f' stratified est = {strat_est:.5f}, variance = {strat_var:.3e} '
f'(K={K} strata)')
print(f' variance reduction factor ~ {naive_var/strat_var:.0f}x '
f'(stratification removes the between-stratum variance of the steep e^u)')
I = E[e^U], true = 1.71828 naive est = 1.71672, variance = 2.407e-06 stratified est = 1.71832, variance = 2.653e-08 (K=10 strata) variance reduction factor ~ 91x (stratification removes the between-stratum variance of the steep e^u)
Read-out. Every method paid off on real estimands. Importance sampling estimated $P(Z>4)\approx 3.14\times10^{-5}$ from just 10,000 draws with a variance reduction near 7000× — naive Monte Carlo, with its lone chance exceedance, was useless. For $\mathbb E[e^U]$, a control variate ($C=U$, $\rho\approx0.99$) slashed variance by the predicted $1/(1-\rho^2)$; antithetic pairing ($\rho<0$ from monotonicity) and 10-stratum stratification each cut it by another large factor — yet the symmetric counterexample showed antithetic variates can double variance when misapplied. Match the technique to the structure of your integrand.
Work each before expanding the solution. They reinforce the section skills on the same examples (and one on real data).
Exercise 1 (§2.1 sample size). How many darts do we need so the 95% CI half-width for $\pi$ is $\le 0.001$? Use the Bernoulli variance of $h=4\cdot\mathbf 1$ at $p=\pi/4$, and the rule $n \ge (1.96\,\sigma/\epsilon)^2$.
# Solution 1.
p = np.pi / 4
sigma = np.sqrt(16 * p * (1 - p)) # SD of h = 4 * indicator
eps = 0.001
n_req = (1.96 * sigma / eps)**2
print(f'sigma of 4*indicator = {sigma:.4f}')
print(f'required n for 95% CI half-width <= {eps}: {n_req:,.0f} (~10 million darts)')
print('The quadratic 1/eps^2 cost is the price of Monte Carlo: each extra digit = 100x work.')
sigma of 4*indicator = 1.6422 required n for 95% CI half-width <= 0.001: 10,359,897 (~10 million darts) The quadratic 1/eps^2 cost is the price of Monte Carlo: each extra digit = 100x work.
Exercise 2 (§2.3 inverse-CDF). The Pareto$(\alpha, x_m)$ has $F^{-1}(u) = x_m\,(1-u)^{-1/\alpha}$, equivalently the tail-stable $x_m\,U^{-1/\alpha}$. Sample $\alpha=2.5,\ x_m=1$ and check the mean against $\alpha x_m/(\alpha-1)$.
# Solution 2.
rng = np.random.default_rng(418)
alpha, xm = 2.5, 1.0
U = np.maximum(rng.random(100_000), np.finfo(float).tiny)
Xp = xm * U**(-1/alpha)
theory_mean = alpha * xm / (alpha - 1)
print(f'Pareto(alpha={alpha}, x_m={xm}) via x_m * U^(-1/alpha):')
print(f' sample mean = {Xp.mean():.4f} theory a*x_m/(a-1) = {theory_mean:.4f}')
print(f' sample min = {Xp.min():.4f} (>= x_m = {xm}, as required)')
print(' Heavy right tail: the sample mean is noisier than for light-tailed targets.')
Pareto(alpha=2.5, x_m=1.0) via x_m * U^(-1/alpha): sample mean = 1.6657 theory a*x_m/(a-1) = 1.6667 sample min = 1.0000 (>= x_m = 1.0, as required) Heavy right tail: the sample mean is noisier than for light-tailed targets.
Exercise 3 (§2.5 rejection). For Beta(2, 5) with a Uniform proposal, predict $M$ and the acceptance rate from the mode, then verify by sampling.
# Solution 3.
a, b = 2.0, 5.0
mode = (a - 1) / (a + b - 2)
M = stats.beta.pdf(mode, a, b)
rng = np.random.default_rng(418)
Xp = rng.random(200_000)
accept = rng.random(200_000) <= stats.beta.pdf(Xp, a, b) / M
print(f'Beta({a},{b}): mode = {mode:.4f}, M = {M:.4f}, predicted accept = 1/M = {1/M:.4f}')
print(f' empirical accept rate = {accept.mean():.4f}')
print(f' sample mean = {Xp[accept].mean():.4f} theory a/(a+b) = {a/(a+b):.4f}')
Beta(2.0,5.0): mode = 0.2000, M = 2.4576, predicted accept = 1/M = 0.4069 empirical accept rate = 0.4088 sample mean = 0.2860 theory a/(a+b) = 0.2857
Exercise 4 (§2.1 / Ch4 real data). Reuse the Michelson speed data: bootstrap
a 95% CI for the median (which has no simple SE formula) using $B=10{,}000$
resamples. This is pure Monte Carlo, and exactly the Chapter 4 workflow.
# Solution 4.
rng = np.random.default_rng(418)
B = 10_000
idx = rng.integers(0, speed.size, size=(B, speed.size))
boot_med = np.median(speed[idx], axis=1)
ci = np.percentile(boot_med, [2.5, 97.5])
print(f'Michelson median speed (km/s - 299000): point = {np.median(speed):.1f}')
print(f' bootstrap 95% CI for the median = ({ci[0]:.1f}, {ci[1]:.1f})')
print(f' bootstrap SE of the median = {boot_med.std(ddof=1):.2f}')
print(' No closed-form SE for a median exists -- the bootstrap (Monte Carlo) just resamples.')
Michelson median speed (km/s - 299000): point = 850.0 bootstrap 95% CI for the median = (840.0, 870.0) bootstrap SE of the median = 8.29 No closed-form SE for a median exists -- the bootstrap (Monte Carlo) just resamples.
The Monte Carlo pipeline. Generate uniforms (§2.2) $\to$ transform them into any target via inverse-CDF / Box–Muller / rejection (§2.3–2.5) $\to$ average to estimate an expectation with a CLT standard error (§2.1) $\to$ shrink the variance with importance sampling, control / antithetic variates, or stratification (§2.6).
| Concept | Formula | This notebook's example |
|---|---|---|
| Monte Carlo estimator | $\hat I_n = \frac1n\sum_i h(X_i)$ | $\pi \approx 3.1431$ |
| Standard error / 95% CI | $s/\sqrt n$; $\hat I_n \pm 1.96\,\mathrm{SE}$ | Michelson mean |
| $O(n^{-1/2})$ rate | dimension-independent | $d=100$ product integral |
| Bootstrap (Ch 4 preview) | resample data, recompute | speed-of-light SE & median CI |
| Inverse CDF | $X = F^{-1}(U)$ | Exponential (+KS), alias method |
| Box–Muller | $Z=\sqrt{-2\ln U_1}\cos(2\pi U_2)$ | standard normals |
| Rejection accept prob. | $\tilde f(X)/[M g(X)]$, rate $1/M$ | Beta(2.5,6) $\to 38\%$; FT posterior |
| Importance weight / ESS | $w=f/g$; $\mathrm{ESS}=1/\sum\bar w_i^2$ | $P(Z>4)$, VRF $\approx 7000$ |
| Control variate VRF | $1/(1-\rho^2)$ | $\mathbb E[e^U]$ |
| Antithetic VRF | $1/(1+\rho)$, needs $\rho<0$ | $\mathbb E[e^U]$ (+ counterexample) |
Forward. The two real-data touchpoints are signposts. The Michelson bootstrap is Chapter 4 in miniature: resampling computes standard errors and confidence intervals with no formulas, even for awkward statistics like the median. The free-throw posterior by rejection is Part III in miniature: posterior expectations are integrals estimated by sampling, and when a flat proposal wastes draws on a peaked posterior, importance sampling and then Markov chain Monte Carlo take over. Master this Monte Carlo toolkit and the rest of the course — the bootstrap, GLM simulation, Bayesian computation — is the same idea, refined.