Section 4.4: The Parametric Bootstrap

The nonparametric bootstrap of Section 4.3 The Nonparametric Bootstrap treats the empirical distribution \(\hat{F}_n\) as a proxy for the unknown population \(F\). This approach requires minimal assumptions but ignores any structural knowledge about \(F\). When a credible parametric model is available, we can do better: the parametric bootstrap simulates from a fitted parametric distribution \(F_{\hat{\theta}}\), potentially achieving greater efficiency and better finite-sample performance.

This section develops the parametric bootstrap as a natural extension of the plug-in principle, explores its theoretical advantages and failure modes, and provides practical guidance for choosing between parametric and nonparametric approaches.

Road Map 🧭

Understand: The parametric bootstrap principle and its relationship to the plug-in estimator
Develop: Algorithms for location-scale families, regression, and GLMs
Implement: Efficient Python code for parametric bootstrap inference
Evaluate: Model diagnostics and the efficiency-robustness tradeoff

The Parametric Bootstrap Principle

Conceptual Framework

Recall the bootstrap philosophy: we approximate the sampling distribution \(G_F\) of a statistic \(\hat{\theta} = T(X_{1:n})\) by studying its behavior under a proxy distribution. In the nonparametric bootstrap, this proxy is \(\hat{F}_n\), the empirical distribution placing mass \(1/n\) on each observation.

The parametric bootstrap takes a different approach. Given a parametric family \(\{F_\theta : \theta \in \Theta\}\), we:

Fit the model: Estimate \(\hat{\theta}_n\) from the data (typically via MLE).
Simulate from the fitted model: Generate \(X_1^*, \ldots, X_n^* \stackrel{iid}{\sim} F_{\hat{\theta}_n}\).
Compute bootstrap statistics: Calculate \(\hat{\theta}^*_b = T(X_1^*, \ldots, X_n^*)\).
Approximate the sampling distribution: Use \(\{\hat{\theta}^*_1, \ldots, \hat{\theta}^*_B\}\) for inference.

Key Distinction

Nonparametric bootstrap: Resample from \(\hat{F}_n\) (the data itself)
Parametric bootstrap: Simulate fresh data from \(F_{\hat{\theta}_n}\) (the fitted model)

The parametric bootstrap generates genuinely new observations—not rearrangements of the original data. Each bootstrap sample is a fresh draw from the estimated population.

Parametric vs nonparametric bootstrap comparison showing resampling vs simulation — Fig. 131 **Figure 4.4.1**: The fundamental difference between parametric and nonparametric bootstrap. Panel (a) shows nonparametric bootstrap resampling the same data points with replacement. Panel (b) shows parametric bootstrap generating fresh observations from the fitted distribution. Panel (d) compares the resulting bootstrap distributions.

Mathematical Framework

Let \(X_1, \ldots, X_n \stackrel{iid}{\sim} F_{\theta_0}\) where \(\theta_0 \in \Theta\) is the true parameter. The goal is to approximate the sampling distribution of \(\hat{\theta}_n = T(X_{1:n})\).

The parametric bootstrap distribution is the conditional distribution of \(\hat{\theta}^*_n\) given \(\hat{\theta}_n\):

\[G^*_{\hat{\theta}_n}(t) = P_{\hat{\theta}_n}\left(\hat{\theta}^*_n \leq t \mid \hat{\theta}_n\right)\]

where \(\hat{\theta}^*_n = T(X_1^*, \ldots, X_n^*)\) with \(X_i^* \stackrel{iid}{\sim} F_{\hat{\theta}_n}\).

Consistency: Under regularity conditions (continuous \(F_\theta\), consistent \(\hat{\theta}_n\), smooth \(T\)), the parametric bootstrap is consistent:

\[\sup_t \left| G^*_{\hat{\theta}_n}(t) - G_{\theta_0}(t) \right| \xrightarrow{P} 0\]

where \(G_{\theta_0}\) is the true sampling distribution of \(\hat{\theta}_n\).

Higher-order accuracy: For studentized statistics \(Z^* = \sqrt{n}(\hat{\theta}^* - \hat{\theta}_n)/\hat{\sigma}^*\), studentization combined with bootstrap calibration can achieve second-order accuracy under standard regularity conditions (smooth parametric models, asymptotically pivotal statistics):

\[\sup_t \left| P^*(Z^* \leq t) - P(Z \leq t) \right| = O_P(n^{-1})\]

compared to \(O_P(n^{-1/2})\) for unstudentized percentile methods, which are typically first-order accurate. This improved convergence rate is a major advantage for confidence interval construction, though it requires proper studentization with per-replicate SE estimates.

Why Use Parametric Bootstrap?

Efficiency when the model is correct: If \(F_{\hat{\theta}_n}\) is close to \(F_{\theta_0}\), the parametric bootstrap produces tighter confidence intervals than the nonparametric bootstrap.

Better finite-sample behavior: With small samples (\(n < 50\)), the empirical distribution \(\hat{F}_n\) is a crude approximation to \(F\). A well-chosen parametric model can capture population features (tails, smoothness) that \(\hat{F}_n\) misses.

Natural for model-based inference: When the analysis already assumes a parametric model (regression, GLM, survival analysis), parametric bootstrap maintains consistency between model fitting and uncertainty quantification.

Handles boundary constraints: Parametric models naturally respect parameter constraints (e.g., variances are positive, probabilities are in \([0,1]\)), while nonparametric bootstrap may produce invalid resamples.

Parametric Bootstrap vs. Parametric Monte Carlo

Students sometimes ask: “Is parametric bootstrap just parametric Monte Carlo?” The distinction is subtle but important:

Parametric Monte Carlo simulates from a known parametric model with \(\theta\) treated as given or hypothesized (e.g., simulating under the null hypothesis).
Parametric bootstrap simulates from an estimated model \(F_{\hat{\theta}}\), approximating the sampling distribution under the true \(F_{\theta_0}\).

This distinction explains why model misspecification is fatal for parametric bootstrap: we’re using \(\hat{\theta}\) as a proxy for \(\theta_0\), and if the model family is wrong, this proxy breaks down.

Efficiency comparison between parametric and nonparametric bootstrap — Fig. 132 **Figure 4.4.3**: Efficiency comparison when the parametric model is correct. (a) CI widths across sample sizes—parametric produces narrower intervals, especially for small n. (b) Coverage rates near nominal 95%. (c) Efficiency ratio shows parametric gains are largest for small samples.

The Algorithm: General Form

Algorithm: Parametric Bootstrap (General)

Input: Data X₁, ..., Xₙ; parametric family {F_θ}; estimator θ̂; statistic T; replicates B
Output: Bootstrap distribution {T*₁, ..., T*_B}

1. Fit model: Compute θ̂ₙ from data (e.g., MLE)
2. For b = 1, ..., B:
   a. Generate X*₁, ..., X*ₙ iid ~ F_{θ̂ₙ}
   b. Compute θ̂*_b from bootstrap sample
   c. Compute T*_b = T(θ̂*_b) or any functional of interest
3. Return {T*₁, ..., T*_B}

Computational complexity: \(O(B \cdot n \cdot c)\) where \(c\) is the cost of computing \(T\). For regression with \(p\) predictors, \(c = O(np^2)\) for OLS.

Python Implementation

import numpy as np
from scipy import stats

def parametric_bootstrap(data, fit_model, generate_data, statistic,
                         B=2000, seed=None):
    """
    General parametric bootstrap.

    Parameters
    ----------
    data : array_like
        Original data.
    fit_model : callable
        Function that takes data and returns fitted parameters.
    generate_data : callable
        Function that takes (params, n, rng) and returns simulated data.
    statistic : callable
        Function that computes the statistic from data.
    B : int
        Number of bootstrap replicates.
    seed : int, optional
        Random seed for reproducibility.

    Returns
    -------
    theta_star : ndarray
        Bootstrap distribution of the statistic.
    theta_hat : float or ndarray
        Original estimate.
    params_hat : tuple
        Fitted model parameters.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)

    # Step 1: Fit model to original data
    params_hat = fit_model(data)
    theta_hat = statistic(data)

    # Pre-allocate
    theta_hat_arr = np.asarray(theta_hat)
    if theta_hat_arr.ndim == 0:
        theta_star = np.empty(B)
    else:
        theta_star = np.empty((B,) + theta_hat_arr.shape)

    # Step 2: Bootstrap loop
    for b in range(B):
        # Generate new data from fitted model
        data_star = generate_data(params_hat, n, rng)
        theta_star[b] = statistic(data_star)

    return theta_star, theta_hat, params_hat

Example 💡 Parametric Bootstrap for Normal Mean

Given: \(X_1, \ldots, X_n \stackrel{iid}{\sim} N(\mu, \sigma^2)\) with unknown \(\mu, \sigma^2\).

Find: Bootstrap SE and 95% CI for \(\hat{\mu} = \bar{X}\).

Implementation:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Generate sample data
n, mu_true, sigma_true = 30, 10.0, 2.5
data = rng.normal(mu_true, sigma_true, size=n)

# Fit normal model (plug-in estimates)
mu_hat = data.mean()                # MLE for μ
sigma_hat = data.std(ddof=1)        # Sample SD (unbiased); MLE would use ddof=0

# Parametric bootstrap
B = 5000
rng = np.random.default_rng(123)
mu_star = np.empty(B)

for b in range(B):
    # Generate from fitted N(μ̂, σ̂²)
    data_star = rng.normal(mu_hat, sigma_hat, size=n)
    mu_star[b] = data_star.mean()

# Results
se_boot = mu_star.std(ddof=1)
ci_percentile = np.quantile(mu_star, [0.025, 0.975])

# Compare to analytical
se_analytical = sigma_hat / np.sqrt(n)
ci_t = stats.t.interval(0.95, df=n-1, loc=mu_hat, scale=se_analytical)

print(f"Sample mean: {mu_hat:.4f}")
print(f"Bootstrap SE: {se_boot:.4f}, Analytical SE: {se_analytical:.4f}")
print(f"Percentile CI: [{ci_percentile[0]:.4f}, {ci_percentile[1]:.4f}]")
print(f"t-interval CI: [{ci_t[0]:.4f}, {ci_t[1]:.4f}]")

Result: For correctly specified normal models, the parametric bootstrap matches analytical results closely, validating the approach.

Parametric bootstrap algorithm walkthrough with exponential distribution example — Fig. 133 **Figure 4.4.2**: The parametric bootstrap algorithm illustrated with exponential data. (a) Fit the model to data. (b) Simulate bootstrap samples from the fitted distribution. (c) Compute statistics on each sample. (d) Build the bootstrap distribution for inference.

Location-Scale Families

Many common distributions belong to location-scale families, where \(X = \mu + \sigma Z\) for standardized \(Z\). These include normal, Student-t, logistic, and Laplace distributions.

General Algorithm

Algorithm: Parametric Bootstrap for Location-Scale Families

Input: Data X₁, ..., Xₙ; location-scale family F; replicates B
Output: Bootstrap distribution of (μ̂*, σ̂*)

1. Estimate location μ̂ and scale σ̂ (MLE or robust estimators)
2. [Optional] Fit shape parameter if needed (e.g., df for t-distribution)
3. For b = 1, ..., B:
   a. Generate Z*₁, ..., Z*ₙ iid from the standard form of F
   b. Transform: X*ᵢ = μ̂ + σ̂ · Z*ᵢ
   c. Re-estimate (μ̂*, σ̂*) from X*
4. Return bootstrap estimates

Note: This is a pure parametric bootstrap—we simulate fresh standardized values from the assumed distribution, not from the empirical residuals. For a semi-parametric approach that resamples standardized residuals \(Z_i = (X_i - \hat{\mu})/\hat{\sigma}\), see the residual bootstrap in Section 4.3 The Nonparametric Bootstrap.

Estimator Choices

The parametric bootstrap requires consistent estimators. Common choices:

Normal distribution:

Location: \(\hat{\mu} = \bar{X}\) (MLE)
Scale: \(\hat{\sigma} = s = \sqrt{\frac{1}{n-1}\sum(X_i - \bar{X})^2}\) (unbiased) or \(\hat{\sigma}_{MLE} = \sqrt{\frac{1}{n}\sum(X_i - \bar{X})^2}\)

Student-t distribution:

Location: Sample mean or median
Scale: \(\hat{\sigma} = s \cdot \sqrt{(\nu - 2)/\nu}\) for known \(\nu\)
Degrees of freedom: Estimate via MLE or fix based on prior knowledge

Exponential/Gamma:

Rate: \(\hat{\lambda} = 1/\bar{X}\) (MLE for exponential)
Shape and rate: Method of moments or MLE for gamma

import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

def bootstrap_location_scale(data, distribution='normal', B=2000, seed=None):
    """
    Parametric bootstrap for location-scale families.

    Parameters
    ----------
    data : array_like
        Sample data.
    distribution : str
        'normal', 't', 'laplace', or 'logistic'.
    B : int
        Number of bootstrap replicates.
    seed : int, optional
        Random seed.

    Returns
    -------
    dict with bootstrap results.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)

    # Fit distribution
    if distribution == 'normal':
        mu_hat, sigma_hat = data.mean(), data.std(ddof=1)
        rvs = lambda size: rng.normal(mu_hat, sigma_hat, size)
    elif distribution == 't':
        # Fit t-distribution via MLE
        df, loc, scale = stats.t.fit(data)
        mu_hat, sigma_hat = loc, scale
        rvs = lambda size: stats.t.rvs(df, loc=loc, scale=scale,
                                       size=size, random_state=rng)
    elif distribution == 'laplace':
        mu_hat = np.median(data)  # MLE for location
        sigma_hat = np.mean(np.abs(data - mu_hat))  # MLE for scale
        rvs = lambda size: rng.laplace(mu_hat, sigma_hat, size)
    elif distribution == 'logistic':
        mu_hat, sigma_hat = data.mean(), data.std() * np.sqrt(3) / np.pi
        rvs = lambda size: rng.logistic(mu_hat, sigma_hat, size)
    else:
        raise ValueError(f"Unknown distribution: {distribution}")

    # Bootstrap
    mu_star = np.empty(B)
    sigma_star = np.empty(B)

    for b in range(B):
        data_star = rvs(n)
        mu_star[b] = data_star.mean()
        sigma_star[b] = data_star.std(ddof=1)

    return {
        'mu_hat': mu_hat,
        'sigma_hat': sigma_hat,
        'mu_star': mu_star,
        'sigma_star': sigma_star,
        'se_mu': mu_star.std(ddof=1),
        'se_sigma': sigma_star.std(ddof=1),
        'ci_mu': np.quantile(mu_star, [0.025, 0.975]),
        'ci_sigma': np.quantile(sigma_star, [0.025, 0.975])
    }

Parametric Bootstrap for Regression

The parametric bootstrap for regression extends the residual bootstrap by assuming a distributional form for the errors rather than resampling observed residuals.

Fixed-X Parametric Bootstrap

For the linear model \(Y = X\beta + \varepsilon\) with \(\varepsilon \sim N(0, \sigma^2 I_n)\):

Algorithm: Parametric Bootstrap for OLS (Normal Errors)

Input: Design X (n × p), response Y, replicates B
Output: Bootstrap distribution of β̂*

1. Fit OLS: β̂ = (X'X)⁻¹X'Y
2. Compute residuals: ê = Y - Xβ̂
3. Estimate error variance: σ̂² = ||ê||² / (n - p)
4. For b = 1, ..., B:
   a. Generate ε*₁, ..., ε*ₙ iid ~ N(0, σ̂²)
   b. Form Y* = Xβ̂ + ε*
   c. Refit: β̂*_b = (X'X)⁻¹X'Y*
5. Return {β̂*₁, ..., β̂*_B}

Key insight: Unlike the residual bootstrap (which resamples \(\hat{e}_i\)), the parametric bootstrap draws fresh errors from \(N(0, \hat{\sigma}^2)\). This:

Produces smooth bootstrap distributions
Respects the assumed error structure
Can be more efficient when normality holds
Fails if errors are non-normal

Scope: Conditional Inference

This fixed-X parametric bootstrap targets conditional inference given X. If your predictors are random and you want unconditional inference on \(\beta\), you need a different resampling scheme (pairs bootstrap, or a generative model for X).

What is held fixed vs. resampled:

Component	Fixed-X Parametric	Pairs Bootstrap
Design matrix X	Fixed (same X in all replicates)	Resampled (rows of X change)
Error distribution	\(N(0, \hat{\sigma}^2)\) (assumed)	Empirical (from residuals)
Fitted means \(X\hat{\beta}\)	Fixed	Recomputed each replicate
Inference target	Conditional on X	Unconditional

Python Implementation

import numpy as np

def parametric_bootstrap_regression(X, y, B=2000, seed=None):
    """
    Parametric bootstrap for OLS regression with normal errors.

    Parameters
    ----------
    X : ndarray, shape (n, p)
        Design matrix (should include intercept column if desired).
    y : ndarray, shape (n,)
        Response vector.
    B : int
        Number of bootstrap replicates.
    seed : int, optional
        Random seed.

    Returns
    -------
    beta_star : ndarray, shape (B, p)
        Bootstrap distribution of coefficients.
    beta_hat : ndarray, shape (p,)
        OLS estimates.
    sigma_hat : float
        Estimated error standard deviation.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    n, p = X.shape

    # Fit OLS
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    y_hat = X @ beta_hat
    residuals = y - y_hat

    # Estimate error variance
    sigma_hat = np.sqrt(np.sum(residuals**2) / (n - p))

    # Parametric bootstrap
    beta_star = np.empty((B, p))

    for b in range(B):
        # Generate errors from N(0, σ̂²)
        eps_star = rng.normal(0, sigma_hat, size=n)
        y_star = y_hat + eps_star
        beta_star[b] = XtX_inv @ X.T @ y_star

    return beta_star, beta_hat, sigma_hat

Example 💡 Comparing Parametric and Nonparametric Bootstrap for Regression

Setup: Simulate data with normal errors and compare bootstrap approaches.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# Generate data with normal errors
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.uniform(-2, 2, n)])
beta_true = np.array([1.0, 2.0])
sigma_true = 0.5
y = X @ beta_true + rng.normal(0, sigma_true, n)

B = 3000
rng = np.random.default_rng(123)

# Fit OLS
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
y_hat = X @ beta_hat
residuals = y - y_hat
sigma_hat = np.sqrt(np.sum(residuals**2) / (n - p))

# Parametric bootstrap
beta_param = np.empty((B, p))
for b in range(B):
    eps_star = rng.normal(0, sigma_hat, size=n)
    y_star = y_hat + eps_star
    beta_param[b] = XtX_inv @ X.T @ y_star

# Nonparametric (residual) bootstrap
resid_centered = residuals - residuals.mean()
beta_nonparam = np.empty((B, p))
for b in range(B):
    idx = rng.integers(0, n, size=n)
    e_star = resid_centered[idx]
    y_star = y_hat + e_star
    beta_nonparam[b] = XtX_inv @ X.T @ y_star

# Compare
print("SE for β₁ (slope):")
print(f"  Parametric:    {beta_param[:, 1].std():.4f}")
print(f"  Nonparametric: {beta_nonparam[:, 1].std():.4f}")
print(f"  Analytical:    {sigma_hat * np.sqrt(XtX_inv[1,1]):.4f}")

Result: When errors are truly normal, all three approaches give similar SEs. The parametric bootstrap slightly outperforms when the model is correct.

Generalized Linear Models

For GLMs, the parametric bootstrap simulates from the fitted conditional distribution of \(Y | X\).

Algorithm: Parametric Bootstrap for GLM

Input: Design X, response Y, family (e.g., binomial, Poisson), replicates B
Output: Bootstrap distribution of β̂*

1. Fit GLM: β̂, compute μ̂ᵢ = g⁻¹(xᵢ'β̂) where g is the link function
2. For b = 1, ..., B:
   a. For each i, generate Y*ᵢ ~ Family(μ̂ᵢ, φ̂)
      - Binomial: Y*ᵢ ~ Binomial(nᵢ, μ̂ᵢ)
      - Poisson: Y*ᵢ ~ Poisson(μ̂ᵢ)
      - Gamma: Y*ᵢ ~ Gamma(α̂, μ̂ᵢ/α̂)
   b. Refit GLM on (X, Y*) to get β̂*_b
3. Return {β̂*₁, ..., β̂*_B}

import numpy as np
import statsmodels.api as sm

def parametric_bootstrap_glm(X, y, family, B=2000, seed=None, n_trials=None):
    """
    Parametric bootstrap for GLMs.

    Parameters
    ----------
    X : ndarray
        Design matrix (include intercept).
    y : ndarray
        Response.
    family : statsmodels family
        GLM family (e.g., sm.families.Poisson()).
    B : int
        Number of replicates.
    seed : int
        Random seed.
    n_trials : ndarray, optional
        Number of trials for Binomial family. Required if y contains
        counts > 1. If None and family is Binomial, assumes Bernoulli (n=1).

    Returns
    -------
    beta_star : ndarray, shape (B, p)
        Bootstrap coefficients.
    model : fitted GLM
        Original fitted model.
    """
    rng = np.random.default_rng(seed)

    # Fit original model
    model = sm.GLM(y, X, family=family).fit()
    beta_hat = model.params
    mu_hat = model.mu  # Fitted means

    p = len(beta_hat)
    beta_star = np.empty((B, p))

    # Check family type using class name (more robust than isinstance)
    family_name = family.__class__.__name__

    for b in range(B):
        # Generate from fitted distribution
        if family_name == 'Poisson':
            y_star = rng.poisson(mu_hat)

        elif family_name == 'Binomial':
            if n_trials is None:
                # Check if data looks like Bernoulli
                if np.all((y == 0) | (y == 1)):
                    n_trials = np.ones_like(y, dtype=int)
                else:
                    raise ValueError(
                        "n_trials required for Binomial with count data. "
                        "If y is binary (0/1), this is inferred automatically."
                    )
            y_star = rng.binomial(n_trials.astype(int), mu_hat)

        elif family_name == 'Gamma':
            # Gamma with shape from dispersion
            phi = model.scale
            shape = 1 / phi
            y_star = rng.gamma(shape, mu_hat / shape)

        else:
            raise NotImplementedError(f"Family {family_name} not implemented")

        # Refit
        try:
            model_star = sm.GLM(y_star, X, family=family).fit(disp=0)
            beta_star[b] = model_star.params
        except Exception:
            beta_star[b] = np.nan

    return beta_star, model

Parametric bootstrap for regression with Q-Q diagnostics and SE comparison — Fig. 134 **Figure 4.4.5**: Parametric bootstrap for regression. (a) Data and fitted model. (b) Q-Q plot to check normality assumption. (d-e) Bootstrap distributions for intercept and slope. (f) SE comparison shows agreement between analytical, parametric, and residual bootstrap when normality holds.

Confidence Intervals

Percentile Intervals

The simplest parametric bootstrap CI uses quantiles of the bootstrap distribution:

\[\text{CI}_{1-\alpha}^{\text{perc}} = \left[Q_{\alpha/2}(\hat{\theta}^*), Q_{1-\alpha/2}(\hat{\theta}^*)\right]\]

This is identical to nonparametric bootstrap percentile intervals in form, but the bootstrap distribution comes from parametric simulation.

Bootstrap-t (Studentized) Intervals

For second-order accuracy, use the studentized bootstrap:

For each bootstrap replicate, compute \(Z^*_b = (\hat{\theta}^*_b - \hat{\theta})/\hat{\text{SE}}^*_b\)
Find quantiles \(q_{\alpha/2}^*\) and \(q_{1-\alpha/2}^*\) of \(\{Z^*_b\}\)
CI: \([\hat{\theta} - q_{1-\alpha/2}^* \cdot \widehat{\text{SE}}, \hat{\theta} - q_{\alpha/2}^* \cdot \widehat{\text{SE}}]\)

\[\text{CI}_{1-\alpha}^{\text{boot-t}} = \left[\hat{\theta} - q^*_{1-\alpha/2} \cdot \widehat{\text{SE}}, \hat{\theta} - q^*_{\alpha/2} \cdot \widehat{\text{SE}}\right]\]

Advantage: Achieves \(O(n^{-1})\) coverage error versus \(O(n^{-1/2})\) for percentile intervals.

Disadvantage: Requires estimating SE within each bootstrap replicate (nested bootstrap or analytical formula).

BCa Intervals with Parametric Bootstrap

The BCa method (Bias-Corrected and accelerated) can be applied to parametric bootstrap as well — the method itself is developed in full in the optional Section 4.7:

\[z_0 = \Phi^{-1}\left(\frac{\#\{\hat{\theta}^*_b < \hat{\theta}\}}{B}\right), \quad \hat{a} = \frac{\sum_{i=1}^n (\bar{\theta}_{(\cdot)} - \hat{\theta}_{(-i)})^3}{6\left[\sum_{i=1}^n (\bar{\theta}_{(\cdot)} - \hat{\theta}_{(-i)})^2\right]^{3/2}}\]

The adjusted quantiles are:

\[\alpha_1 = \Phi\left(z_0 + \frac{z_0 + z_{\alpha/2}}{1 - \hat{a}(z_0 + z_{\alpha/2})}\right), \quad \alpha_2 = \Phi\left(z_0 + \frac{z_0 + z_{1-\alpha/2}}{1 - \hat{a}(z_0 + z_{1-\alpha/2})}\right)\]

BCa intervals are transformation-invariant and range-preserving, making them excellent for parameters with natural bounds.

Python Implementation of CI Methods

import numpy as np
from scipy import stats

def parametric_bootstrap_ci(theta_star, theta_hat, alpha=0.05,
                             method='percentile', se_hat=None,
                             se_star=None, theta_jack=None):
    """
    Confidence intervals from parametric bootstrap.

    Parameters
    ----------
    theta_star : ndarray
        Bootstrap replicates.
    theta_hat : float
        Original estimate.
    alpha : float
        Significance level.
    method : str
        'percentile', 'basic', 'normal', 'studentized', or 'bca'.
    se_hat : float, optional
        SE estimate for original data (needed for 'normal', 'studentized').
    se_star : ndarray, optional
        Per-replicate SE estimates (needed for true 'studentized').
        If None with 'studentized', falls back to pseudo-studentized.
    theta_jack : ndarray, optional
        Jackknife estimates (needed for 'bca').

    Returns
    -------
    ci : tuple
        (lower, upper) confidence limits.
    """
    B = len(theta_star)

    if method == 'percentile':
        return (np.quantile(theta_star, alpha/2),
                np.quantile(theta_star, 1 - alpha/2))

    elif method == 'basic':
        q_lo = np.quantile(theta_star, alpha/2)
        q_hi = np.quantile(theta_star, 1 - alpha/2)
        return (2*theta_hat - q_hi, 2*theta_hat - q_lo)

    elif method == 'normal':
        if se_hat is None:
            se_hat = theta_star.std(ddof=1)
        z = stats.norm.ppf(1 - alpha/2)
        return (theta_hat - z*se_hat, theta_hat + z*se_hat)

    elif method == 'studentized':
        # True studentized bootstrap requires SE for each replicate
        if se_hat is None:
            raise ValueError("se_hat required for studentized method")

        if se_star is not None:
            # Proper studentized bootstrap with per-replicate SEs
            t_star = (theta_star - theta_hat) / se_star
        else:
            # Pseudo-studentized: uses single SE (does NOT achieve
            # second-order accuracy; use only as approximation)
            import warnings
            warnings.warn("se_star not provided; using pseudo-studentized "
                        "method which does not achieve second-order accuracy")
            t_star = (theta_star - theta_hat) / se_hat

        q_lo = np.quantile(t_star, alpha/2)
        q_hi = np.quantile(t_star, 1 - alpha/2)
        return (theta_hat - q_hi*se_hat, theta_hat - q_lo*se_hat)

    elif method == 'bca':
        if theta_jack is None:
            raise ValueError("Jackknife estimates required for BCa")

        # Bias correction
        prop_below = np.mean(theta_star < theta_hat)
        z0 = stats.norm.ppf(prop_below) if 0 < prop_below < 1 else 0

        # Acceleration
        theta_bar = theta_jack.mean()
        num = np.sum((theta_bar - theta_jack)**3)
        denom = 6 * np.sum((theta_bar - theta_jack)**2)**1.5
        a_hat = num / denom if denom != 0 else 0

        # Adjusted quantiles
        z_alpha = stats.norm.ppf(alpha/2)
        z_1alpha = stats.norm.ppf(1 - alpha/2)

        def adj_quantile(z):
            return stats.norm.cdf(z0 + (z0 + z)/(1 - a_hat*(z0 + z)))

        alpha1 = adj_quantile(z_alpha)
        alpha2 = adj_quantile(z_1alpha)

        return (np.quantile(theta_star, alpha1),
                np.quantile(theta_star, alpha2))

    else:
        raise ValueError(f"Unknown method: {method}")

Model Checking and Validation

The parametric bootstrap’s validity depends critically on model correctness. Before relying on parametric bootstrap, verify assumptions.

Diagnostic Checks

Graphical diagnostics:
- Q-Q plot of residuals against assumed distribution
- Residual plots for heteroscedasticity
- Scale-location plot for variance stability
Formal tests (use with caution—low power for small \(n\)):
- Shapiro-Wilk for normality
- Breusch-Pagan for heteroscedasticity
- Anderson-Darling for distributional fit
Sensitivity analysis:
- Compare parametric and nonparametric bootstrap CIs
- If they differ substantially, investigate model assumptions

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def check_parametric_assumptions(residuals, distribution='normal'):
    """
    Diagnostic plots for parametric bootstrap assumptions.

    Parameters
    ----------
    residuals : array_like
        Model residuals.
    distribution : str
        Assumed distribution ('normal', 't', etc.).
    """
    residuals = np.asarray(residuals)
    n = len(residuals)

    fig, axes = plt.subplots(2, 2, figsize=(10, 8))

    # Q-Q plot
    ax = axes[0, 0]
    if distribution == 'normal':
        stats.probplot(residuals, dist="norm", plot=ax)
    elif distribution == 't':
        # Fit df, then Q-Q
        df = stats.t.fit(residuals)[0]
        stats.probplot(residuals, dist=stats.t(df), plot=ax)
    ax.set_title('Q-Q Plot')

    # Histogram with fitted distribution
    ax = axes[0, 1]
    ax.hist(residuals, bins='auto', density=True, alpha=0.7,
            edgecolor='black')
    x = np.linspace(residuals.min(), residuals.max(), 100)
    if distribution == 'normal':
        mu, sigma = residuals.mean(), residuals.std()
        ax.plot(x, stats.norm.pdf(x, mu, sigma), 'r-', lw=2,
                label='Fitted Normal')
    ax.legend()
    ax.set_title('Histogram vs Fitted Distribution')

    # Residuals vs fitted (check for patterns)
    ax = axes[1, 0]
    ax.scatter(np.arange(n), residuals, alpha=0.5)
    ax.axhline(0, color='red', linestyle='--')
    ax.set_xlabel('Index')
    ax.set_ylabel('Residuals')
    ax.set_title('Residuals vs Index')

    # Scale-location plot
    ax = axes[1, 1]
    std_resid = residuals / residuals.std()
    ax.scatter(np.arange(n), np.sqrt(np.abs(std_resid)), alpha=0.5)
    ax.set_xlabel('Index')
    ax.set_ylabel('√|Standardized Residuals|')
    ax.set_title('Scale-Location Plot')

    plt.tight_layout()

    # Formal tests
    print("Diagnostic Tests:")
    if distribution == 'normal':
        stat, p = stats.shapiro(residuals)
        print(f"  Shapiro-Wilk: W={stat:.4f}, p={p:.4f}")

    # Runs test for independence
    median = np.median(residuals)
    signs = residuals > median
    runs = 1 + np.sum(signs[1:] != signs[:-1])
    n_pos, n_neg = signs.sum(), (~signs).sum()
    exp_runs = 1 + 2*n_pos*n_neg / n
    var_runs = 2*n_pos*n_neg*(2*n_pos*n_neg - n) / (n**2 * (n-1))
    z_runs = (runs - exp_runs) / np.sqrt(var_runs)
    print(f"  Runs test: z={z_runs:.4f}, p={2*(1-stats.norm.cdf(abs(z_runs))):.4f}")

    return fig

Model validation diagnostics including Q-Q plots and SE comparison — Fig. 135 **Figure 4.4.7**: Model validation diagnostics. Left column shows a well-behaved case (normal data) where parametric assumptions hold. Right column shows a problematic case (bimodal data) where normality fails. When methods disagree (panel e), investigate model assumptions.

Common Pitfall ⚠️

Trusting the wrong model: If you use parametric bootstrap with a misspecified model (e.g., assuming normality when errors are heavy-tailed), confidence intervals may have severely incorrect coverage. Always validate assumptions before relying on parametric bootstrap.

Course heuristic: If parametric and nonparametric bootstrap CIs differ by more than 20% in width, this is a flag to investigate model assumptions carefully. This threshold is not a formal statistical rule but a practical signal that something may warrant attention.

When Parametric Bootstrap Fails

Model Misspecification

The most serious parametric-bootstrap failure mode is model misspecification. If the true distribution \(F\) is meaningfully outside the assumed family \(\{F_\theta\}\), then simulating from \(F_{\hat\theta}\) does not reproduce the sampling behavior of the statistic under the real data-generating process. In that case, the parametric bootstrap can deliver biased standard errors and confidence intervals with severe undercoverage.

A particularly fragile setting is inference for a tail functional, such as \(p = P(X > c)\), because \(p\) depends on the distribution’s extreme-right behavior. In the example below, the true distribution is a mixture (bimodal). Fitting a single Normal forces the model to “average” the two modes, which misrepresents the right tail that determines \(p\).

Parametric bootstrap failure under model misspecification for a tail probability p = P(X > c) — Fig. 136 **Figure 4.4.4**: Model misspecification, parametric bootstrap can be confidently wrong. (a) Data are generated from a mixture distribution, but a single Normal model is fitted, producing a tail mismatch beyond the threshold \(c=6\). (b) Bootstrap distributions for \(\hat{p}^* = P(X^* > c)\) differ sharply: the parametric bootstrap (wrong model) concentrates on smaller values, while the nonparametric bootstrap remains anchored to the empirical tail behavior. (c) In repeated-sampling experiments, the 95% parametric-bootstrap CI for \(p\) can have drastic undercoverage under misspecification, whereas the nonparametric bootstrap is typically much closer to nominal.

Boundary Parameters

Boundary and near boundary settings are nonregular: the estimator’s sampling distribution can be highly asymmetric, and the usual bootstrap logic can break. In these cases, the parametric bootstrap can fail, and the standard nonparametric bootstrap can fail as well when the statistic is constrained by the sample support.

Uniform[0, (theta)] (endpoint / maximum): If \(\hat{\theta} = X_{(n)}\) (the sample maximum), then a parametric bootstrap that simulates \(X^* \sim \mathrm{Unif}(0,\hat{\theta})\) produces \(\hat{\theta}^*=\max(X^*) \le \hat{\theta}\) always. This truncation makes it impossible to represent upward uncertainty in \(\theta\). The same structural problem occurs for the standard nonparametric bootstrap, since resampling from \(\hat F_n\) cannot generate observations exceeding \(X_{(n)}\).
Variance near 0 (degenerate scale estimate): If the sample variance is extremely small, a parametric bootstrap from \(N(\hat{\mu},\hat{\sigma}^2)\) with tiny \(\hat{\sigma}\) yields nearly degenerate bootstrap samples and misleadingly small standard errors.

Remedies (choose based on context):

Exact finite-sample theory when available: for Uniform endpoints, order-statistic results yield exact tail probabilities and CIs.
Bias-corrected or alternative estimators: for Uniform[0, (theta)], \(\hat{\theta}_{bc}=\frac{n+1}{n}X_{(n)}\) removes the downward bias in \(X_{(n)}\).
Subsampling / m-out-of-n bootstrap: resample \(m<n\) observations (with or without replacement) to obtain a distributional approximation that is often more stable in nonregular settings.
Reparameterization: when possible, transform to an unconstrained scale to avoid boundary artifacts.

Boundary parameter failure showing bootstrap truncation at the sample maximum — Fig. 137 **Figure 4.4.6**: Boundary parameter failure for Uniform[0, \(\theta\)]. (a) The MLE \(\hat{\theta}=X_{(n)}\) typically underestimates the true endpoint. (b) Parametric bootstrap from \(\mathrm{Unif}(0,\hat{\theta})\) is truncated at \(\hat{\theta}\) (and the standard nonparametric bootstrap is similarly truncated), so upward uncertainty cannot be represented. (c) The resulting bootstrap standard error and percentile CI upper bound are biased downward relative to the true sampling behavior.

Heavy-Tailed Distributions

Heavy tails can break bootstrap inference, but the issue depends on both the tail behavior of \(F\) and the statistic \(T_n\).

If \(F\) has infinite variance (e.g., stable laws with \(\alpha < 2\)), then the sample mean and other non-robust, variance-dependent functionals often have nonstandard asymptotics (no CLT, stable limits, or extremely slow convergence). In these regimes, the usual nonparametric bootstrap may give a poor or even inconsistent approximation to the sampling distribution of \(T_n\).
In the extreme case of no finite mean (e.g., Cauchy), the sample mean does not stabilize in the usual sense; it does not concentrate around a constant. A bootstrap distribution built from \(\hat F_n\) cannot recover a “nice” sampling law for \(\bar X\) because the target itself is nonstandard.

Robust statistics are typically safer under heavy tails. Medians, trimmed means, and many M-estimators can retain well-behaved limiting distributions (often asymptotically normal under mild conditions) and are therefore more compatible with bootstrap inference.

Remedy: Prefer robust estimators (median, trimmed mean, appropriate M-estimators), and choose a bootstrap that matches the estimator and tail regime. When tails are very heavy, consider diagnostics (tail plots, extreme values) and sensitivity checks across multiple robust choices.

Parametric vs. Nonparametric: A Decision Framework

Choosing between parametric and nonparametric bootstrap involves balancing efficiency against robustness.

Table 35 Parametric vs. Nonparametric Bootstrap
Criterion	Parametric Bootstrap	Nonparametric Bootstrap
Primary assumption	Data follow a specified family \(\{F_\theta\}\) (after fitting \(\hat\theta\))	Uses \(\hat F_n\); fewer distributional assumptions
When it shines	Often more efficient when the model is credible, especially for small \(n\)	Generally more robust to distributional misspecification
Small samples (\(n \lesssim 30\))	Can give tighter, more stable inference if the model is correct	Can be noisier (discrete resampling, higher Monte Carlo variability), but often still reasonable
Model misspecification	Can fail badly (bias, undercoverage, misleading SEs)	Typically more robust, but not universally “always valid”
Complex statistics / functionals	Often straightforward (simulate from fitted model, recompute statistic)	Often straightforward (resample data, recompute statistic)
Support constraints / boundaries	Can fail for boundary MLEs (e.g., \(\hat\theta = X_{(n)}\)); may require special remedies	Can also fail or be conservative for extreme order statistics; may require m-out-of-n / subsampling
Computational cost	Usually \(O(Bn)\) plus model fitting if refit each replicate	Usually \(O(Bn)\) plus any refitting inside each replicate

Decision Flowchart

Do you have strong prior knowledge for a parametric family?
- No → Start with nonparametric.
- Yes → Continue.
Do basic diagnostics support the model (shape, tails, support)?
- Clear departures (skewness, multimodality, heavy tails, truncation) → Prefer nonparametric or revise the model.
- Diagnostics look reasonable → Continue.
Is the sample size small (:math:`n lesssim 30`)?
- Yes and model credible → Parametric often improves efficiency and stability.
- Yes and model uncertain → Prefer nonparametric, and treat parametric as a sensitivity check only.
Is the target sensitive to tails or support (extremes, thresholds, quantiles)?
- Yes → Parametric assumptions matter more; validate aggressively and consider robust or tail-appropriate models.
- No → Either method may work; compare as a robustness check.
When in doubt: Run both. If results differ materially, diagnose why (model fit, tails, boundaries, dependence).

Decision framework flowchart for choosing between parametric and nonparametric bootstrap — Fig. 138 **Figure 4.4.8**: Decision framework for choosing between parametric and nonparametric bootstrap. Parametric bootstrap trades robustness for efficiency. A practical default is to start with nonparametric and use parametric only when the model is credible and diagnostics support it.

Practical Recommendation

Default: Start with nonparametric for robustness, then compare with a parametric bootstrap if a credible model exists.

Prefer parametric bootstrap when:

You have a well-justified parametric family for \(F\)
Diagnostics do not show clear shape or tail conflicts
\(n\) is small and efficiency matters

Use special remedies near boundaries or extremes (for either method):

Exact theory when available (order statistics, known pivots)
Bias-corrected estimators (e.g., \(\hat\theta_{bc} = \frac{n+1}{n} X_{(n)}\))
m-out-of-n bootstrap or subsampling for boundary or extreme-value functionals

Practical Considerations

Computational Efficiency

Vectorization: Pre-generate all \(B \times n\) random values at once, then reshape

import numpy as np

rng = np.random.default_rng(42)
B, n, sigma = 1000, 50, 1.0  # replicates, sample size, error scale

# Instead of loop:
# for b in range(B):
#     eps = rng.normal(0, sigma, n)

# Vectorized:
eps_all = rng.normal(0, sigma, size=(B, n))
# Then process all B samples simultaneously

Parallel computation: Bootstrap replicates are independent; parallelize across cores
Analytical shortcuts: For some parametric families, bootstrap distributions have closed forms (e.g., normal mean bootstrap is exactly \(N(\hat{\mu}, \hat{\sigma}^2/n)\))

Choosing B

Recommendations mirror nonparametric bootstrap:

SE estimation: \(B = 500\) to \(2{,}000\)
Confidence intervals: \(B = 2{,}000\) to \(10{,}000\)
P-values: \(B \geq 10{,}000\) for \(\alpha = 0.05\)

Monte Carlo error decreases as \(O(1/\sqrt{B})\), so doubling precision requires quadrupling \(B\).

Reporting Standards

A parametric bootstrap analysis should report:

The assumed model: Distribution family and any fixed parameters
Estimation method: MLE, method of moments, etc.
Model diagnostics: Evidence supporting distributional assumptions
Bootstrap details: Number of replicates \(B\), random seed
CI method: Percentile, bootstrap-t, BCa
Comparison (recommended): Results from nonparametric bootstrap as sensitivity check

Bringing It All Together

The parametric bootstrap is a powerful tool when model assumptions are credible. It offers improved efficiency and finite-sample performance compared to nonparametric approaches, particularly for small samples and well-understood parametric families. The key tradeoff is robustness: parametric bootstrap can fail badly under model misspecification.

The decision between parametric and nonparametric bootstrap should be guided by:

Prior knowledge about the data-generating process
Sample size and the stakes of the analysis
Diagnostic evidence for model fit
Sensitivity of conclusions to method choice

Chapter 4.4 Exercises: Parametric Bootstrap Mastery

A Note on These Exercises

These exercises develop your ability to implement, validate, and critically evaluate the parametric bootstrap:

Exercises 1–2 build core parametric bootstrap skills: SE estimation and the impact of distributional assumptions
Exercises 3–4 explore regression applications and the consequences of model misspecification
Exercise 5 compares confidence interval methods within the parametric bootstrap framework
Exercise 6 is a capstone integrating diagnostics, method selection, and inference into a complete workflow

Exercise 4.4.1: Parametric Bootstrap SE for the Exponential Rate

Background: The MLE for the exponential rate is \(\hat{\lambda} = 1/\bar{X}\). By the delta method, \(\text{SE}(\hat{\lambda}) \approx \hat{\lambda}/\sqrt{n}\), providing an analytical benchmark for the bootstrap.

Tasks:

Generate \(n = 80\) observations from \(\text{Exp}(\lambda = 2)\). Compute the MLE \(\hat{\lambda}\) and the parametric bootstrap SE using \(B = 5{,}000\) replicates. Compare to the delta method formula \(\hat{\lambda}/\sqrt{n}\).
Compute the nonparametric bootstrap SE for \(\hat{\lambda}\) using the same data. How do the parametric and nonparametric SEs compare? Which is larger and why?
Run a coverage study: repeat 2,000 times, each time generating fresh \(\text{Exp}(2)\) data, computing 95% percentile CIs from both parametric and nonparametric bootstrap (\(B = 2{,}000\) inner replicates), and checking whether the true \(\lambda = 2\) is covered. Compare the actual coverage to the nominal 95%.

Hint: Use rng = np.random.default_rng(42) for reproducibility. For part (c), the parametric bootstrap should achieve coverage very close to nominal because the model is correctly specified.

Exercise 4.4.2: Location-Scale Bootstrap with Heavy Tails

Background: When data come from a heavy-tailed distribution, fitting a Normal model underestimates tail variability. This exercise explores the consequences for bootstrap SE estimation.

Tasks:

Generate \(n = 60\) observations from a scaled and shifted \(t(5)\) distribution: \(X = 2Z + 10\) where \(Z \sim t_5\). Fit a Normal model (MLE for \(\mu, \sigma\)) and compute the parametric bootstrap SE for the sample mean using \(B = 5{,}000\) replicates.
Now fit a \(t\)-distribution using scipy.stats.t.fit(x) and compute the parametric bootstrap SE under the \(t\) model. Also compute the nonparametric bootstrap SE. Compare all three SEs and explain any differences.
Create Q-Q plots of the data against both the Normal and \(t\) fits. Which fit captures the tails better? Run the Shapiro-Wilk test and interpret it in light of the Q-Q plots.
Why does the Normal-based parametric bootstrap underestimate the SE for the mean when data are heavy-tailed? Connect your answer to the variance formula \(\text{Var}(X) = \sigma^2 \cdot \nu/(\nu - 2)\) for the \(t\)-distribution.

Hint: Use scipy.stats.t.fit(x) to estimate degrees of freedom, location, and scale simultaneously.

Exercise 4.4.3: Parametric Bootstrap for Linear Regression

Background: The parametric bootstrap for fixed-X regression generates \(y^* = X\hat{\beta} + \varepsilon^*\) where \(\varepsilon^* \sim N(0, \hat{\sigma}^2 I)\). The residual bootstrap instead resamples \(\varepsilon^*\) from the observed residuals. When errors are truly Normal, both should agree; when they are not, the parametric version may mislead.

Tasks:

Generate regression data with \(n = 100\), \(\beta = (2, 0.5)^T\), \(\sigma = 1.5\), and Normal errors: \(y_i = 2 + 0.5 x_i + \varepsilon_i\) with \(x_i \sim \text{Unif}(0, 10)\). Compute the parametric bootstrap SE, residual bootstrap SE, and analytical SE for \(\hat{\beta}_1\) using \(B = 3{,}000\) replicates. Verify all three agree.
Repeat with skewed errors: replace \(\varepsilon_i \sim N(0, 1.5^2)\) with \(\varepsilon_i \sim \text{Exp}(1.5) - 1.5\) (centered exponential). Compare the parametric and residual bootstrap SEs. Which is more trustworthy and why?
Run a Shapiro-Wilk test on the OLS residuals from both datasets. How does the diagnostic relate to the SE discrepancy (or lack thereof) you observed?
In two sentences, state the practical rule: when should a practitioner prefer the residual bootstrap over the parametric bootstrap for regression?

Solution

import numpy as np
from scipy import stats

rng = np.random.default_rng(314)
n, B = 100, 3000
x_reg = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x_reg])
beta_true = np.array([2.0, 0.5])

# (a) Normal errors
y = X @ beta_true + rng.normal(0, 1.5, n)
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat
sigma_hat = np.sqrt(np.sum(resid**2) / (n - 2))

# Parametric bootstrap
beta_para = np.array([np.linalg.lstsq(X, X @ beta_hat + rng.normal(0, sigma_hat, n),
                      rcond=None)[0] for _ in range(B)])
# Residual bootstrap
r_centered = resid - resid.mean()
beta_resid = np.array([np.linalg.lstsq(X, X @ beta_hat + rng.choice(r_centered, n, replace=True),
                       rcond=None)[0] for _ in range(B)])
# Analytical
XtX_inv = np.linalg.inv(X.T @ X)
se_analytical = sigma_hat * np.sqrt(XtX_inv[1, 1])

print(f"Normal errors — SE(beta_1):")
print(f"  Parametric: {np.std(beta_para[:,1], ddof=1):.4f}")
print(f"  Residual:   {np.std(beta_resid[:,1], ddof=1):.4f}")
print(f"  Analytical: {se_analytical:.4f}")

# (b) Skewed errors
rng2 = np.random.default_rng(314)
x2 = rng2.uniform(0, 10, n)
X2 = np.column_stack([np.ones(n), x2])
y2 = X2 @ beta_true + (rng2.exponential(1.5, n) - 1.5)
bh2 = np.linalg.lstsq(X2, y2, rcond=None)[0]
r2 = y2 - X2 @ bh2
sig2 = np.sqrt(np.sum(r2**2) / (n-2))

bp2 = np.array([np.linalg.lstsq(X2, X2 @ bh2 + rng.normal(0, sig2, n),
                 rcond=None)[0] for _ in range(B)])
r2c = r2 - r2.mean()
br2 = np.array([np.linalg.lstsq(X2, X2 @ bh2 + rng.choice(r2c, n, replace=True),
                 rcond=None)[0] for _ in range(B)])
print(f"\nSkewed errors — SE(beta_1):")
print(f"  Parametric: {np.std(bp2[:,1], ddof=1):.4f}")
print(f"  Residual:   {np.std(br2[:,1], ddof=1):.4f}")

# (c) Shapiro-Wilk
print(f"\nShapiro-Wilk on residuals:")
print(f"  Normal errors: p = {stats.shapiro(resid).pvalue:.4f}")
print(f"  Skewed errors: p = {stats.shapiro(r2).pvalue:.4f}")

Key insights:

With Normal errors, all three SEs agree closely, confirming both bootstrap methods are valid when the model is correct.
With exponential errors, the parametric bootstrap (which assumes Normality) gives a slightly different SE than the residual bootstrap. The residual bootstrap is more trustworthy because it preserves the actual error distribution.
The Shapiro-Wilk test passes for Normal errors (\(p \gg 0.05\)) and rejects for exponential errors (\(p \approx 0\)), correctly flagging the misspecification.
Practical rule: Use the residual bootstrap whenever the Shapiro-Wilk test rejects or Q-Q plots show non-Normality. The residual bootstrap requires only that errors are exchangeable (homoscedastic), not Normal.

Exercise 4.4.4: Model Misspecification and Bootstrap Failure

Background: When the assumed parametric model is wrong, the parametric bootstrap can produce confidently wrong inferences. This exercise demonstrates a dramatic failure using a mixture distribution.

Tasks:

Generate \(n = 100\) observations from the mixture \(0.3 \cdot N(-2, 1) + 0.7 \cdot N(3, 0.5)\). Fit a single Normal model and compute both parametric and nonparametric bootstrap SEs for the median using \(B = 5{,}000\) replicates. What is the ratio of the two SEs?
Run a coverage study (2,000 simulations, \(B = 2{,}000\) inner replicates): for each simulation, generate fresh mixture data, compute 95% percentile CIs from both parametric and nonparametric bootstrap, and check coverage of the true mixture median. Report the actual coverage for each method.
Explain why the parametric bootstrap fails so catastrophically. What feature of the true distribution does the single Normal miss that is critical for the median?
The parametric bootstrap SE in part (a) should be roughly twice the nonparametric SE. Connect this to the fact that the Normal model spreads probability mass symmetrically, while the bimodal mixture concentrates mass near two modes with most observations near \(x = 3\).

Hint: Compute the true mixture median by simulation: generate \(10^6\) mixture draws and take the sample median.

Exercise 4.4.5: Confidence Interval Methods

Background: The parametric bootstrap can be combined with different CI construction methods: percentile, basic (reverse percentile), and studentized (bootstrap-t). Each has different finite-sample properties, and the studentized interval achieves second-order accuracy.

Tasks:

Generate \(n = 50\) observations from \(\text{Exp}(\lambda = 3)\). Compute the MLE \(\hat{\lambda} = 1/\bar{X}\) and construct a percentile 95% CI from \(B = 5{,}000\) parametric bootstrap replicates. Report the interval and its width.
Construct the basic (reverse percentile) 95% CI from the same bootstrap replicates: \((2\hat{\lambda} - q_{0.975}^*, \; 2\hat{\lambda} - q_{0.025}^*)\). Compare to the percentile CI — which is shifted further right?
Construct the studentized (bootstrap-t) 95% CI. For each replicate, compute \(t^*_b = (\hat{\lambda}^*_b - \hat{\lambda}) / \text{SE}^*_b\) where \(\text{SE}^*_b = \hat{\lambda}^*_b / \sqrt{n}\). The CI is \((\hat{\lambda} - t^*_{0.975} \cdot \text{SE}, \; \hat{\lambda} - t^*_{0.025} \cdot \text{SE})\).
Run a coverage study (2,000 simulations, \(B = 2{,}000\)) comparing all three CI methods. Which achieves closest-to-nominal 95% coverage? Why does the studentized interval perform best for this skewed statistic?

Solution

import numpy as np

rng = np.random.default_rng(555)
n, B = 50, 5000
lam_true = 3.0
x = rng.exponential(scale=1/lam_true, size=n)
lam_hat = 1 / np.mean(x)
se_hat = lam_hat / np.sqrt(n)

# Bootstrap replicates
lam_boot = np.empty(B)
se_boot = np.empty(B)
for b in range(B):
    x_star = rng.exponential(scale=1/lam_hat, size=n)
    lam_boot[b] = 1 / np.mean(x_star)
    se_boot[b] = lam_boot[b] / np.sqrt(n)

# (a) Percentile
ci_perc = np.percentile(lam_boot, [2.5, 97.5])
# (b) Basic
ci_basic = (2*lam_hat - np.percentile(lam_boot, 97.5),
            2*lam_hat - np.percentile(lam_boot, 2.5))
# (c) Studentized
t_boot = (lam_boot - lam_hat) / se_boot
tl, tu = np.percentile(t_boot, [2.5, 97.5])
ci_stud = (lam_hat - tu * se_hat, lam_hat - tl * se_hat)

for name, ci in [("Percentile", ci_perc), ("Basic", ci_basic),
                  ("Studentized", ci_stud)]:
    print(f"{name:12s}: ({ci[0]:.4f}, {ci[1]:.4f}), "
          f"width={ci[1]-ci[0]:.4f}")

# (d) Coverage study
n_sim, B_inner = 2000, 2000
covers = {'percentile': 0, 'basic': 0, 'studentized': 0}
for _ in range(n_sim):
    xs = rng.exponential(1/lam_true, n)
    ls = 1/np.mean(xs)
    se_s = ls/np.sqrt(n)
    lb = np.empty(B_inner)
    sb = np.empty(B_inner)
    for b in range(B_inner):
        xb = rng.exponential(1/ls, n)
        lb[b] = 1/np.mean(xb)
        sb[b] = lb[b]/np.sqrt(n)
    ci = np.percentile(lb, [2.5, 97.5])
    covers['percentile'] += ci[0] <= lam_true <= ci[1]
    ci = (2*ls - np.percentile(lb, 97.5), 2*ls - np.percentile(lb, 2.5))
    covers['basic'] += ci[0] <= lam_true <= ci[1]
    tb = (lb - ls)/sb
    tl, tu = np.percentile(tb, [2.5, 97.5])
    ci = (ls - tu*se_s, ls - tl*se_s)
    covers['studentized'] += ci[0] <= lam_true <= ci[1]

for method, count in covers.items():
    print(f"{method:12s} coverage: {count/n_sim:.4f}")

Key insights:

The studentized CI achieves closest-to-nominal coverage (\(\approx 0.950\)) because it captures the asymmetry of the sampling distribution of \(\hat{\lambda}\). The \(t^*\) distribution is right-skewed, and the studentized method automatically adjusts the critical values.
The percentile CI has slight undercoverage for skewed statistics because it does not correct for bias or skewness.
The basic CI performs worst because it reflects the bootstrap distribution around \(\hat{\lambda}\), which amplifies the skewness in the wrong direction.
The studentized interval has second-order accuracy \(O(n^{-1})\) compared to the \(O(n^{-1/2})\) error of percentile and basic methods—exactly the theoretical advantage derived in Section 4.4.

Exercise 4.4.6: Complete Parametric Bootstrap Workflow

Background: A quality engineer measures \(n = 75\) component lifetimes (in hours) that are positive and right-skewed. The goal is to estimate the mean lifetime and the probability of exceeding 10 hours, with uncertainty quantification.

Tasks:

Generate \(n = 75\) observations from \(\text{Gamma}(\text{shape} = 3, \text{scale} = 2)\). Fit a Gamma distribution using scipy.stats.gamma.fit(x, floc=0) and report the MLE shape and scale parameters.
Assess the Gamma fit with two diagnostics: (i) a Q-Q plot of the data against the fitted Gamma, and (ii) a histogram with the fitted Gamma density overlaid. Run a Kolmogorov-Smirnov test with scipy.stats.kstest. Is the Gamma model adequate?
Compute parametric bootstrap estimates (\(B = 5{,}000\)) for two quantities: the mean \(\bar{X}\) and the tail probability \(\hat{p} = \hat{P}(X > 10)\). Report point estimates and bootstrap SEs for both.
Compute nonparametric bootstrap SEs for the same two quantities. For which statistic do the parametric and nonparametric SEs differ more? Explain why in terms of the parametric model’s influence on tail estimation.
Construct 95% percentile CIs for both quantities using both methods. Which CIs are narrower? Write a one-paragraph summary stating your point estimates, CIs, and which bootstrap method you recommend for this dataset.

Hint: A tail probability like \(P(X > 10)\) depends heavily on the fitted model’s right tail — this is where parametric assumptions have the most leverage.

Solution

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(2026)
n, B = 75, 5000
x = rng.gamma(3.0, 2.0, size=n)

# (a) Fit Gamma
shape_hat, _, scale_hat = stats.gamma.fit(x, floc=0)
print(f"Gamma MLE: shape={shape_hat:.3f}, scale={scale_hat:.3f}")

# (b) Diagnostics
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
stats.probplot(x, dist=stats.gamma, sparams=(shape_hat, 0, scale_hat),
               plot=axes[0])
axes[0].set_title("Q-Q: Gamma fit")
axes[1].hist(x, bins=15, density=True, alpha=0.7, edgecolor='black')
xg = np.linspace(0, x.max()*1.1, 200)
axes[1].plot(xg, stats.gamma.pdf(xg, shape_hat, 0, scale_hat), 'r-', lw=2)
axes[1].set_title("Histogram with Gamma overlay")
plt.tight_layout()
plt.show()

ks = stats.kstest(x, 'gamma', args=(shape_hat, 0, scale_hat))
print(f"KS test: D={ks.statistic:.4f}, p={ks.pvalue:.4f}")

# (c) Parametric bootstrap
mean_p = np.empty(B)
prob_p = np.empty(B)
for b in range(B):
    xs = rng.gamma(shape_hat, scale_hat, n)
    mean_p[b] = np.mean(xs)
    prob_p[b] = np.mean(xs > 10)

print(f"Mean: {np.mean(x):.3f} (SE={np.std(mean_p, ddof=1):.3f})")
print(f"P(X>10): {np.mean(x>10):.3f} (SE={np.std(prob_p, ddof=1):.3f})")

# (d) Nonparametric bootstrap
mean_np = np.array([np.mean(rng.choice(x, n, replace=True)) for _ in range(B)])
prob_np = np.array([np.mean(rng.choice(x, n, replace=True) > 10)
                    for _ in range(B)])

print(f"Parametric vs nonparametric SEs:")
print(f"  Mean: {np.std(mean_p, ddof=1):.3f} vs {np.std(mean_np, ddof=1):.3f}")
print(f"  P(X>10): {np.std(prob_p, ddof=1):.3f} vs {np.std(prob_np, ddof=1):.3f}")

# (e) 95% percentile CIs
for name, bp, bnp in [("Mean", mean_p, mean_np),
                        ("P(X>10)", prob_p, prob_np)]:
    ci_p = np.percentile(bp, [2.5, 97.5])
    ci_np = np.percentile(bnp, [2.5, 97.5])
    print(f"{name} — Para CI: ({ci_p[0]:.3f}, {ci_p[1]:.3f}), "
          f"Nonpara CI: ({ci_np[0]:.3f}, {ci_np[1]:.3f})")

Key insights:

The KS test passes (\(p \approx 0.96\)) and the Q-Q plot shows a good fit, confirming the Gamma model is adequate.
For the mean, parametric and nonparametric SEs are nearly identical — the mean is a smooth functional that is insensitive to model choice.
For \(P(X > 10)\), the parametric SE can be slightly smaller because the Gamma model smoothly interpolates the tail rather than relying on the sparse empirical tail. This is where parametric assumptions provide the most leverage.
The parametric CI for the mean is slightly narrower, and the two tail-probability CIs have essentially the same width — consistent with the modest efficiency gain from a correctly specified model. Since the diagnostics confirm the Gamma model fits well, the parametric bootstrap is recommended for this dataset.

Key Takeaways 📝

Core concept: Parametric bootstrap simulates from a fitted parametric model \(F_{\hat{\theta}}\) rather than resampling from the empirical distribution \(\hat{F}_n\).
Efficiency-robustness tradeoff: Parametric bootstrap is more efficient when the model is correct, but fails under misspecification; nonparametric bootstrap is robust but less efficient.
When to use parametric: Small samples with credible models, well-established parametric families, when efficiency matters, or when nonparametric methods encounter boundary issues.
Critical requirement: Always validate model assumptions through diagnostic plots and, when possible, compare to nonparametric results.
Outcome alignment: This section addresses LO 1 (simulation techniques) and LO 3 (resampling for CIs and variability assessment).