Chapter 4: Resampling Methods

Resampling methods represent one of the most profound shifts in statistical practice since the advent of electronic computing. Rather than relying on closed-form sampling distributions derived under idealized assumptions, resampling techniques use the observed data itself to approximate the variability of statistical estimators. The bootstrap, introduced by Bradley Efron in 1979, embodies a deceptively simple idea: treat the sample as a proxy for the population, resample from it repeatedly, and let the empirical distribution of recomputed statistics stand in for the unknown theoretical sampling distribution.
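
Efron's idea can be sketched in a few lines of code. The sample below and the choice of 2,000 resamples are illustrative, not prescriptions; the point is the loop structure: resample with replacement, recompute the statistic, study the spread.

```python
import numpy as np

rng = np.random.default_rng(0)                  # reproducible resampling
x = rng.normal(loc=5.0, scale=2.0, size=50)     # stand-in "observed" sample

B = 2000                                        # number of bootstrap resamples
boot_means = np.array([
    rng.choice(x, size=x.size, replace=True).mean()   # resample, recompute
    for _ in range(B)
])

# The spread of the recomputed means stands in for the sampling variability
se_boot = boot_means.std(ddof=1)
```

For the mean this standard error can be checked against the textbook formula \(s/\sqrt{n}\); the value of the bootstrap is that the same loop works unchanged for statistics with no such formula.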

This chapter develops the complete theory and practice of resampling methods. We begin with the fundamental problem that motivates these techniques: the sampling distribution of a statistic is rarely available in closed form, and asymptotic approximations may be inadequate for finite samples or complex statistics. The plug-in principle provides the conceptual foundation—estimate the population distribution \(F\) with the empirical distribution \(\hat{F}_n\), then propagate this estimate through the statistic of interest. The nonparametric bootstrap operationalizes this idea via Monte Carlo simulation.
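
The plug-in principle is concrete enough to compute by hand. In this small sketch (the five data points are invented), \(\hat{F}_n\) places mass \(1/n\) on each observation, and plug-in estimates are just functionals of \(F\) evaluated at \(\hat{F}_n\):

```python
import numpy as np

x = np.array([2.0, 3.0, 3.0, 5.0, 7.0])   # illustrative sample

def ecdf(t, sample):
    """Empirical distribution function: fraction of observations <= t."""
    return np.mean(sample <= t)

# Plug-in estimates: the mean of F becomes the sample mean; the variance of F
# becomes the divisor-n variance (not the unbiased divisor-(n-1) version).
plug_in_mean = x.mean()
plug_in_var = x.var()
```

Note that the plug-in variance uses divisor \(n\), a small but instructive bias that the bootstrap and jackknife can themselves estimate later in the chapter.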

The parametric bootstrap leverages model assumptions to improve efficiency when those assumptions hold, while the jackknife—the bootstrap’s deterministic precursor—provides complementary tools for variance and bias estimation, including the acceleration estimate needed for advanced interval methods. For hypothesis testing, we develop permutation tests that achieve exact inference under exchangeability, and bootstrap tests that properly enforce the null hypothesis. We then examine multiple confidence interval constructions—percentile, basic, studentized, bias-corrected (BC), and bias-corrected and accelerated (BCa)—each with different theoretical properties and practical trade-offs. Finally, cross-validation uses resampling for prediction assessment and model selection, bridging to machine learning applications.

Learning Objectives: Upon completing this chapter, you will be able to:

Foundations

  • Define the sampling distribution problem and explain why closed-form solutions are rarely available

  • Derive the plug-in principle from the empirical distribution function

  • Prove the Glivenko-Cantelli uniform convergence theorem

  • Distinguish statistical uncertainty (finite sample) from Monte Carlo uncertainty (finite resamples)

Nonparametric Bootstrap

  • Implement the nonparametric bootstrap algorithm with proper seed management

  • Compute bootstrap standard errors and bias estimates with appropriate Monte Carlo sample sizes

  • Analyze when bootstrap succeeds and when it fails (non-smooth statistics, boundaries, small samples)
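
The first two objectives above can be sketched together: a single seeded generator governs all resampling, and both the bias and standard-error estimates come from one set of replicates. The exponential sample and B = 4,000 are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)             # one seed governs all resampling
x = rng.exponential(scale=1.0, size=40)     # skewed stand-in sample

def boot_reps(sample, stat, B, rng):
    """Vectorized bootstrap: draw all B x n resampling indices at once."""
    n = sample.size
    idx = rng.integers(0, n, size=(B, n))
    return stat(sample[idx], axis=1)

# Bias and SE for the plug-in variance (divisor n, known to be biased)
theta_hat = np.var(x)
reps = boot_reps(x, np.var, B=4000, rng=rng)
bias_boot = reps.mean() - theta_hat         # bootstrap bias estimate
se_boot = reps.std(ddof=1)                  # bootstrap standard error
```

Drawing the index matrix in one call keeps the resamples reproducible under a single seed and avoids a Python-level loop over replicates.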

Parametric Bootstrap

  • Implement parametric bootstrap when model assumptions are justified

  • Evaluate efficiency gains under correct specification and risks under misspecification

  • Select among pairs, residual, and wild bootstrap based on design and heteroskedasticity
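
The residual bootstrap named in the last objective can be sketched for a simple linear model; the data-generating process and all constants here are invented for illustration, and the scheme assumes homoskedastic errors with a fixed design.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=n)   # invented linear DGP

X = np.column_stack([np.ones(n), x])                # fixed design matrix
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat                            # OLS residuals (mean zero)

B = 1000
slopes = np.empty(B)
for b in range(B):
    # Residual bootstrap: keep the design fixed, resample the residuals.
    # (The wild bootstrap would instead multiply each residual in place by a
    # random sign, preserving heteroskedasticity tied to x; the pairs
    # bootstrap would resample (x_i, y_i) rows jointly.)
    y_star = X @ beta_hat + rng.choice(resid, size=n, replace=True)
    b_star, *_ = np.linalg.lstsq(X, y_star, rcond=None)
    slopes[b] = b_star[1]

se_slope = slopes.std(ddof=1)               # bootstrap SE of the slope
```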

Jackknife Methods

  • Apply leave-one-out and delete-d jackknife for variance and bias estimation

  • Compute pseudo-values and connect them to influence functions

  • Compare jackknife and bootstrap for smooth versus non-smooth statistics

  • Identify when jackknife fails and appropriate alternatives
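
For a smooth statistic like the mean, the leave-one-out jackknife and its pseudo-values can be verified against closed forms, which makes it a useful first exercise. The eight data points below are invented.

```python
import numpy as np

x = np.array([3.1, 4.8, 2.2, 5.0, 3.7, 4.1, 2.9, 4.4])  # illustrative sample
n = x.size
theta_hat = x.mean()

# Leave-one-out recomputations of the statistic
theta_loo = np.array([np.delete(x, i).mean() for i in range(n)])

# Pseudo-values and the jackknife variance estimate
pseudo = n * theta_hat - (n - 1) * theta_loo
var_jack = (n - 1) / n * np.sum((theta_loo - theta_loo.mean()) ** 2)
se_jack = np.sqrt(var_jack)
```

For the mean, the pseudo-values reduce to the observations themselves and `se_jack` equals \(s/\sqrt{n}\) exactly; for non-smooth statistics such as the median these identities break down, which is one of the failure modes the objectives above refer to.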

Hypothesis Testing

  • Construct permutation tests and prove their exactness under exchangeability

  • Implement bootstrap tests by resampling under the null hypothesis

  • Distinguish when permutation versus bootstrap tests are appropriate
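
A two-sample permutation test for a difference in means can be sketched as follows; the group measurements are invented, and Monte Carlo sampling of permutations stands in for full enumeration when the group sizes are large.

```python
import numpy as np

rng = np.random.default_rng(7)
group_a = np.array([12.1, 9.8, 11.5, 10.9, 12.4])   # invented measurements
group_b = np.array([9.2, 8.7, 10.1, 9.5, 8.9])
pooled = np.concatenate([group_a, group_b])
n_a = group_a.size

obs = group_a.mean() - group_b.mean()       # observed difference in means

B = 10000
count = 0
for _ in range(B):
    perm = rng.permutation(pooled)          # labels are exchangeable under H0
    diff = perm[:n_a].mean() - perm[n_a:].mean()
    if abs(diff) >= abs(obs):
        count += 1

# Counting the observed labeling itself keeps the p-value valid
p_value = (count + 1) / (B + 1)
```

The `+1` terms reflect that the observed arrangement is one of the equally likely relabelings under the null, which preserves the test's validity when permutations are sampled rather than enumerated.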

Bootstrap Confidence Intervals

  • Construct percentile, basic, studentized, BC, and BCa intervals

  • Derive the BCa adjustment formula and compute bias-correction and acceleration parameters

  • Evaluate coverage accuracy of different interval methods using Edgeworth expansion results
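
The two simplest constructions, percentile and basic, differ only in how the bootstrap quantiles are used, which the following sketch makes explicit for the sample median (sample and constants invented; the studentized, BC, and BCa variants need additional ingredients developed later).

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(10.0, 3.0, size=80)          # stand-in sample
theta_hat = np.median(x)

B = 4000
reps = np.array([
    np.median(rng.choice(x, size=x.size, replace=True)) for _ in range(B)
])

alpha = 0.05
lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])

ci_percentile = (lo, hi)                    # bootstrap quantiles used directly
ci_basic = (2 * theta_hat - hi, 2 * theta_hat - lo)   # quantiles reflected
```

The two intervals coincide only when the bootstrap distribution is symmetric about \(\hat{\theta}\); their disagreement is itself a useful diagnostic for skewness.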

Cross-Validation

  • Implement leave-one-out and K-fold cross-validation for prediction assessment

  • Design nested cross-validation to avoid optimistic bias in model selection

  • Apply the .632 and .632+ bootstrap estimators for prediction error

  • Connect cross-validation to information criteria (AIC, BIC)
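
K-fold cross-validation reduces to a short loop once the fold labels are assigned; this sketch fits a line to invented data with K = 5, refitting on the training folds only so the held-out fold never influences its own prediction.

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 100, 5
x = rng.uniform(-3, 3, size=n)
y = 2.0 * x + rng.normal(scale=1.0, size=n)   # invented linear DGP

folds = rng.permutation(n) % K                # shuffle, then label folds 0..K-1
mse_folds = []
for k in range(K):
    test = folds == k                         # held-out fold
    train = ~test
    coef = np.polyfit(x[train], y[train], deg=1)  # fit on training folds only
    pred = np.polyval(coef, x[test])
    mse_folds.append(np.mean((y[test] - pred) ** 2))

cv_error = float(np.mean(mse_folds))          # K-fold prediction-error estimate
```

If model selection (say, choosing the polynomial degree) were driven by `cv_error` itself, an outer loop of held-out folds would be needed to avoid the optimistic bias the nested-cross-validation objective refers to.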

Diagnostics and Practice

  • Diagnose bootstrap distribution pathologies including multimodality and boundary effects

  • Select the number of bootstrap replicates based on target precision and computational budget

  • Identify bootstrap failure modes and apply appropriate remedies

  • Report resampling results with appropriate uncertainty quantification
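
Choosing the number of replicates can be made quantitative for tail quantities: a tail probability \(p\) estimated from \(B\) independent resamples has binomial Monte Carlo standard error \(\sqrt{p(1-p)/B}\), which can be inverted for \(B\). The target values below are illustrative.

```python
import math

def replicates_needed(p, target_se):
    """Smallest B such that the Monte Carlo SE of a tail-probability
    estimate, sqrt(p * (1 - p) / B), falls below target_se."""
    return math.ceil(p * (1 - p) / target_se ** 2)

# Pinning a 2.5% tail probability down to about 0.25 percentage points of
# Monte Carlo error already requires several thousand replicates.
B_needed = replicates_needed(0.025, 0.0025)
```

This is why interval endpoints, which depend on extreme quantiles of the bootstrap distribution, typically demand far more replicates than a standard error does.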

Sections