Chapter 4: Resampling Methods

Resampling methods represent one of the most profound shifts in statistical practice since the advent of electronic computing. Rather than relying on closed-form sampling distributions derived under idealized assumptions, resampling techniques use the observed data itself to approximate the variability of statistical estimators. The bootstrap, introduced by Bradley Efron in 1979, embodies a deceptively simple idea: treat the sample as a proxy for the population, resample from it repeatedly, and let the empirical distribution of recomputed statistics stand in for the unknown theoretical sampling distribution.

This chapter develops the complete theory and practice of resampling methods. We begin with the fundamental problem that motivates these techniques: the sampling distribution of a statistic is rarely available in closed form, and asymptotic approximations may be inadequate for finite samples or complex statistics. The plug-in principle provides the conceptual foundation—estimate the population distribution \(F\) with the empirical distribution \(\hat{F}_n\), then propagate this estimate through the statistic of interest. The nonparametric bootstrap operationalizes this idea via Monte Carlo simulation, generating thousands of resampled datasets to approximate standard errors, construct confidence intervals, and test hypotheses.
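The plug-in-plus-Monte-Carlo recipe described above can be sketched in a few lines. This is an illustrative implementation, not the chapter's own code; the function name `bootstrap_se` and its parameters are ours, and numpy is assumed available:

```python
import numpy as np

def bootstrap_se(data, statistic, B=2000, seed=0):
    """Nonparametric bootstrap standard error of `statistic`.

    A sketch of the plug-in principle: resample n observations with
    replacement from the data (i.e., draw from the empirical distribution
    F_hat_n), recompute the statistic B times, and take the standard
    deviation of the recomputed values.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    stats = np.array([statistic(data[rng.integers(0, n, size=n)])
                      for _ in range(B)])
    return stats.std(ddof=1)

# Example: the bootstrap SE of the sample mean should be close to the
# textbook analytic answer s / sqrt(n).
rng = np.random.default_rng(42)
x = rng.normal(size=100)
se_boot = bootstrap_se(x, np.mean)
se_analytic = x.std(ddof=1) / np.sqrt(len(x))
```

For the sample mean the two answers agree closely; the bootstrap's value lies in handling statistics for which no such closed form exists.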

We examine multiple confidence interval constructions—percentile, basic, studentized, and the bias-corrected and accelerated (BCa) method—each with different theoretical properties and practical trade-offs. The jackknife, the bootstrap’s deterministic precursor, provides complementary tools for variance and bias estimation. For regression problems, we develop specialized resampling schemes: pairs bootstrap for heteroskedastic data, residual bootstrap under homoskedasticity, and wild bootstrap for preserving heteroskedasticity structure. Complex data structures—stratified samples, clustered observations, and time series—require careful adaptation of the resampling mechanism, from within-stratum resampling to block bootstrap methods that preserve temporal dependence.

The chapter also addresses the parametric bootstrap, which leverages model assumptions to improve efficiency when those assumptions hold, and develops the connection between bootstrap inference and permutation tests for hypothesis testing. We conclude with cross-validation methods that use resampling for prediction assessment and model selection, bridging to the machine learning applications in later chapters.

Throughout, we emphasize both mathematical rigor and computational practice. Every method is accompanied by complete Python implementations, diagnostic tools for detecting when resampling fails, and decision frameworks for selecting among alternative approaches. By the chapter’s end, you will command a versatile toolkit for distribution-free inference that complements the parametric methods of Chapter 3 and sets the stage for Bayesian computation in Chapter 5.

Learning Objectives: Upon completion of this chapter, students will be able to:

Foundational Understanding

  • Define the sampling distribution problem and explain why closed-form solutions are rarely available for complex statistics

  • Derive the plug-in principle from the empirical distribution function and prove the Glivenko-Cantelli uniform convergence theorem

  • Distinguish between statistical uncertainty (finite sample) and Monte Carlo uncertainty (finite resamples) in bootstrap inference

  • Analyze when asymptotic approximations fail and resampling methods provide superior finite-sample inference
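The distinction between statistical uncertainty and Monte Carlo uncertainty in the objectives above can be made concrete: rerunning the bootstrap on the *same* data with a new seed changes the answer only by Monte Carlo error, which shrinks with \(B\), while the statistical uncertainty is fixed by the sample. A minimal sketch (function and variable names are illustrative):

```python
import numpy as np

def boot_se(x, B, seed):
    """Bootstrap SE of the mean; rerunning with a new seed isolates
    Monte Carlo (finite-B) variability from statistical (finite-n)
    variability."""
    rng = np.random.default_rng(seed)
    n = len(x)
    reps = np.array([x[rng.integers(0, n, size=n)].mean() for _ in range(B)])
    return reps.std(ddof=1)

rng = np.random.default_rng(0)
x = rng.exponential(size=80)          # one fixed sample of n = 80

# Same data, two independent bootstrap runs: the gap is pure Monte Carlo noise.
se_a = boot_se(x, B=4000, seed=1)
se_b = boot_se(x, B=4000, seed=2)
mc_gap = abs(se_a - se_b)             # small relative to se_a, and shrinks as B grows
```

The Monte Carlo gap is a small fraction of the standard error itself; only collecting more data, not more resamples, reduces the statistical component.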

Bootstrap Methods

  • Implement the nonparametric bootstrap algorithm for arbitrary statistics with proper seed management

  • Compute bootstrap standard errors and bias estimates with appropriate Monte Carlo sample sizes

  • Construct confidence intervals using percentile, basic, studentized, bias-corrected (BC), and BCa methods

  • Derive the BCa adjustment formula and compute bias-correction \(z_0\) and acceleration \(a\) parameters

  • Evaluate coverage accuracy of different interval methods using Edgeworth expansion results
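As a preview of the interval constructions listed above, the two simplest—percentile and basic—differ only in whether the bootstrap quantiles are used directly or reflected around the point estimate. A hedged sketch (names are ours, not the chapter's):

```python
import numpy as np

def percentile_and_basic_ci(x, statistic, alpha=0.05, B=4000, seed=0):
    """Percentile and basic bootstrap confidence intervals (sketch).

    Percentile: take the alpha/2 and 1 - alpha/2 quantiles of the
    bootstrap replicates directly.
    Basic: reflect those quantiles around the point estimate theta_hat.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    theta_hat = statistic(x)
    reps = np.array([statistic(x[rng.integers(0, n, size=n)])
                     for _ in range(B)])
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    percentile = (lo, hi)
    basic = (2 * theta_hat - hi, 2 * theta_hat - lo)
    return percentile, basic

rng = np.random.default_rng(7)
x = rng.normal(loc=5.0, size=200)
(pl, ph), (bl, bh) = percentile_and_basic_ci(x, np.mean)
```

For a symmetric bootstrap distribution the two intervals nearly coincide; the studentized and BCa refinements matter when the distribution is skewed or the statistic is biased.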

Jackknife Methods

  • Apply leave-one-out jackknife for standard error and bias estimation

  • Compute pseudo-values and understand their connection to influence functions

  • Compare jackknife and bootstrap for smooth versus non-smooth statistics

  • Identify when the jackknife fails (non-smooth statistics such as the median) and select appropriate alternatives
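The leave-one-out jackknife in the objectives above is deterministic: delete each observation in turn, recompute, and combine. A short sketch (illustrative names; numpy assumed):

```python
import numpy as np

def jackknife_se(x, statistic):
    """Leave-one-out jackknife standard error (sketch).

    Computes the statistic n times, each time with one observation
    deleted, then applies the jackknife variance formula
    (n - 1)/n * sum((theta_(i) - theta_bar)^2).
    """
    x = np.asarray(x)
    n = len(x)
    loo = np.array([statistic(np.delete(x, i)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))

# For the sample mean the jackknife SE reproduces s / sqrt(n) exactly --
# a useful sanity check for any implementation.
x = np.arange(1.0, 11.0)
se_jack = jackknife_se(x, np.mean)
se_exact = x.std(ddof=1) / np.sqrt(len(x))
```

Unlike the bootstrap, there is no Monte Carlo noise here: only \(n\) recomputations, which is exactly why the jackknife struggles with non-smooth statistics that the bootstrap's richer resampling can still handle.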

Regression and Complex Designs

  • Select among pairs, residual, and wild bootstrap based on design and heteroskedasticity

  • Implement stratified and cluster bootstrap for survey data with appropriate resampling units

  • Apply moving block, circular block, and stationary bootstrap for time series data

  • Choose block length using data-driven methods and assess sensitivity
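The moving block bootstrap named above resamples contiguous blocks rather than individual observations, preserving short-range dependence that i.i.d. resampling would destroy. A minimal sketch, assuming the block length has already been chosen by one of the data-driven rules the chapter discusses (all names here are illustrative):

```python
import numpy as np

def moving_block_bootstrap(x, block_len, seed=0):
    """One moving-block resample of a series (sketch).

    Draws random starting positions, copies contiguous blocks of length
    `block_len`, concatenates them, and trims to the original length n.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    return np.concatenate([x[s:s + block_len] for s in starts])[:n]

# An AR(1)-like series: the blocks carry its serial dependence into
# the resample, which independent draws would not.
rng = np.random.default_rng(3)
e = rng.normal(size=300)
x = np.empty(300)
x[0] = e[0]
for t in range(1, 300):
    x[t] = 0.7 * x[t - 1] + e[t]
xb = moving_block_bootstrap(x, block_len=15)
```

The circular and stationary variants differ only in how block boundaries and lengths are drawn; the core copy-contiguous-blocks mechanism is the same.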

Parametric and Testing Methods

  • Implement parametric bootstrap when model assumptions are justified

  • Evaluate trade-offs between parametric and nonparametric bootstrap under model misspecification

  • Construct bootstrap hypothesis tests by resampling under the null hypothesis

  • Compare bootstrap tests to permutation tests and identify when each is appropriate
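The permutation test mentioned above resamples under the null directly: if two groups are exchangeable, shuffling the labels should not change the difference in means. A sketch of the two-sample case (function name and parameters are ours):

```python
import numpy as np

def permutation_test_diff_means(x, y, n_perm=2000, seed=0):
    """Two-sample permutation test of equal means (sketch).

    Pools the samples, repeatedly permutes the group labels, and counts
    how often the permuted |difference in means| reaches the observed one.
    The +1 terms give a valid (never exactly zero) p-value.
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = x.mean() - y.mean()
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        diff = perm[:len(x)].mean() - perm[len(x):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(11)
x = rng.normal(loc=0.0, size=100)
y = rng.normal(loc=1.0, size=100)   # true shift of one standard deviation
p = permutation_test_diff_means(x, y)
```

A bootstrap test of the same hypothesis would instead resample from a null-constrained version of the data; the permutation test's exchangeability argument gives it exact validity when its assumptions hold.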

Cross-Validation and Model Selection

  • Implement leave-one-out and K-fold cross-validation for prediction assessment

  • Design nested cross-validation schemes that avoid optimistic bias in model selection

  • Apply the .632 and .632+ bootstrap estimators for prediction error

  • Connect cross-validation to information criteria (AIC, BIC) for model comparison
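K-fold cross-validation, the workhorse above, partitions the data, fits on \(K-1\) folds, and scores on the held-out fold. A compact sketch using ordinary least squares as a toy learner (the `fit`/`predict` interface and all names are illustrative, not the chapter's API):

```python
import numpy as np

def kfold_mse(X, y, fit, predict, k=5, seed=0):
    """K-fold cross-validated mean squared error (sketch).

    Shuffles indices, splits them into k folds, and averages the
    held-out MSE across folds. `fit` returns a model object that
    `predict` consumes.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        errs.append(np.mean((y[test] - predict(model, X[test])) ** 2))
    return float(np.mean(errs))

# Toy learner: least squares via numpy.
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda beta, X: X @ beta

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(120), rng.normal(size=120)])
y = 2.0 + 3.0 * X[:, 1] + rng.normal(scale=0.5, size=120)
cv_mse = kfold_mse(X, y, fit, predict)   # close to the noise variance, 0.25
```

Because every observation is held out exactly once, the averaged score estimates out-of-sample error; leave-one-out is the special case \(k = n\).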

Diagnostics and Practice

  • Diagnose bootstrap distribution pathologies including multimodality, heavy tails, and boundary effects

  • Select the number of bootstrap replicates \(B\) based on target precision and computational budget

  • Identify bootstrap failure modes (extreme statistics, boundary parameters, small samples) and apply remedies

  • Report resampling results with appropriate uncertainty quantification and reproducibility information
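Choosing \(B\) against a target precision, as the objectives above require, often starts from the rule of thumb that the Monte Carlo error of a bootstrap standard-error estimate is roughly \(\widehat{\mathrm{se}}/\sqrt{2B}\). The inversion below is a sketch of that rule (an assumed normal-theory approximation, not a result quoted from the text):

```python
import numpy as np

def b_for_relative_mc_error(rel_err):
    """Smallest B giving Monte Carlo error about rel_err * se.

    Assumes the rule of thumb sd(se_hat) ~ se / sqrt(2B), so
    B ~ 1 / (2 * rel_err^2). Quadrupling B halves the MC error.
    """
    return int(np.ceil(1.0 / (2.0 * rel_err ** 2)))

B = b_for_relative_mc_error(0.05)   # 5% relative Monte Carlo error -> B = 200
```

Quantile-based quantities such as BCa endpoints are noisier than standard errors and typically need substantially larger \(B\); the rule above is a floor, not a recommendation.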

Sections