Chapter 4: Resampling Methods
Resampling methods represent one of the most profound shifts in statistical practice since the advent of electronic computing. Rather than relying on closed-form sampling distributions derived under idealized assumptions, resampling techniques use the observed data itself to approximate the variability of statistical estimators. The bootstrap, introduced by Bradley Efron in 1979, embodies a deceptively simple idea: treat the sample as a proxy for the population, resample from it repeatedly, and let the empirical distribution of recomputed statistics stand in for the unknown theoretical sampling distribution.
This chapter develops the complete theory and practice of resampling methods. We begin with the fundamental problem that motivates these techniques: the sampling distribution of a statistic is rarely available in closed form, and asymptotic approximations may be inadequate for finite samples or complex statistics. The plug-in principle provides the conceptual foundation—estimate the population distribution \(F\) with the empirical distribution \(\hat{F}_n\), then propagate this estimate through the statistic of interest. The nonparametric bootstrap operationalizes this idea via Monte Carlo simulation.
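The plug-in idea can be made concrete in a few lines: any functional of \(F\) is estimated by evaluating the same functional at \(\hat{F}_n\), which for a sample simply means computing it from the data. A minimal NumPy sketch (the simulated exponential sample and the two functionals are illustrative, not from the chapter):

```python
import numpy as np

# Plug-in principle sketch: estimate a population functional theta(F) by
# evaluating the same functional at the empirical distribution F_hat, i.e.,
# by computing it directly from the sample.
rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=200)  # true median is 2*ln(2) ~ 1.39

# theta(F) = median of F; plug-in estimate = median of F_hat = sample median
plugin_median = np.median(sample)

# theta(F) = P(X > 3); plug-in estimate = empirical proportion exceeding 3
plugin_tail = np.mean(sample > 3.0)
```

The bootstrap then propagates the same substitution one level further: it studies the variability of the plug-in estimate by resampling from \(\hat{F}_n\) itself.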
The parametric bootstrap leverages model assumptions to improve efficiency when those assumptions hold, while the jackknife—the bootstrap’s deterministic precursor—provides complementary tools for variance and bias estimation essential for advanced interval methods. For hypothesis testing, we develop permutation tests that achieve exact inference under exchangeability, and bootstrap tests that properly enforce the null hypothesis. We then examine multiple confidence interval constructions—percentile, basic, studentized, and the bias-corrected and accelerated (BCa) method—each with different theoretical properties and practical trade-offs. Finally, cross-validation uses resampling for prediction assessment and model selection, bridging to machine learning applications.
Learning Objectives: Upon completing this chapter, you will be able to:
Foundations
- Define the sampling distribution problem and explain why closed-form solutions are rarely available
- Derive the plug-in principle from the empirical distribution function
- Prove the Glivenko-Cantelli uniform convergence theorem
- Distinguish statistical uncertainty (finite sample) from Monte Carlo uncertainty (finite resamples)
Nonparametric Bootstrap
- Implement the nonparametric bootstrap algorithm with proper seed management
- Compute bootstrap standard errors and bias estimates with appropriate Monte Carlo sample sizes
- Analyze when bootstrap succeeds and when it fails (non-smooth statistics, boundaries, small samples)
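The core algorithm behind these objectives can be sketched as follows; the helper `bootstrap_se_bias` and its defaults are illustrative, not the chapter's own code. Seeding via `numpy.random.default_rng` makes the Monte Carlo step reproducible:

```python
import numpy as np

# Nonparametric bootstrap sketch: resample n observations with replacement
# B times, recompute the statistic, and summarize the bootstrap distribution.
def bootstrap_se_bias(x, stat, B=2000, seed=42):
    """Bootstrap standard error and bias estimate with explicit seeding."""
    rng = np.random.default_rng(seed)  # reproducible resampling
    n = len(x)
    theta_hat = stat(x)
    reps = np.empty(B)
    for b in range(B):
        reps[b] = stat(x[rng.integers(0, n, size=n)])  # one bootstrap resample
    se = reps.std(ddof=1)            # Monte Carlo estimate of the standard error
    bias = reps.mean() - theta_hat   # bootstrap bias estimate
    return se, bias

rng = np.random.default_rng(7)
x = rng.normal(loc=10.0, scale=3.0, size=50)
se, bias = bootstrap_se_bias(x, np.mean)
# Sanity check: for the sample mean, se should be close to s / sqrt(n)
```

Increasing `B` shrinks only the Monte Carlo uncertainty; the statistical uncertainty from the finite sample of size \(n\) remains.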
Parametric Bootstrap
- Implement parametric bootstrap when model assumptions are justified
- Evaluate efficiency gains under correct specification and risks under misspecification
- Select among pairs, residual, and wild bootstrap based on design and heteroskedasticity
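The parametric variant replaces resampling from \(\hat{F}_n\) with simulation from a fitted model. A minimal sketch under an assumed normal model (the function name and the median example are illustrative):

```python
import numpy as np

# Parametric bootstrap sketch: fit a parametric model to the data, then
# resample by simulating new datasets from the fitted model rather than
# drawing with replacement from the observations.
def parametric_bootstrap_se(x, stat, B=2000, seed=21):
    rng = np.random.default_rng(seed)
    mu_hat, sigma_hat = x.mean(), x.std(ddof=1)  # assumed normal model
    reps = np.array([
        stat(rng.normal(mu_hat, sigma_hat, size=len(x))) for _ in range(B)
    ])
    return reps.std(ddof=1)

rng = np.random.default_rng(13)
x = rng.normal(5.0, 2.0, 40)
se = parametric_bootstrap_se(x, np.median)
# If the normal model is wrong, this SE can be badly miscalibrated; the
# nonparametric bootstrap trades some efficiency for that robustness.
```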
Jackknife Methods
- Apply leave-one-out and delete-d jackknife for variance and bias estimation
- Compute pseudo-values and connect them to influence functions
- Compare jackknife and bootstrap for smooth versus non-smooth statistics
- Identify when jackknife fails and appropriate alternatives
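The leave-one-out recipe is fully deterministic and fits in a few lines. A sketch of the standard formulas (the helper name is illustrative):

```python
import numpy as np

# Leave-one-out jackknife sketch: recompute the statistic on each sample
# with one observation deleted, then combine the n leave-one-out values
# into variance and bias estimates and pseudo-values.
def jackknife(x, stat):
    n = len(x)
    theta_hat = stat(x)
    loo = np.array([stat(np.delete(x, i)) for i in range(n)])  # leave-one-out values
    theta_bar = loo.mean()
    var = (n - 1) / n * np.sum((loo - theta_bar) ** 2)  # jackknife variance
    bias = (n - 1) * (theta_bar - theta_hat)            # jackknife bias estimate
    pseudo = n * theta_hat - (n - 1) * loo              # pseudo-values
    return var, bias, pseudo

rng = np.random.default_rng(1)
x = rng.normal(size=40)
var, bias, pseudo = jackknife(x, np.mean)
# For the sample mean the pseudo-values equal the observations themselves,
# the bias estimate is ~0, and the jackknife variance equals s^2 / n.
```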
Hypothesis Testing
- Construct permutation tests and prove their exactness under exchangeability
- Implement bootstrap tests by resampling under the null hypothesis
- Distinguish when permutation versus bootstrap tests are appropriate
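A two-sample permutation test illustrates the exchangeability idea: under the null that both groups share one distribution, the group labels carry no information, so shuffling them rebuilds the null distribution of the test statistic. A sketch (function name and simulated groups are illustrative):

```python
import numpy as np

# Permutation test sketch for a difference in means between two groups.
def perm_test_mean_diff(x, y, n_perm=5000, seed=0):
    rng = np.random.default_rng(seed)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # shuffle the group labels
        diff = perm[:len(x)].mean() - perm[len(x):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # add-one correction keeps the Monte Carlo p-value valid (never exactly 0)
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, 30)
y = rng.normal(1.5, 1.0, 30)  # shifted group: expect a small p-value
p = perm_test_mean_diff(x, y)
```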
Bootstrap Confidence Intervals
- Construct percentile, basic, studentized, bias-corrected (BC), and BCa intervals
- Derive the BCa adjustment formula and compute bias-correction and acceleration parameters
- Evaluate coverage accuracy of different interval methods using Edgeworth expansion results
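The two simplest constructions differ only in how they use the bootstrap quantiles: the percentile interval takes them directly, while the basic interval reflects them around the point estimate. A sketch (helper name and defaults are illustrative; BC and BCa add corrections on top of this):

```python
import numpy as np

# Percentile and basic bootstrap confidence intervals for a statistic.
def percentile_and_basic_ci(x, stat, alpha=0.05, B=4000, seed=11):
    rng = np.random.default_rng(seed)
    n = len(x)
    theta_hat = stat(x)
    reps = np.array([stat(x[rng.integers(0, n, size=n)]) for _ in range(B)])
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    percentile = (lo, hi)                              # quantiles used directly
    basic = (2 * theta_hat - hi, 2 * theta_hat - lo)   # quantiles reflected
    return percentile, basic

rng = np.random.default_rng(5)
x = rng.normal(loc=10.0, scale=2.0, size=60)
pct, basic = percentile_and_basic_ci(x, np.mean)
```

For a symmetric bootstrap distribution the two intervals nearly coincide; skewness pulls them apart, which is precisely what motivates the BCa corrections.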
Cross-Validation
- Implement leave-one-out and K-fold cross-validation for prediction assessment
- Design nested cross-validation to avoid optimistic bias in model selection
- Apply the .632 and .632+ bootstrap estimators for prediction error
- Connect cross-validation to information criteria (AIC, BIC)
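K-fold cross-validation splits the data into K folds, fits on K−1 of them, and scores predictions on the held-out fold. A sketch for a least-squares line fit (the helper name and simulated regression are illustrative):

```python
import numpy as np

# K-fold cross-validation sketch: average held-out mean squared error
# over K train/test splits for a simple linear fit.
def kfold_mse(x, y, K=5, seed=2):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))       # shuffle before splitting into folds
    folds = np.array_split(idx, K)
    errors = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        slope, intercept = np.polyfit(x[train], y[train], deg=1)  # fit on K-1 folds
        pred = slope * x[test] + intercept
        errors.append(np.mean((y[test] - pred) ** 2))             # held-out MSE
    return float(np.mean(errors))

rng = np.random.default_rng(9)
x = rng.uniform(0, 10, 100)
y = 3.0 * x + 1.0 + rng.normal(0, 1.0, 100)  # noise sd 1, so CV MSE near 1
mse = kfold_mse(x, y)
```

Because each observation is predicted only by models that never saw it, the averaged error estimates out-of-sample performance rather than training fit.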
Diagnostics and Practice
- Diagnose bootstrap distribution pathologies including multimodality and boundary effects
- Select the number of bootstrap replicates based on target precision and computational budget
- Identify bootstrap failure modes and apply appropriate remedies
- Report resampling results with appropriate uncertainty quantification
Sections
- Section 4.1 The Sampling Distribution Problem
- The Fundamental Target: Sampling Distributions
- Historical Development: The Quest for Sampling Distributions
- Three Routes to the Sampling Distribution
- When Asymptotics Fail: Motivating the Bootstrap
- The Plug-In Principle: Theoretical Foundation
- Computational Perspective: Bootstrap as Monte Carlo
- Practical Considerations
- Bringing It All Together
- Section 4.1 Exercises
- References
- Section 4.2 The Empirical Distribution and Plug-in Principle
- The Empirical Cumulative Distribution Function
- Convergence of the Empirical CDF
- Parameters as Statistical Functionals
- The Plug-in Principle
- When the Plug-in Principle Fails
- The Bootstrap Idea in One Sentence
- Computational Implementation
- Bringing It All Together
- Section 4.2 Exercises: ECDF and Plug-in Mastery
- References
- Section 4.3 The Nonparametric Bootstrap
- Section 4.4 The Parametric Bootstrap
- Section 4.5 Jackknife Methods
- Section 4.6 Bootstrap Hypothesis Testing and Permutation Tests
- From Confidence Intervals to Hypothesis Tests
- The Bootstrap Hypothesis Testing Framework
- Permutation Tests: Exact Tests Under Exchangeability
- Testing Equality of Distributions
- Bootstrap Tests for Regression
- Bootstrap vs Classical Tests
- Permutation vs Bootstrap: Choosing the Right Approach
- Multiple Testing with Bootstrap
- Practical Considerations
- Bringing It All Together
- Section 4.6 Exercises
- References