.. _chapter4:

================================
Chapter 4: Resampling Methods
================================

.. contents:: Chapter Contents
   :local:
   :depth: 2

Resampling methods represent one of the most profound shifts in statistical practice since the advent of electronic computing. Rather than relying on closed-form sampling distributions derived under idealized assumptions, resampling techniques use the observed data itself to approximate the variability of statistical estimators. The bootstrap, introduced by Bradley Efron in 1979, embodies a deceptively simple idea: treat the sample as a proxy for the population, resample from it repeatedly, and let the empirical distribution of recomputed statistics stand in for the unknown theoretical sampling distribution.

This chapter develops the complete theory and practice of resampling methods. We begin with the fundamental problem that motivates these techniques: the sampling distribution of a statistic is rarely available in closed form, and asymptotic approximations may be inadequate for finite samples or complex statistics. The **plug-in principle** provides the conceptual foundation: estimate the population distribution :math:`F` with the empirical distribution :math:`\hat{F}_n`, then propagate this estimate through the statistic of interest. The **nonparametric bootstrap** operationalizes this idea via Monte Carlo simulation. The **parametric bootstrap** leverages model assumptions to improve efficiency when those assumptions hold, while the **jackknife**, the bootstrap's deterministic precursor, provides complementary tools for variance and bias estimation essential for advanced interval methods. For **hypothesis testing**, we develop permutation tests that achieve exact inference under exchangeability, and bootstrap tests that properly enforce the null hypothesis. We then examine multiple **confidence interval constructions**: percentile, basic, studentized, and the bias-corrected and accelerated (BCa) method, each with different theoretical properties and practical trade-offs. Finally, **cross-validation** uses resampling for prediction assessment and model selection, bridging to machine learning applications.
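To make the plug-in idea concrete before the formal development, here is a minimal sketch of the nonparametric bootstrap for a standard error. The function name ``bootstrap_se`` and its arguments are placeholders chosen for this illustration, not code from the later sections: it resamples the observed data with replacement, recomputes the statistic on each resample, and reports the standard deviation of the replicates.

.. code-block:: python

   import numpy as np

   def bootstrap_se(data, statistic, n_boot=2000, seed=0):
       """Illustrative Monte Carlo estimate of the standard error of `statistic`."""
       rng = np.random.default_rng(seed)
       data = np.asarray(data)
       n = data.shape[0]
       # Draw n_boot resamples of size n from the empirical distribution F_hat
       # (sampling with replacement) and recompute the statistic on each.
       replicates = np.array([
           statistic(data[rng.integers(0, n, size=n)])
           for _ in range(n_boot)
       ])
       # The spread of the replicates approximates the sampling variability.
       return replicates.std(ddof=1)

   # Example: bootstrap standard error of the sample median from skewed data.
   rng = np.random.default_rng(42)
   sample = rng.exponential(scale=2.0, size=50)
   print(bootstrap_se(sample, np.median))

The two sources of error visible here, the finite sample of size ``n`` and the finite number of resamples ``n_boot``, correspond to the statistical versus Monte Carlo uncertainty distinguished in the learning objectives below.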
**Learning Objectives:** Upon completing this chapter, you will be able to:

**Foundations**

* **Define** the sampling distribution problem and explain why closed-form solutions are rarely available
* **Derive** the plug-in principle from the empirical distribution function
* **Prove** the Glivenko-Cantelli uniform convergence theorem
* **Distinguish** statistical uncertainty (finite sample) from Monte Carlo uncertainty (finite resamples)

**Nonparametric Bootstrap**

* **Implement** the nonparametric bootstrap algorithm with proper seed management
* **Compute** bootstrap standard errors and bias estimates with appropriate Monte Carlo sample sizes
* **Analyze** when the bootstrap succeeds and when it fails (non-smooth statistics, boundaries, small samples)

**Parametric Bootstrap**

* **Implement** the parametric bootstrap when model assumptions are justified
* **Evaluate** efficiency gains under correct specification and risks under misspecification
* **Select** among pairs, residual, and wild bootstrap based on design and heteroskedasticity

**Jackknife Methods**

* **Apply** leave-one-out and delete-d jackknife for variance and bias estimation
* **Compute** pseudo-values and connect them to influence functions
* **Compare** jackknife and bootstrap for smooth versus non-smooth statistics
* **Identify** when the jackknife fails and appropriate alternatives

**Hypothesis Testing**

* **Construct** permutation tests and prove their exactness under exchangeability
* **Implement** bootstrap tests by resampling under the null hypothesis
* **Distinguish** when permutation versus bootstrap tests are appropriate

**Bootstrap Confidence Intervals**

* **Construct** percentile, basic, studentized, BC, and BCa intervals
* **Derive** the BCa adjustment formula and compute bias-correction and acceleration parameters
* **Evaluate** coverage accuracy of different interval methods using Edgeworth expansion results

**Cross-Validation**

* **Implement** leave-one-out and K-fold cross-validation for prediction assessment
* **Design** nested cross-validation to avoid optimistic bias in model selection
* **Apply** the .632 and .632+ bootstrap estimators for prediction error
* **Connect** cross-validation to information criteria (AIC, BIC)

**Diagnostics and Practice**

* **Diagnose** bootstrap distribution pathologies including multimodality and boundary effects
* **Select** the number of bootstrap replicates based on target precision and computational budget
* **Identify** bootstrap failure modes and apply appropriate remedies
* **Report** resampling results with appropriate uncertainty quantification

.. toctree::
   :maxdepth: 2
   :caption: Sections

   ch4_1-sampling-distribution-problem
   ch4_2-empirical-distribution-plugin
   ch4_3-nonparametric-bootstrap
   ch4_4-parametric-bootstrap
   ch4_5-jackknife-methods
   ch4_6-bootstrap-hypothesis-testing
   ch4_7-bootstrap-confidence-intervals
   ch4_8-cross-validation
   ch4_9-chapter-summary