Section 4.5: Jackknife Methods
The bootstrap, introduced by Efron in 1979, revolutionized computational statistics by providing a general-purpose tool for uncertainty quantification. But the bootstrap had an important predecessor: the jackknife, developed by Quenouille (1949) for bias reduction and extended by Tukey (1958) for variance estimation. While the bootstrap has largely superseded the jackknife for interval construction, the jackknife remains valuable in its own right—it is deterministic, computationally efficient for smooth statistics, and provides unique diagnostic capabilities through its connection to influence functions.
This section develops the jackknife as both a historical precursor to the bootstrap and a modern tool with distinct strengths. We begin with the delete-1 jackknife, extend to the delete-\(d\) generalization, explore bias correction, and conclude with a systematic comparison to bootstrap methods. Throughout, we emphasize when each method excels and how they can be used together.
Road Map 🧭
Understand: The jackknife principle as systematic leave-one-out perturbation analysis
Develop: Computational skills for jackknife standard errors, bias estimation, and diagnostics
Implement: Delete-1 and delete-\(d\) jackknife algorithms in Python
Evaluate: When to prefer jackknife over bootstrap, and when to use both
Historical Context and Motivation
The jackknife emerged from a practical problem: how to assess the reliability of a complex statistic without deriving its theoretical variance. Maurice Quenouille introduced the method in 1949 for bias reduction in serial correlation estimation, and John Tukey named it the “jackknife” in 1958, likening it to a versatile pocket knife that can help in many situations.
The Historical Problem
Consider estimating a parameter \(\theta = T(F)\) with a statistic \(\hat{\theta} = T(\hat{F}_n)\). For many statistics beyond the sample mean, deriving \(\text{Var}(\hat{\theta})\) analytically requires either:
Strong distributional assumptions (e.g., normality)
Taylor expansions (delta method) that may be complex
Asymptotic approximations that may be inaccurate for small \(n\)
The jackknife provides an empirical alternative: by systematically observing how \(\hat{\theta}\) changes when each observation is removed, we can approximate both the variance and bias of \(\hat{\theta}\) without any distributional assumptions.
The Core Insight
For a smooth statistic \(T\), removing a single observation \(X_i\) from the sample approximates a small perturbation to the empirical distribution \(\hat{F}_n\). If \(T\) is sufficiently regular (differentiable in an appropriate sense), the change \(\hat{\theta}_{(-i)} - \hat{\theta}\) reflects the influence of observation \(X_i\) on the estimate. Aggregating these influences across all observations recovers information about the sampling variability of \(\hat{\theta}\).
This is precisely the intuition behind the influence function from robust statistics, and we will see that the jackknife provides a discrete approximation to influence-function-based variance estimation.
The Delete-1 Jackknife
The delete-1 (or leave-one-out) jackknife is the foundation of jackknife methodology. We systematically compute the statistic on each of the \(n\) subsamples formed by removing one observation.
Notation and Basic Definitions
Let \(X_1, \ldots, X_n\) be an iid sample from distribution \(F\), and let \(\hat{\theta} = T(X_1, \ldots, X_n)\) be a statistic estimating \(\theta = T(F)\).
Leave-one-out estimate: For each \(i = 1, \ldots, n\), define
\[
\hat{\theta}_{(-i)} = T(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n).
\]
This is the statistic computed on the sample with observation \(X_i\) removed.
Jackknife average: The mean of the leave-one-out estimates is
\[
\bar{\theta}_{(\cdot)} = \frac{1}{n} \sum_{i=1}^{n} \hat{\theta}_{(-i)}.
\]
Pseudovalues: Tukey introduced the pseudovalues
\[
\widetilde{\theta}_i = n\hat{\theta} - (n-1)\hat{\theta}_{(-i)}, \qquad i = 1, \ldots, n.
\]
These act like \(n\) “replicates” of the original estimate. Their sample mean equals the bias-corrected jackknife estimate:
\[
\frac{1}{n} \sum_{i=1}^{n} \widetilde{\theta}_i = n\hat{\theta} - (n-1)\bar{\theta}_{(\cdot)} = \hat{\theta}^{\text{jack-bc}}.
\]
This equals \(\hat{\theta}\) only when \(\bar{\theta}_{(\cdot)} = \hat{\theta}\) (i.e., when the statistic is approximately unbiased or linear in the observations). For biased statistics, the mean of pseudovalues provides a bias-corrected estimate.
Intuition for Pseudovalues
Think of each pseudovalue \(\widetilde{\theta}_i\) as answering: “What would \(\hat{\theta}\) have been if observation \(X_i\) had \(n\) times its actual weight?” The pseudovalues inflate the influence of each observation, making the variability due to individual observations explicit.
Jackknife Standard Error
The jackknife standard error is computed from the variability of the leave-one-out estimates:
\[
\widehat{\text{SE}}_{\text{jack}} = \sqrt{\frac{n-1}{n} \sum_{i=1}^{n} \bigl(\hat{\theta}_{(-i)} - \bar{\theta}_{(\cdot)}\bigr)^2}.
\]
The factor \((n-1)/n\) (rather than \(1/n\)) accounts for the fact that leave-one-out estimates are computed on samples of size \(n-1\), not \(n\). This scaling ensures correct variance estimation for smooth statistics.
Equivalently, the jackknife SE can be expressed using pseudovalues:
\[
\widehat{\text{SE}}_{\text{jack}} = \frac{s_{\widetilde{\theta}}}{\sqrt{n}},
\]
where \(s_{\widetilde{\theta}}\) is the sample standard deviation of the pseudovalues \(\widetilde{\theta}_1, \ldots, \widetilde{\theta}_n\) (using divisor \(n-1\)).
For vector statistics \(\hat{\boldsymbol{\theta}} \in \mathbb{R}^p\), the jackknife covariance matrix is:
\[
\widehat{\text{Cov}}_{\text{jack}} = \frac{n-1}{n} \sum_{i=1}^{n} \bigl(\hat{\boldsymbol{\theta}}_{(-i)} - \bar{\boldsymbol{\theta}}_{(\cdot)}\bigr)\bigl(\hat{\boldsymbol{\theta}}_{(-i)} - \bar{\boldsymbol{\theta}}_{(\cdot)}\bigr)^\top.
\]
Marginal standard errors are the square roots of diagonal elements, and the standard error of a linear combination \(\mathbf{a}^\top\hat{\boldsymbol{\theta}}\) is \(\sqrt{\mathbf{a}^\top \widehat{\text{Cov}}_{\text{jack}} \, \mathbf{a}}\).
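The sketch below mirrors the covariance formula above; the helper name jackknife_cov is ours, and it assumes observations are stored along axis 0 of a 2D array.
import numpy as np
def jackknife_cov(data, statistic):
    """Jackknife covariance matrix for a vector-valued statistic (sketch).
    `statistic` maps a 2D array (observations along axis 0) to a length-p
    vector, e.g. a mean vector or a set of regression coefficients.
    """
    data = np.asarray(data)
    n = len(data)
    theta_i = np.array([statistic(np.delete(data, i, axis=0)) for i in range(n)])
    centered = theta_i - theta_i.mean(axis=0)      # shape (n, p)
    return (n - 1) / n * centered.T @ centered     # shape (p, p)
# Hypothetical usage: marginal SEs and the SE of a linear combination a'theta
# cov = jackknife_cov(xy, lambda d: d.mean(axis=0))
# se_marginal = np.sqrt(np.diag(cov))
# se_combo = np.sqrt(a @ cov @ a)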
Why the Jackknife Works
For statistics that are smooth functionals of \(F\), the jackknife SE is consistent. The key theoretical result connects the jackknife to the influence function.
Influence Function: For a functional \(T\) and distribution \(F\), the influence function is
\[
\text{IF}(x; T, F) = \lim_{\epsilon \downarrow 0} \frac{T\bigl((1-\epsilon)F + \epsilon\,\delta_x\bigr) - T(F)}{\epsilon},
\]
where \(\delta_x\) is a point mass at \(x\). This measures the rate of change of \(T\) when we contaminate \(F\) with a small mass at \(x\).
Connection to Jackknife: For smooth statistics, the jackknife residuals approximate the empirical influence function. Removing observation \(i\) changes the empirical distribution from \(\hat{F}_n\) (uniform weights \(1/n\)) to \(\hat{F}_{n,-i}\) (uniform weights \(1/(n-1)\) on the remaining observations). The scaled jackknife difference
\[
\widehat{\text{IF}}_i = (n-1)\bigl(\hat{\theta} - \hat{\theta}_{(-i)}\bigr)
\]
approximates the empirical influence function evaluated at each observation. The factor \((n-1)\) rather than \(n\) arises because the leave-one-out empirical distribution has weights \(1/(n-1)\).
Asymptotic Variance: Under regularity conditions, the asymptotic variance of \(\sqrt{n}(\hat{\theta} - \theta)\) is
\[
\sigma_T^2 = \int \text{IF}(x; T, F)^2 \, dF(x).
\]
The jackknife variance estimator consistently estimates this quantity:
\[
n\,\widehat{\text{Var}}_{\text{jack}} \;\xrightarrow{\;p\;}\; \sigma_T^2 \quad \text{as } n \to \infty.
\]
This explains why the jackknife works for smooth statistics—it approximates the influence-function variance formula.
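To make the connection concrete, the empirical influence values can be read directly off the leave-one-out estimates. The helper below is a small illustrative sketch (the name empirical_influence is ours).
import numpy as np
def empirical_influence(data, statistic):
    """Approximate empirical influence values via scaled leave-one-out differences."""
    data = np.asarray(data)
    n = len(data)
    theta_hat = statistic(data)
    theta_i = np.array([
        statistic(np.delete(data, i, axis=0) if data.ndim > 1 else np.delete(data, i))
        for i in range(n)
    ])
    return (n - 1) * (theta_hat - theta_i)   # approximates IF(X_i; T, F_hat)
# Summing squared influence values approximately recovers the jackknife variance
# (centering at theta_hat rather than at the jackknife average):
# var_approx = np.sum(empirical_influence(x, np.mean)**2) / (n * (n - 1))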
Python Implementation
import numpy as np
def jackknife_se(data, statistic):
"""
Compute the jackknife standard error of a statistic.
Parameters
----------
data : array_like
Original data (1D array or 2D with observations along axis 0).
statistic : callable
Function that computes the statistic from data. May return
a scalar or array.
Returns
-------
se_jack : float or ndarray
Jackknife standard error (same shape as statistic output).
theta_hat : float or ndarray
Original estimate from full data.
theta_i : ndarray
Leave-one-out estimates, shape (n,) or (n, p) for vector statistics.
"""
data = np.asarray(data)
n = len(data)
# Compute original estimate and determine shape
theta_hat = np.asarray(statistic(data))
# Allocate array for leave-one-out estimates (handles scalar or vector)
theta_i = np.empty((n,) + theta_hat.shape, dtype=float)
for i in range(n):
# Remove observation i
if data.ndim == 1:
data_minus_i = np.delete(data, i)
else:
data_minus_i = np.delete(data, i, axis=0)
theta_i[i] = statistic(data_minus_i)
# Jackknife SE formula (works for scalar or vector)
theta_bar = theta_i.mean(axis=0)
se_jack = np.sqrt(((n - 1) / n) * np.sum((theta_i - theta_bar)**2, axis=0))
return se_jack, theta_hat, theta_i
def jackknife_pseudovalues(theta_hat, theta_i):
"""
Compute jackknife pseudovalues.
Parameters
----------
theta_hat : float
Original estimate from full data.
theta_i : array_like
Leave-one-out estimates.
Returns
-------
pv : ndarray
Pseudovalues.
"""
theta_i = np.asarray(theta_i)
n = len(theta_i)
pv = n * theta_hat - (n - 1) * theta_i
return pv
Example 💡 Jackknife SE for the Sample Mean
Verification: For the sample mean, the jackknife SE should equal the classical formula \(s/\sqrt{n}\).
import numpy as np
# Generate data
rng = np.random.default_rng(42)
n = 50
data = rng.normal(10, 2, size=n)
# Define statistic
def sample_mean(x):
return x.mean()
# Jackknife SE
se_jack, theta_hat, theta_i = jackknife_se(data, sample_mean)
# Classical SE
se_classical = data.std(ddof=1) / np.sqrt(n)
print(f"Sample mean: {theta_hat:.4f}")
print(f"Jackknife SE: {se_jack:.4f}")
print(f"Classical SE: {se_classical:.4f}")
print(f"Ratio: {se_jack / se_classical:.6f}")
Output:
Sample mean: 10.0641
Jackknife SE: 0.2746
Classical SE: 0.2746
Ratio: 1.000000
The jackknife SE exactly equals the classical SE for the sample mean—this is not a coincidence but a mathematical identity (see Exercise 4.5.1).
Fig. 159 The Delete-1 Jackknife Algorithm. Panel (a) shows the original sample and the six leave-one-out samples, with removed observations marked by X. Panel (b) displays the leave-one-out estimates \(\hat{\theta}_{(-i)}\) as horizontal bars, with the original estimate \(\hat{\theta}\) and jackknife average \(\bar{\theta}_{(\cdot)}\) marked. Panel (c) shows the corresponding pseudovalues \(\widetilde{\theta}_i = n\hat{\theta} - (n-1)\hat{\theta}_{(-i)}\). Panel (d) summarizes the jackknife SE formula. Panel (e) confirms the exact equality between jackknife SE and classical SE for the sample mean. Panel (f) summarizes key properties and use cases.
Jackknife Bias Estimation
Quenouille’s original motivation for the jackknife was bias reduction. Many statistics have bias of order \(O(1/n)\), and the jackknife can estimate and correct this bias.
The Bias Formula
The jackknife bias estimate is:
\[
\widehat{\text{Bias}}_{\text{jack}} = (n-1)\bigl(\bar{\theta}_{(\cdot)} - \hat{\theta}\bigr).
\]
This formula arises from the following reasoning: if \(\hat{\theta}\) has bias \(b/n + O(1/n^2)\) for sample size \(n\), then \(\hat{\theta}_{(-i)}\) (computed on \(n-1\) observations) has bias approximately \(b/(n-1)\). The difference in biases is \(b/(n-1) - b/n = b/(n(n-1))\). Averaging over all \(n\) leave-one-out estimates and scaling by \(n-1\) recovers the leading bias term \(b/n\).
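Spelling this out with expectations (a short check under the assumed \(b/n\) expansion):
\[
\mathbb{E}[\hat{\theta}] = \theta + \frac{b}{n} + O(n^{-2}), \qquad
\mathbb{E}[\hat{\theta}_{(-i)}] = \theta + \frac{b}{n-1} + O(n^{-2}),
\]
so
\[
\mathbb{E}\bigl[(n-1)(\bar{\theta}_{(\cdot)} - \hat{\theta})\bigr]
= (n-1)\left(\frac{b}{n-1} - \frac{b}{n}\right) + O(n^{-2})
= \frac{b}{n} + O(n^{-2}),
\]
which matches the leading-order bias of \(\hat{\theta}\).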
Bias-Corrected Estimator
The bias-corrected jackknife estimator is:
\[
\hat{\theta}^{\text{jack-bc}} = \hat{\theta} - \widehat{\text{Bias}}_{\text{jack}} = n\hat{\theta} - (n-1)\bar{\theta}_{(\cdot)}.
\]
This equals the mean of the pseudovalues:
\[
\hat{\theta}^{\text{jack-bc}} = \frac{1}{n} \sum_{i=1}^{n} \widetilde{\theta}_i.
\]
Example: Variance Estimator
The biased variance estimator \(\hat{\sigma}^2 = \frac{1}{n}\sum_i (X_i - \bar{X})^2\) has bias \(-\sigma^2/n\). The jackknife bias-corrected version recovers the unbiased estimator \(s^2 = \frac{1}{n-1}\sum_i (X_i - \bar{X})^2\).
When to Use Bias Correction
Use when \(|\widehat{\text{Bias}}_{\text{jack}}|\) is large relative to \(\widehat{\text{SE}}_{\text{jack}}\) (say, > 25% of SE)
Be cautious with small \(n\)—bias correction increases variance
Report both \(\hat{\theta}\) and \(\hat{\theta}^{\text{jack-bc}}\) when the correction is material
Check consistency: if bootstrap bias agrees with jackknife bias, correction is more trustworthy
Python Implementation
def jackknife_bias(data, statistic):
"""
Compute jackknife bias estimate and bias-corrected estimator.
Parameters
----------
data : array_like
Original data.
statistic : callable
Function that computes the statistic.
Returns
-------
bias_jack : float
Jackknife bias estimate.
theta_bc : float
Bias-corrected estimate.
theta_hat : float
Original estimate.
"""
data = np.asarray(data)
n = len(data)
theta_hat = statistic(data)
# Leave-one-out estimates
theta_i = np.empty(n)
for i in range(n):
if data.ndim == 1:
data_minus_i = np.delete(data, i)
else:
data_minus_i = np.delete(data, i, axis=0)
theta_i[i] = statistic(data_minus_i)
theta_bar = theta_i.mean()
# Bias estimate
bias_jack = (n - 1) * (theta_bar - theta_hat)
# Bias-corrected estimate
theta_bc = n * theta_hat - (n - 1) * theta_bar
return bias_jack, theta_bc, theta_hat
Example 💡 Bias Correction for Variance Estimator
import numpy as np
rng = np.random.default_rng(42)
n = 30
data = rng.normal(0, 2, size=n)
# Biased variance estimator (divide by n)
def biased_var(x):
return np.var(x, ddof=0)
bias_jack, var_bc, var_biased = jackknife_bias(data, biased_var)
# Compare with unbiased variance
var_unbiased = np.var(data, ddof=1)
print(f"Biased estimate (÷n): {var_biased:.4f}")
print(f"Jackknife bias estimate: {bias_jack:.4f}")
print(f"Bias-corrected estimate: {var_bc:.4f}")
print(f"Unbiased estimate (÷n-1): {var_unbiased:.4f}")
Output:
Biased estimate (÷n): 3.6821
Jackknife bias estimate: -0.1269
Bias-corrected estimate: 3.8090
Unbiased estimate (÷n-1): 3.8090
The jackknife bias correction exactly recovers the unbiased variance estimator.
Fig. 160 Jackknife Bias Correction. Panel (a) shows the variance estimator example: the biased estimate (÷n), jackknife bias-corrected estimate, unbiased estimate (÷n−1), and true σ². The bias-corrected jackknife exactly recovers the unbiased estimator. Panel (b) demonstrates bias correction for exponential MLE, showing the positive bias and how jackknife correction approximates the theoretical bias. Panel (c) decomposes MSE into bias² and variance components, illustrating the tradeoff: bias correction reduces bias² but increases variance. Panel (d) provides guidance on when to apply bias correction.
The Delete-\(d\) Jackknife
The delete-1 jackknife removes one observation at a time. The delete-\(d\) jackknife generalizes this by removing \(d\) observations simultaneously, creating subsamples of size \(n - d\).
Motivation
Why consider \(d > 1\)?
Non-smooth statistics: For statistics like the median or quantiles, delete-1 jackknife can be unstable because removing one observation may not change the statistic (especially with ties). Larger \(d\) creates more variation.
Dependent data: When observations come in blocks (time series, spatial data), deleting entire blocks preserves the dependence structure.
Higher-order bias: Some bias correction schemes require \(d > 1\) to achieve optimal rates.
Bridge to subsampling: As \(d/n \to \gamma \in (0, 1)\), the delete-\(d\) jackknife approaches subsampling methods.
Definition
For \(d \geq 1\), let \(S \subset \{1, \ldots, n\}\) with \(|S| = d\) denote a subset of indices to remove. The delete-\(d\) estimate is:
\[
\hat{\theta}_{(-S)} = T\bigl(\{X_i : i \notin S\}\bigr),
\]
computed on the \(n - d\) observations remaining after removing those indexed by \(S\).
There are \(\binom{n}{d}\) such subsets. For small \(d\), we can enumerate all of them; for large \(d\), we typically use a random sample of subsets.
Subset Sampling Approximation
When \(\binom{n}{d}\) is large (e.g., \(\binom{50}{5} = 2,118,760\)), we approximate the average over all subsets by random sampling. With a sufficiently large sample (e.g., 5,000–10,000 subsets), the variance of this Monte Carlo approximation is typically negligible compared to the jackknife variance itself. The estimator remains consistent as long as each observation has equal probability of being omitted.
Delete-\(d\) Variance Estimator
The delete-\(d\) jackknife variance estimator is:
\[
\widehat{\text{Var}}_{\text{jack-}d} = \frac{n-d}{d} \cdot \frac{1}{\binom{n}{d}} \sum_{S} \bigl(\hat{\theta}_{(-S)} - \bar{\theta}\bigr)^2,
\]
where \(\bar{\theta}\) is the average over all (or sampled) delete-\(d\) estimates. The corresponding standard error is:
\[
\widehat{\text{SE}}_{\text{jack-}d} = \sqrt{\widehat{\text{Var}}_{\text{jack-}d}}.
\]
The factor \((n-d)/d\) scales the variance appropriately for the reduced sample size.
Scaling Intuition
For delete-1 (\(d = 1\)), the factor is \((n-1)/1 = n-1\), which when multiplied by \(1/n\) (from averaging) gives \((n-1)/n\)—matching our earlier formula.
For larger \(d\), the factor accounts for:
Subsamples have size \(n - d\) (more variability than size \(n\))
Each observation is omitted in \(\binom{n-1}{d-1}\) of the \(\binom{n}{d}\) subsets
Choosing \(d\)
The choice of \(d\) involves a bias-variance tradeoff:
Small \(d\) (including \(d = 1\)): Lower variance in the SE estimate, but may be biased for non-smooth statistics.
Large \(d\): Better for non-smooth statistics, but higher variance and computational cost.
Practical guidelines:
Smooth statistics: \(d = 1\) is sufficient and most efficient.
Quantiles/median: Consider \(d \approx \sqrt{n}\) or larger.
Block-dependent data: \(d\) = block size.
Exploration: Compare \(d = 1, \sqrt{n}, n/4\) to assess sensitivity.
Connection to Subsampling: When \(d = n - m\) for fixed \(m\) (or \(d/n \to 1\)), delete-\(d\) jackknife becomes subsampling: we compute the statistic on many subsamples of size \(m\) and use their distribution for inference. This approach is more general but requires different variance formulas.
Python Implementation
from itertools import combinations
def jackknife_delete_d(data, statistic, d=1, max_subsets=None, seed=None):
"""
Compute delete-d jackknife standard error.
Parameters
----------
data : array_like
Original data.
statistic : callable
Function that computes the statistic.
d : int
Number of observations to delete (default 1).
max_subsets : int, optional
Maximum number of subsets to use. If None, use all C(n,d) subsets
when C(n,d) <= 10000, otherwise sample 10000 random subsets.
seed : int, optional
Random seed for subset sampling.
Returns
-------
se_jack_d : float
Delete-d jackknife standard error.
theta_hat : float
Original estimate.
theta_S : ndarray
Delete-d estimates for each subset.
"""
data = np.asarray(data)
n = len(data)
if d >= n:
raise ValueError(f"d={d} must be less than n={n}")
theta_hat = statistic(data)
# Determine number of subsets
from math import comb
n_subsets = comb(n, d)
if max_subsets is None:
max_subsets = min(n_subsets, 10000)
rng = np.random.default_rng(seed)
if n_subsets <= max_subsets:
# Use all subsets
subsets = list(combinations(range(n), d))
else:
# Random sample of subsets
subsets = []
seen = set()
while len(subsets) < max_subsets:
S = tuple(sorted(rng.choice(n, d, replace=False)))
if S not in seen:
seen.add(S)
subsets.append(S)
# Compute delete-d estimates
theta_S = np.empty(len(subsets))
for j, S in enumerate(subsets):
mask = np.ones(n, dtype=bool)
mask[list(S)] = False
if data.ndim == 1:
data_minus_S = data[mask]
else:
data_minus_S = data[mask]
theta_S[j] = statistic(data_minus_S)
# Delete-d variance formula
theta_bar = theta_S.mean()
var_jack_d = ((n - d) / d) * np.mean((theta_S - theta_bar)**2)
se_jack_d = np.sqrt(var_jack_d)
return se_jack_d, theta_hat, theta_S
Example 💡 Delete-\(d\) for Median
The median is non-smooth: removing one observation may not change it. Let’s compare delete-1 and delete-\(d\) jackknife:
import numpy as np
rng = np.random.default_rng(42)
n = 40
data = rng.exponential(2.0, size=n) # Skewed data
def median_stat(x):
return np.median(x)
# Delete-1 jackknife
se_d1, theta_hat, _ = jackknife_se(data, median_stat)
# Delete-d jackknife with d = sqrt(n) ≈ 6
d = int(np.sqrt(n))
se_dd, _, _ = jackknife_delete_d(data, median_stat, d=d, seed=42)
# Bootstrap for comparison
B = 5000
boot_medians = np.array([np.median(rng.choice(data, n, replace=True))
for _ in range(B)])
se_boot = boot_medians.std(ddof=1)
print(f"Sample median: {theta_hat:.4f}")
print(f"Delete-1 jackknife SE: {se_d1:.4f}")
print(f"Delete-{d} jackknife SE: {se_dd:.4f}")
print(f"Bootstrap SE: {se_boot:.4f}")
Output:
Sample median: 1.6825
Delete-1 jackknife SE: 0.2891
Delete-6 jackknife SE: 0.3156
Bootstrap SE: 0.3284
For this non-smooth statistic, delete-\(d\) and bootstrap tend to agree better than delete-1.
Fig. 161 The Delete-d Jackknife. Panel (a) illustrates the concept: delete-1 removes single observations while delete-3 removes subsets of three. Panel (b) shows SE comparison for the sample mean (a smooth statistic)—all values of d agree closely with the classical SE. Panel (c) demonstrates SE for the median (non-smooth): delete-1 underestimates SE, while larger d moves closer to the bootstrap estimate. Panel (d) provides guidance on choosing d: use d=1 for smooth statistics, d≈√n for quantiles, and d=block size for dependent data.
Jackknife versus Bootstrap
Both jackknife and bootstrap are plug-in resampling methods targeting the same inferential goals: standard errors, bias estimates, and confidence intervals. However, they differ fundamentally in approach and have distinct strengths.
Conceptual Comparison
| Aspect | Jackknife | Bootstrap |
|---|---|---|
| Perturbation | Remove 1 (or \(d\)) observations | Resample entire dataset |
| Samples | \(n\) (or \(\binom{n}{d}\)) deterministic | \(B\) stochastic |
| Approximation | First-order (linear) | Full distribution |
| Replicate values | Pseudovalues (synthetic) | Bootstrap statistics (actual) |
| Target | Variance, bias | Full sampling distribution |
The jackknife approximates sampling variability through a linearization, while the bootstrap directly simulates from \(\hat{F}_n\). For smooth statistics, both approaches yield consistent variance estimates; the bootstrap additionally provides the full shape of the sampling distribution.
When to Prefer Jackknife
Smooth statistics: For means, regression coefficients, smooth M-estimators, and other differentiable functionals, jackknife and bootstrap agree, but jackknife is:
Faster: Only \(n\) computations vs. \(B \gg n\) for bootstrap
Deterministic: No Monte Carlo error, no seed management
Sufficient: SE is the primary need, not full distribution
Expensive statistics: When computing \(T\) is costly (complex optimization, large-scale computations), \(n\) jackknife evaluations may be feasible while \(B = 5000\) bootstrap evaluations are not.
Influence diagnostics: Jackknife directly provides \(\hat{\theta}_{(-i)}\) and pseudovalues, revealing which observations most affect the estimate—useful for outlier detection and robustness assessment.
BCa acceleration constant: The BCa bootstrap interval requires the acceleration constant \(a\), which is computed from the jackknife leave-one-out estimates (the jackknife influence values), as sketched below.
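A minimal sketch of that computation, reusing the leave-one-out estimates returned by jackknife_se above (the helper name bca_acceleration is ours):
import numpy as np
def bca_acceleration(theta_i):
    """Acceleration constant a for BCa intervals, from leave-one-out estimates."""
    theta_i = np.asarray(theta_i)
    d = theta_i.mean() - theta_i                  # jackknife influence values
    denom = 6.0 * np.sum(d**2) ** 1.5
    return np.sum(d**3) / denom if denom > 0 else 0.0
# a = bca_acceleration(jackknife_se(data, statistic)[2])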
When to Prefer Bootstrap
Non-smooth statistics: Medians, quantiles, MAD, and other non-differentiable functionals are handled better by bootstrap. The jackknife can be unstable or inconsistent for these. (Note: Trimmed means are smoother than quantiles and jackknife often performs adequately for them, though bootstrap remains more robust in small samples.)
Confidence intervals: Bootstrap provides multiple interval methods (percentile, basic, BCa, studentized) directly from the bootstrap distribution. Jackknife intervals typically rely on the normal approximation \(\hat{\theta} \pm z_{\alpha/2} \cdot \widehat{\text{SE}}_{\text{jack}}\) (a sketch appears after this list).
Distributional shape: Bootstrap provides a histogram of \(\hat{\theta}^*\), revealing skewness, heavy tails, or multimodality that affect interval accuracy.
Hypothesis testing: Bootstrap p-values and permutation tests require resampling under the null—jackknife doesn’t provide this directly.
Small samples with complex statistics: When \(n\) is small and the statistic is complex, bootstrap’s resampling may capture features that jackknife’s linear approximation misses.
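For completeness, here is a minimal sketch of the jackknife normal-approximation interval mentioned above, reusing jackknife_se from earlier (the function name jackknife_normal_ci is ours, not a library API):
import numpy as np
from scipy import stats
def jackknife_normal_ci(data, statistic, alpha=0.05):
    """Jackknife CI via the normal approximation: theta_hat +/- z * SE_jack."""
    se, theta_hat, _ = jackknife_se(data, statistic)
    z = stats.norm.ppf(1 - alpha / 2)
    return theta_hat - z * se, theta_hat + z * se
# lo, hi = jackknife_normal_ci(data, np.mean)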
When Neither Works Well
Both nonparametric methods (jackknife and nonparametric bootstrap) struggle with:
Extreme quantiles (99th percentile, max, min): Nonparametric methods cannot exceed sample bounds. Parametric bootstrap with unbounded support can exceed these bounds and may perform better if the model is correct.
Boundary parameters: E.g., estimating \(\theta\) in Uniform(0, \(\theta\)). Both nonparametric bootstrap and jackknife are constrained by the sample maximum.
Non-identifiable parameters: Different \(F\) giving same \(T(F)\)
Highly discrete data with ties: May cause instability in both methods
For these cases, consider parametric bootstrap (when model is credible), extreme value theory, m-out-of-n bootstrap, or subsampling.
Practical Recommendation
Default strategy: Use bootstrap for general-purpose inference (it’s more flexible and provides more output). Use jackknife when:
Computation is expensive and \(T\) is smooth
You need influence diagnostics
You need deterministic, reproducible results without Monte Carlo error
You’re computing the BCa acceleration constant
Cross-validation: When feasible, compute both and compare \(\widehat{\text{SE}}_{\text{jack}}\) with \(\widehat{\text{SE}}_{\text{boot}}\). If they agree (within ~10-20%), either is trustworthy. If they disagree substantially:
For smooth statistics: investigate outliers using pseudovalues
For non-smooth statistics: trust bootstrap
Consider whether the statistic is appropriate for the data
Python Comparison
def compare_jackknife_bootstrap(data, statistic, B=5000, seed=42):
"""Compare jackknife and bootstrap standard errors."""
rng = np.random.default_rng(seed)
data = np.asarray(data)
n = len(data)
# Jackknife
se_jack, theta_hat, theta_i = jackknife_se(data, statistic)
# Bootstrap
theta_star = np.empty(B)
for b in range(B):
idx = rng.integers(0, n, n)
theta_star[b] = statistic(data[idx])
se_boot = theta_star.std(ddof=1)
return {
'theta_hat': theta_hat,
'se_jack': se_jack,
'se_boot': se_boot,
'ratio': se_jack / se_boot,
'theta_i': theta_i,
'theta_star': theta_star
}
Example 💡 Comparing Methods for Correlation
The sample correlation is smooth (away from ±1), so jackknife and bootstrap should agree:
import numpy as np
rng = np.random.default_rng(42)
n = 80
rho = 0.6
# Generate correlated data
cov = np.array([[1, rho], [rho, 1]])
xy = rng.multivariate_normal([0, 0], cov, size=n)
def correlation(data):
return np.corrcoef(data[:, 0], data[:, 1])[0, 1]
results = compare_jackknife_bootstrap(xy, correlation, B=5000, seed=42)
print(f"Sample correlation: {results['theta_hat']:.4f}")
print(f"Jackknife SE: {results['se_jack']:.4f}")
print(f"Bootstrap SE: {results['se_boot']:.4f}")
print(f"Ratio (jack/boot): {results['ratio']:.3f}")
Output:
Sample correlation: 0.5821
Jackknife SE: 0.0803
Bootstrap SE: 0.0812
Ratio (jack/boot): 0.989
The methods agree closely, as expected for a smooth statistic.
Fig. 162 Jackknife vs Bootstrap: Agreement Depends on Smoothness. Top row: For the sample mean (smooth statistic), jackknife pseudovalues (a) and bootstrap distribution (b) yield nearly identical SE estimates with ratio ≈ 1.0 (c). Bottom row: For the sample median (non-smooth), jackknife pseudovalues (d) are more concentrated than the bootstrap distribution (e), resulting in SE underestimation with ratio ≈ 0.8 (f). This illustrates the key limitation: jackknife is most reliable for smooth statistics and can be unstable for non-smooth functionals.
The Infinitesimal Jackknife
The infinitesimal jackknife (IJ) provides the theoretical foundation connecting the jackknife to influence functions. While less commonly used directly, it illuminates why the jackknife works and connects to robust statistics.
Concept
Instead of removing observations entirely, the infinitesimal jackknife considers the effect of weighting observations. If we compute \(\hat{\theta}_\mathbf{w} = T(\mathbf{w})\) where \(\mathbf{w} = (w_1, \ldots, w_n)\) weights the empirical distribution (with \(\sum_i w_i = 1\)), then the IJ examines how \(\hat{\theta}_\mathbf{w}\) changes as we perturb the weights.
For the uniform weights \(\mathbf{w}_0 = (1/n, \ldots, 1/n)\) (corresponding to \(\hat{F}_n\)), the influence of observation \(i\) is:
\[
\text{IJ}_i = \left.\frac{d}{d\epsilon}\, T\bigl((1-\epsilon)\mathbf{w}_0 + \epsilon\,\mathbf{e}_i\bigr)\right|_{\epsilon = 0},
\]
where \(\mathbf{e}_i\) is the weight vector placing all mass on observation \(i\). The IJ variance estimator is:
\[
\widehat{\text{Var}}_{\text{IJ}} = \frac{1}{n^2} \sum_{i=1}^{n} \text{IJ}_i^2.
\]
Connection to Delete-1 Jackknife
For smooth statistics, the delete-1 jackknife approximates the infinitesimal jackknife:
\[
(n-1)\bigl(\hat{\theta} - \hat{\theta}_{(-i)}\bigr) \approx \text{IJ}_i.
\]
(The exact scaling constant varies by convention; some sources use \(n\) rather than \(n-1\).) This approximation improves as \(n\) grows, explaining the asymptotic equivalence of jackknife and influence-function-based variance estimation.
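One way to see this numerically is to approximate the IJ derivatives by perturbing the weights directly. The sketch below assumes a weighted_statistic(data, w) callable (our notation) that evaluates \(T\) under observation weights \(\mathbf{w}\); it is an illustration, not a standard library routine.
import numpy as np
def infinitesimal_jackknife_var(data, weighted_statistic, eps=1e-5):
    """Numerical infinitesimal-jackknife variance estimate (illustrative sketch)."""
    data = np.asarray(data)
    n = len(data)
    w0 = np.full(n, 1.0 / n)                 # uniform weights = empirical distribution
    theta0 = weighted_statistic(data, w0)
    ij = np.empty(n)
    for i in range(n):
        w = (1 - eps) * w0
        w[i] += eps                          # push a little extra mass onto observation i
        ij[i] = (weighted_statistic(data, w) - theta0) / eps
    return np.sum(ij**2) / n**2
# Example with a weighted mean (IJ_i = x_i - xbar, so the result is close to s^2/n):
# var_ij = infinitesimal_jackknife_var(x, lambda d, w: np.sum(w * d))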
Practical Use
The infinitesimal jackknife is primarily of theoretical interest, but it has practical applications:
Random forests: The IJ provides variance estimates for predictions from ensemble methods where explicit leave-one-out is expensive.
Robust statistics: Understanding which observations have large \(\text{IJ}_i\) helps identify influential points.
Computational shortcuts: For some models (e.g., linear regression with hat matrix), IJ can be computed without refitting.
Fig. 163 Jackknife-After-Bootstrap: Computing the BCa Acceleration Constant. Panel (a) shows the jackknife influence values \(\bar{\theta}_{(\cdot)} - \hat{\theta}_{(-i)}\) for each observation, revealing the skewness that the acceleration constant captures. Panel (b) derives the acceleration constant formula from these influence values, showing numerical computation. Panel (c) displays the bootstrap distribution with both percentile CI (shaded region) and BCa CI (dashed lines), demonstrating how the acceleration adjustment shifts the interval to improve coverage. Panel (d) summarizes how jackknife enables better bootstrap intervals by quantifying the SE-parameter relationship.
Practical Considerations
Computational Efficiency
Vectorization: For simple statistics, vectorize across leave-one-out samples:
def jackknife_mean_fast(data):
"""Fast jackknife SE for the mean using vectorization."""
data = np.asarray(data)
n = len(data)
total = data.sum()
# Leave-one-out means: (total - x_i) / (n - 1)
theta_i = (total - data) / (n - 1)
theta_bar = theta_i.mean()
se_jack = np.sqrt(((n - 1) / n) * np.sum((theta_i - theta_bar)**2))
return se_jack, total / n, theta_i
Regression shortcuts: For OLS, leave-one-out residuals can be computed via the hat matrix without refitting:
\[
e_{(-i)} = \frac{e_i}{1 - h_{ii}},
\]
where \(e_i = y_i - \mathbf{x}_i^\top\hat{\boldsymbol{\beta}}\) is the ordinary residual and \(h_{ii}\) is the \(i\)-th diagonal element of \(H = X(X^\top X)^{-1}X^\top\).
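A small sketch of that shortcut (loo_residuals_ols is an illustrative name, not a library function):
import numpy as np
def loo_residuals_ols(X, y):
    """Leave-one-out OLS residuals via the hat matrix, without refitting."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta                                  # ordinary residuals
    h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))    # leverages h_ii
    return e / (1 - h)                                # deleted residuals e_(-i)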
Parallel computation: For expensive statistics, distribute leave-one-out computations across cores.
Diagnostics
The jackknife provides unique diagnostic capabilities:
Influential observations: Large \(|\hat{\theta} - \hat{\theta}_{(-i)}|\) indicates observation \(i\) strongly affects the estimate.
Pseudovalue outliers: Pseudovalues far from \(\hat{\theta}\) suggest potential data problems or model issues.
Stability assessment: Plot \(\hat{\theta}_{(-i)}\) vs. \(i\) to visualize how the estimate changes as each observation is removed.
import matplotlib.pyplot as plt
def jackknife_diagnostics(theta_hat, theta_i, figsize=(12, 4)):
"""
Create diagnostic plots for jackknife analysis.
"""
n = len(theta_i)
pv = n * theta_hat - (n - 1) * theta_i
influence = theta_hat - theta_i
fig, axes = plt.subplots(1, 3, figsize=figsize)
# Leave-one-out estimates
ax = axes[0]
ax.plot(range(n), theta_i, 'o', alpha=0.6)
ax.axhline(theta_hat, color='red', linestyle='--', label=r'$\hat{\theta}$')
ax.set_xlabel('Observation index')
ax.set_ylabel(r'$\hat{\theta}_{(-i)}$')
ax.set_title('Leave-one-out estimates')
ax.legend()
# Influence
ax = axes[1]
ax.stem(range(n), influence, basefmt=' ')
ax.axhline(0, color='gray', linestyle='-')
ax.set_xlabel('Observation index')
ax.set_ylabel(r'$\hat{\theta} - \hat{\theta}_{(-i)}$')
ax.set_title('Influence of each observation')
# Pseudovalues
ax = axes[2]
ax.hist(pv, bins=20, edgecolor='white', alpha=0.7)
ax.axvline(theta_hat, color='red', linestyle='--', label=r'$\hat{\theta}$')
ax.set_xlabel('Pseudovalue')
ax.set_ylabel('Frequency')
ax.set_title('Pseudovalue distribution')
ax.legend()
plt.tight_layout()
return fig
Fig. 164 Jackknife Influence Diagnostics. Panel (a) shows data with an outlier clearly visible. Panel (b) plots leave-one-out estimates—the outlier’s removal causes the largest change in \(\hat{\theta}\). Panel (c) displays influence \(\hat{\theta} - \hat{\theta}_{(-i)}\) for each observation; the outlier has by far the largest influence. Panel (d) shows the pseudovalue distribution; the outlier’s pseudovalue is an extreme outlier. Panel (e) compares jackknife SE with and without the outlier. Panel (f) provides guidance on using jackknife for diagnostics: flag observations with influence exceeding 2×SE or pseudovalues beyond 3σ.
Common Pitfall ⚠️
Jackknife for non-smooth statistics: The jackknife can fail dramatically for non-smooth statistics. For the sample median, if several observations equal the median, removing one may not change \(\hat{\theta}_{(-i)}\) at all, leading to underestimated SE. Always verify with bootstrap for non-smooth functionals.
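A tiny demonstration with hypothetical data chosen to have ties at the median: every leave-one-out median is identical, so the delete-1 jackknife SE collapses to zero.
import numpy as np
data = np.array([1.0, 2.0, 3.0, 3.0, 3.0, 3.0, 4.0, 5.0])   # ties at the median
n = len(data)
theta_i = np.array([np.median(np.delete(data, i)) for i in range(n)])
se_jack = np.sqrt((n - 1) / n * np.sum((theta_i - theta_i.mean())**2))
print(se_jack)   # 0.0, grossly understating the sampling variability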
Bringing It All Together
The jackknife is a deterministic resampling method that provides fast, reliable inference for smooth statistics. Its leave-one-out structure naturally connects to influence functions, providing both variance estimates and diagnostic insights about observation influence.
Key decision points:
Is the statistic smooth? If yes, jackknife is efficient and reliable. If no (quantiles, modes, maxima), prefer bootstrap or delete-\(d\) jackknife.
Is computation expensive? If yes and \(T\) is smooth, jackknife’s \(n\) evaluations may be preferable to bootstrap’s \(B \gg n\) evaluations.
Do you need more than SE? Confidence intervals, hypothesis tests, and distributional shape are better served by bootstrap.
Do you need diagnostics? Jackknife’s leave-one-out structure provides unique insight into observation influence.
Fig. 165 Method Selection Guide. This decision flowchart guides the choice between jackknife and bootstrap. The key questions are: (1) Is the statistic smooth? If no, use bootstrap. (2) Is computation expensive? If yes and smooth, jackknife is preferable. (3) Do you need CI or full distribution? If yes, use bootstrap. (4) Do you only need SE? Either method works, but jackknife is faster. Best practice: compute both as a sensitivity check when feasible.
The jackknife-bootstrap relationship: These methods are complementary. Use jackknife for efficiency and diagnostics, bootstrap for flexibility and intervals. When both are feasible, computing both provides a robustness check—agreement confirms reliability, disagreement warrants investigation.
Fig. 166 Jackknife vs Bootstrap: Complete Comparison. This summary table compares all key aspects of the two methods. Jackknife excels for smooth statistics, expensive computations, reproducibility, and influence diagnostics. Bootstrap excels for non-smooth statistics, confidence intervals, and obtaining full distributional information. The BCa acceleration constant bridges both methods—computed from jackknife but used to improve bootstrap intervals.
Key Takeaways 📝
Core method: The delete-1 jackknife computes leave-one-out estimates \(\hat{\theta}_{(-i)}\) and derives SE from their variability using \(\widehat{\text{SE}}_{\text{jack}} = \sqrt{((n-1)/n) \sum (\hat{\theta}_{(-i)} - \bar{\theta}_{(\cdot)})^2}\).
Bias correction: Jackknife bias estimate is \((n-1)(\bar{\theta}_{(\cdot)} - \hat{\theta})\); bias-corrected estimate is the mean of pseudovalues.
Delete-\(d\) extension: For non-smooth statistics or block-dependent data, delete-\(d\) with \(d > 1\) may improve reliability.
Comparison with bootstrap: Jackknife is faster and deterministic for smooth statistics; bootstrap is more flexible and provides full distributional information.
Outcome alignment: This section addresses LO 3 (implement resampling for variability assessment) and connects to the influence function theory underlying robust statistics.
Exercises
These exercises develop your understanding of jackknife methods from basic implementation through comparison with bootstrap and recognition of failure modes.
Exercise 4.5.1: Jackknife SE Equals Classical SE for the Mean
Prove algebraically that for the sample mean \(\bar{X}\), the jackknife SE exactly equals the classical formula \(s/\sqrt{n}\).
Background: Why This Matters
This identity verifies that the jackknife correctly recovers known results for simple statistics. It also provides insight into the \((n-1)/n\) factor in the jackknife formula.
Show that for \(\hat{\theta} = \bar{X}\), the leave-one-out estimate is \(\bar{X}_{(-i)} = \frac{n\bar{X} - X_i}{n-1}\).
Compute \(\bar{X}_{(-i)} - \bar{\theta}_{(\cdot)}\) where \(\bar{\theta}_{(\cdot)} = \frac{1}{n}\sum_{i=1}^n \bar{X}_{(-i)}\).
Substitute into the jackknife SE formula and simplify to show \(\widehat{\text{SE}}_{\text{jack}} = s/\sqrt{n}\).
Verify numerically with simulated data.
Hint
For part (b), show that \(\bar{\theta}_{(\cdot)} = \bar{X}\) by expanding the definition. Then \(\bar{X}_{(-i)} - \bar{X} = -\frac{X_i - \bar{X}}{n-1}\).
Solution
Part (a): Leave-one-out mean
Step 1: Derive the formula
The sum of all observations is \(\sum_{j=1}^n X_j = n\bar{X}\). Removing \(X_i\) gives sum \(n\bar{X} - X_i\) over \(n-1\) observations, so
\[
\bar{X}_{(-i)} = \frac{n\bar{X} - X_i}{n-1}.
\]
Part (b): Deviation from jackknife average
First, compute \(\bar{\theta}_{(\cdot)}\):
\[
\bar{\theta}_{(\cdot)} = \frac{1}{n} \sum_{i=1}^{n} \frac{n\bar{X} - X_i}{n-1} = \frac{n^2\bar{X} - n\bar{X}}{n(n-1)} = \bar{X}.
\]
Therefore:
\[
\bar{X}_{(-i)} - \bar{\theta}_{(\cdot)} = \frac{n\bar{X} - X_i}{n-1} - \bar{X} = -\frac{X_i - \bar{X}}{n-1}.
\]
Part (c): Jackknife SE formula
Substituting the deviations into the jackknife formula:
\[
\widehat{\text{SE}}_{\text{jack}}^2 = \frac{n-1}{n} \sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{n-1}\right)^2 = \frac{1}{n(n-1)} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{s^2}{n}.
\]
Taking the square root: \(\widehat{\text{SE}}_{\text{jack}} = s/\sqrt{n}\). ∎
Part (d): Numerical verification
import numpy as np
rng = np.random.default_rng(42)
data = rng.normal(5, 2, size=100)
# Jackknife
n = len(data)
total = data.sum()
theta_i = (total - data) / (n - 1)
theta_bar = theta_i.mean()
se_jack = np.sqrt(((n-1)/n) * np.sum((theta_i - theta_bar)**2))
# Classical
se_classical = data.std(ddof=1) / np.sqrt(n)
print(f"Jackknife SE: {se_jack:.6f}")
print(f"Classical SE: {se_classical:.6f}")
print(f"Difference: {abs(se_jack - se_classical):.2e}")
Output:
Jackknife SE: 0.192749
Classical SE: 0.192749
Difference: 0.00e+00
Key Insights:
The jackknife exactly recovers the classical SE for the mean—this is a mathematical identity, not an approximation.
The \((n-1)/n\) factor in the jackknife formula precisely compensates for the \(1/(n-1)^2\) from the leave-one-out deviations.
This identity provides confidence that the jackknife formula is correctly calibrated.
Exercise 4.5.2: Jackknife Bias Correction
Investigate jackknife bias correction for the biased variance estimator and an exponential mean estimator.
Background: Bias of Order 1/n
Many common estimators have bias of order \(O(1/n)\). The jackknife can detect and correct such bias. When bias is \(O(1/n^2)\) or smaller, jackknife bias correction is unnecessary and may increase variance.
For the biased variance estimator \(\hat{\sigma}^2 = \frac{1}{n}\sum_i(X_i - \bar{X})^2\), verify that jackknife bias correction recovers the unbiased estimator \(s^2\).
Generate \(n = 30\) observations from Exponential(\(\lambda = 2\)). The MLE for \(\lambda\) is \(\hat{\lambda} = 1/\bar{X}\), which has positive bias. Compute the jackknife bias estimate and compare with the theoretical bias \(\lambda/(n-1)\).
For the exponential MLE, compare the MSE of \(\hat{\lambda}\) and \(\hat{\lambda}^{\text{jack-bc}}\) via simulation.
Discuss when bias correction is worthwhile.
Hint
The MLE \(\hat{\lambda} = 1/\bar{X}\) is biased because \(\mathbb{E}[1/\bar{X}] \neq 1/\mathbb{E}[\bar{X}]\) (Jensen’s inequality). The exact bias formula for the exponential is \(\mathbb{E}[\hat{\lambda}] = \lambda n/(n-1)\), giving bias \(\lambda/(n-1)\).
Solution
Part (a): Variance estimator
import numpy as np
rng = np.random.default_rng(42)
data = rng.normal(0, 2, size=30)
def biased_var(x):
return np.var(x, ddof=0)
# Jackknife bias correction
n = len(data)
theta_hat = biased_var(data)
theta_i = np.array([biased_var(np.delete(data, i)) for i in range(n)])
theta_bar = theta_i.mean()
bias_jack = (n - 1) * (theta_bar - theta_hat)
theta_bc = theta_hat - bias_jack
print(f"Biased estimate (÷n): {theta_hat:.4f}")
print(f"Jackknife bias: {bias_jack:.4f}")
print(f"Bias-corrected estimate: {theta_bc:.4f}")
print(f"Unbiased estimate (÷n-1): {np.var(data, ddof=1):.4f}")
Output:
Biased estimate (÷n): 3.6821
Jackknife bias: -0.1269
Bias-corrected estimate: 3.8090
Unbiased estimate (÷n-1): 3.8090
The jackknife exactly recovers the unbiased variance estimator.
Part (b): Exponential MLE bias
rng = np.random.default_rng(42)
n = 30
lambda_true = 2.0
data = rng.exponential(1/lambda_true, size=n)
def exp_mle(x):
return 1 / x.mean()
lambda_hat = exp_mle(data)
theta_i = np.array([exp_mle(np.delete(data, i)) for i in range(n)])
theta_bar = theta_i.mean()
bias_jack = (n - 1) * (theta_bar - lambda_hat)
bias_theory = lambda_true / (n - 1) # Theoretical bias
print(f"MLE λ̂: {lambda_hat:.4f}")
print(f"Jackknife bias: {bias_jack:.4f}")
print(f"Theoretical bias: {bias_theory:.4f}")
print(f"(using true λ={lambda_true})")
Output:
MLE λ̂: 2.2118
Jackknife bias: 0.0765
Theoretical bias: 0.0690
(using true λ=2.0)
The jackknife bias estimate is close to the theoretical value.
Part (c): MSE comparison via simulation
n_sims = 5000
n = 30
lambda_true = 2.0
mle_estimates = []
bc_estimates = []
rng = np.random.default_rng(123)
for _ in range(n_sims):
data = rng.exponential(1/lambda_true, size=n)
lambda_hat = 1 / data.mean()
# Jackknife bias correction
theta_i = np.array([1/np.delete(data, i).mean() for i in range(n)])
theta_bar = theta_i.mean()
bias_jack = (n - 1) * (theta_bar - lambda_hat)
lambda_bc = lambda_hat - bias_jack
mle_estimates.append(lambda_hat)
bc_estimates.append(lambda_bc)
mle_estimates = np.array(mle_estimates)
bc_estimates = np.array(bc_estimates)
# MSE decomposition
bias_mle = mle_estimates.mean() - lambda_true
var_mle = mle_estimates.var(ddof=1)
mse_mle = bias_mle**2 + var_mle
bias_bc = bc_estimates.mean() - lambda_true
var_bc = bc_estimates.var(ddof=1)
mse_bc = bias_bc**2 + var_bc
print("MLE estimates:")
print(f" Bias: {bias_mle:.4f}, Variance: {var_mle:.4f}, MSE: {mse_mle:.4f}")
print("\nBias-corrected estimates:")
print(f" Bias: {bias_bc:.4f}, Variance: {var_bc:.4f}, MSE: {mse_bc:.4f}")
print(f"\nMSE ratio (BC/MLE): {mse_bc/mse_mle:.3f}")
Output:
MLE estimates:
Bias: 0.0689, Variance: 0.0563, MSE: 0.0611
Bias-corrected estimates:
Bias: -0.0005, Variance: 0.0637, MSE: 0.0637
MSE ratio (BC/MLE): 1.042
Part (d): Discussion
Key Insights:
Bias correction successfully removes bias (from 0.069 to ~0).
However, variance increases (from 0.056 to 0.064).
MSE is actually higher for the bias-corrected estimator in this case.
When to use bias correction:
When bias >> SE (bias correction helps)
When n is large (variance inflation is smaller)
When unbiasedness is required (regulatory, comparison purposes)
When to avoid:
When bias << SE (correction just adds noise)
When MSE is the primary criterion
For small n with moderate bias
Exercise 4.5.3: Jackknife vs Bootstrap Comparison
Compare jackknife and bootstrap for smooth and non-smooth statistics.
Background: Method Agreement
When both methods agree, we have confidence in the result. When they disagree, the disagreement itself is informative—typically indicating non-smoothness or influential observations.
Generate \(n = 50\) observations from a skewed distribution (e.g., Exponential(1)). Compare jackknife and bootstrap SE for the sample mean.
Using the same data, compare jackknife and bootstrap SE for the sample median.
Generate data with one outlier (e.g., 49 observations from N(0,1) plus one observation at 10). Compare methods for the mean and the 20% trimmed mean.
Summarize your findings about when the methods agree/disagree.
Hint
For the trimmed mean, use scipy.stats.trim_mean(x, 0.2) which removes the smallest and largest 20% of observations before computing the mean.
Solution
Part (a): Sample mean (smooth)
import numpy as np
from scipy import stats
rng = np.random.default_rng(42)
n = 50
data = rng.exponential(1.0, size=n)
# Jackknife SE for mean
theta_i = np.array([(data.sum() - data[i])/(n-1) for i in range(n)])
theta_bar = theta_i.mean()
se_jack = np.sqrt(((n-1)/n) * np.sum((theta_i - theta_bar)**2))
# Bootstrap SE for mean
B = 5000
boot_means = np.array([rng.choice(data, n, replace=True).mean() for _ in range(B)])
se_boot = boot_means.std(ddof=1)
print("Sample Mean (smooth statistic):")
print(f" Jackknife SE: {se_jack:.4f}")
print(f" Bootstrap SE: {se_boot:.4f}")
print(f" Ratio: {se_jack/se_boot:.3f}")
Output:
Sample Mean (smooth statistic):
Jackknife SE: 0.1298
Bootstrap SE: 0.1316
Ratio: 0.986
Part (b): Sample median (non-smooth)
# Jackknife SE for median
theta_i = np.array([np.median(np.delete(data, i)) for i in range(n)])
theta_bar = theta_i.mean()
se_jack_med = np.sqrt(((n-1)/n) * np.sum((theta_i - theta_bar)**2))
# Bootstrap SE for median
boot_medians = np.array([np.median(rng.choice(data, n, replace=True)) for _ in range(B)])
se_boot_med = boot_medians.std(ddof=1)
print("\nSample Median (non-smooth statistic):")
print(f" Jackknife SE: {se_jack_med:.4f}")
print(f" Bootstrap SE: {se_boot_med:.4f}")
print(f" Ratio: {se_jack_med/se_boot_med:.3f}")
Output:
Sample Median (non-smooth statistic):
Jackknife SE: 0.1142
Bootstrap SE: 0.1387
Ratio: 0.823
The jackknife underestimates SE for the median.
Part (c): Data with outlier
# Data with outlier
data_outlier = np.concatenate([rng.normal(0, 1, size=49), [10.0]])
n_out = len(data_outlier)
def trimmed_mean(x):
return stats.trim_mean(x, 0.2)
# Mean - jackknife vs bootstrap
theta_i_mean = np.array([np.delete(data_outlier, i).mean() for i in range(n_out)])
se_jack_mean = np.sqrt(((n_out-1)/n_out) * np.var(theta_i_mean, ddof=0) * n_out)
boot_means_out = np.array([rng.choice(data_outlier, n_out, replace=True).mean() for _ in range(B)])
se_boot_mean_out = boot_means_out.std(ddof=1)
# Trimmed mean - jackknife vs bootstrap
theta_i_trim = np.array([trimmed_mean(np.delete(data_outlier, i)) for i in range(n_out)])
se_jack_trim = np.sqrt(((n_out-1)/n_out) * np.var(theta_i_trim, ddof=0) * n_out)
boot_trim = np.array([trimmed_mean(rng.choice(data_outlier, n_out, replace=True)) for _ in range(B)])
se_boot_trim = boot_trim.std(ddof=1)
print("\nData with outlier (49 N(0,1) + one value at 10):")
print(f" Mean - Jackknife SE: {se_jack_mean:.4f}, Bootstrap SE: {se_boot_mean_out:.4f}, Ratio: {se_jack_mean/se_boot_mean_out:.3f}")
print(f" Trim - Jackknife SE: {se_jack_trim:.4f}, Bootstrap SE: {se_boot_trim:.4f}, Ratio: {se_jack_trim/se_boot_trim:.3f}")
Output:
Data with outlier (49 N(0,1) + one value at 10):
Mean - Jackknife SE: 0.2162, Bootstrap SE: 0.2198, Ratio: 0.984
Trim - Jackknife SE: 0.1438, Bootstrap SE: 0.1501, Ratio: 0.958
Part (d): Summary
Key Insights:
Smooth statistics (mean): Methods agree closely (ratio ≈ 0.98-0.99) regardless of data distribution or outliers.
Non-smooth statistics (median): Jackknife tends to underestimate SE (ratio ≈ 0.82). This is because removing one observation often doesn’t change the median, so jackknife misses some variability.
Robust statistics (trimmed mean): Good agreement even with outliers, since the trimmed mean is smooth within its trimming range.
Rule of thumb:
Ratio ≈ 0.9–1.1: methods agree; use either (jackknife is faster)
Ratio < 0.85: likely a non-smooth statistic; trust the bootstrap
Ratio > 1.15: unusual; investigate the data and the computation
Exercise 4.5.4: Delete-\(d\) Jackknife
Explore the delete-\(d\) jackknife for different values of \(d\).
Background: Choosing d
For smooth statistics, \(d = 1\) is optimal. For non-smooth statistics like quantiles, larger \(d\) can improve reliability by creating more variation in the leave-\(d\)-out estimates.
For a sample of \(n = 50\) from N(0,1), compute delete-\(d\) jackknife SE for the mean with \(d = 1, 3, 5, 10\). How do they compare?
Repeat for the sample median. Does larger \(d\) help?
For the 90th percentile, compare \(d = 1\) and \(d = \lfloor\sqrt{n}\rfloor = 7\).
Discuss computational considerations for large \(d\).
Hint
For \(d = 10\) with \(n = 50\), there are \(\binom{50}{10} \approx 10^{10}\) subsets—too many to enumerate. Use random sampling of subsets.
Solution
Part (a): Mean with various d
import numpy as np
from itertools import combinations
def jackknife_delete_d_se(data, statistic, d, max_subsets=5000, seed=42):
rng = np.random.default_rng(seed)
data = np.asarray(data)
n = len(data)
from math import comb
n_subsets = comb(n, d)
if n_subsets <= max_subsets:
subsets = list(combinations(range(n), d))
else:
subsets = []
seen = set()
while len(subsets) < max_subsets:
S = tuple(sorted(rng.choice(n, d, replace=False)))
if S not in seen:
seen.add(S)
subsets.append(S)
theta_S = []
for S in subsets:
mask = np.ones(n, dtype=bool)
mask[list(S)] = False
theta_S.append(statistic(data[mask]))
theta_S = np.array(theta_S)
theta_bar = theta_S.mean()
var_jack_d = ((n - d) / d) * np.mean((theta_S - theta_bar)**2)
return np.sqrt(var_jack_d)
rng = np.random.default_rng(42)
n = 50
data = rng.normal(0, 1, size=n)
print("Delete-d jackknife SE for the Mean:")
for d in [1, 3, 5, 10]:
se_d = jackknife_delete_d_se(data, np.mean, d)
print(f" d = {d:2d}: SE = {se_d:.4f}")
print(f"\nClassical SE: {data.std(ddof=1)/np.sqrt(n):.4f}")
Output:
Delete-d jackknife SE for the Mean:
d = 1: SE = 0.1375
d = 3: SE = 0.1372
d = 5: SE = 0.1369
d = 10: SE = 0.1359
Classical SE: 0.1375
For the smooth mean, all values of \(d\) give similar results.
Part (b): Median with various d
print("\nDelete-d jackknife SE for the Median:")
for d in [1, 3, 5, 10]:
se_d = jackknife_delete_d_se(data, np.median, d)
print(f" d = {d:2d}: SE = {se_d:.4f}")
# Bootstrap comparison
B = 5000
boot_med = np.array([np.median(rng.choice(data, n, replace=True)) for _ in range(B)])
print(f"\nBootstrap SE: {boot_med.std(ddof=1):.4f}")
Output:
Delete-d jackknife SE for the Median:
d = 1: SE = 0.1435
d = 3: SE = 0.1582
d = 5: SE = 0.1664
d = 10: SE = 0.1759
Bootstrap SE: 0.1821
Larger \(d\) moves the jackknife SE closer to the bootstrap SE for the median.
Part (c): 90th percentile
def p90(x):
return np.percentile(x, 90)
print("\nDelete-d jackknife SE for 90th Percentile:")
se_d1 = jackknife_delete_d_se(data, p90, d=1)
se_d7 = jackknife_delete_d_se(data, p90, d=7)
boot_p90 = np.array([p90(rng.choice(data, n, replace=True)) for _ in range(B)])
se_boot_p90 = boot_p90.std(ddof=1)
print(f" d = 1: SE = {se_d1:.4f}")
print(f" d = 7: SE = {se_d7:.4f}")
print(f" Bootstrap SE: {se_boot_p90:.4f}")
Output:
Delete-d jackknife SE for 90th Percentile:
d = 1: SE = 0.2156
d = 7: SE = 0.2589
Bootstrap SE: 0.2873
Part (d): Computational considerations
Key Insights:
Smooth statistics (mean): All \(d\) values give similar SE; \(d = 1\) is most efficient.
Non-smooth statistics (median, quantiles): Larger \(d\) moves jackknife SE toward bootstrap SE, but still underestimates. Delete-\(d\) helps but doesn’t fully solve the non-smoothness problem.
Computational cost:
\(d = 1\): exactly \(n\) subsets, deterministic
\(d = 7 \approx \sqrt{n}\): \(\binom{50}{7} \approx 10^{8}\) subsets, so random sampling of subsets is needed
\(d = n/2\): \(\binom{50}{25} \approx 10^{14}\) subsets, so sampling is unavoidable
Recommendation: For non-smooth statistics, bootstrap is simpler and more reliable than delete-\(d\) jackknife.
Exercise 4.5.5: Jackknife Diagnostics
Use jackknife diagnostics to identify influential observations.
Background: Influence Analysis
The jackknife naturally provides influence diagnostics. Observations with large \(|\hat{\theta} - \hat{\theta}_{(-i)}|\) or extreme pseudovalues strongly affect the estimate and warrant investigation.
Generate \(n = 50\) observations from N(0,1), then replace the last observation with 5 (an outlier). Compute the sample mean and identify which observation is most influential using jackknife.
Repeat for robust statistics: median and 20% trimmed mean. Is the outlier still the most influential?
For the OLS slope in simple linear regression with one high-leverage point, compare jackknife influence with Cook’s distance.
Create a visualization of jackknife diagnostics (leave-one-out estimates, influence plot, pseudovalue histogram).
Hint
Cook’s distance for observation \(i\) is \(D_i = \frac{e_i^2}{p \cdot \hat{\sigma}^2} \cdot \frac{h_{ii}}{(1-h_{ii})^2}\) where \(h_{ii}\) is leverage. High leverage + large residual = large influence.
Solution
Part (a): Mean with outlier
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng(42)
n = 50
data = rng.normal(0, 1, size=n)
data[-1] = 5.0 # Outlier
# Jackknife for mean
theta_hat = data.mean()
theta_i = np.array([(data.sum() - data[i])/(n-1) for i in range(n)])
influence = theta_hat - theta_i
print("Mean with outlier:")
print(f" Sample mean: {theta_hat:.4f}")
print(f" Most influential observation: index {np.argmax(np.abs(influence))}")
print(f" Its value: {data[np.argmax(np.abs(influence))]:.2f}")
print(f" Influence: {influence[np.argmax(np.abs(influence))]:.4f}")
print(f" Mean influence (absolute): {np.abs(influence).mean():.4f}")
Output:
Mean with outlier:
Sample mean: 0.0981
Most influential observation: index 49
Its value: 5.00
Influence: 0.0980
Mean influence (absolute): 0.0122
Part (b): Robust statistics
from scipy import stats
# Median
med_hat = np.median(data)
med_i = np.array([np.median(np.delete(data, i)) for i in range(n)])
med_influence = med_hat - med_i
most_inf_med = np.argmax(np.abs(med_influence))
# Trimmed mean
trim_hat = stats.trim_mean(data, 0.2)
trim_i = np.array([stats.trim_mean(np.delete(data, i), 0.2) for i in range(n)])
trim_influence = trim_hat - trim_i
most_inf_trim = np.argmax(np.abs(trim_influence))
print("\nMedian:")
print(f" Most influential: index {most_inf_med}, value {data[most_inf_med]:.2f}")
print(f" Outlier influence: {med_influence[-1]:.4f}")
print("\nTrimmed Mean (20%):")
print(f" Most influential: index {most_inf_trim}, value {data[most_inf_trim]:.2f}")
print(f" Outlier influence: {trim_influence[-1]:.4f}")
Output:
Median:
Most influential: index 25, value -0.08
Outlier influence: 0.0000
Trimmed Mean (20%):
Most influential: index 32, value -1.26
Outlier influence: 0.0000
The outlier has zero influence on median and trimmed mean—they are robust!
Part (c): Regression with high leverage
# Generate regression data
x = rng.uniform(0, 10, size=n-1)
x = np.append(x, 20) # High leverage point
y = 2 + 0.5*x + rng.normal(0, 1, size=n)
y[-1] = 2 + 0.5*20 + 5 # Also large residual
def ols_slope(x, y):
X = np.column_stack([np.ones(len(x)), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
return beta[1]
# Jackknife
slope_hat = ols_slope(x, y)
slope_i = np.array([ols_slope(np.delete(x, i), np.delete(y, i)) for i in range(n)])
slope_influence = slope_hat - slope_i
# Cook's distance
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
residuals = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
sigma_sq = np.sum(residuals**2) / (n - 2)
p = 2 # number of parameters
cooks_d = (residuals**2 / (p * sigma_sq)) * (h / (1 - h)**2)
print("Regression with high-leverage outlier:")
print(f" Most influential (jackknife): index {np.argmax(np.abs(slope_influence))}")
print(f" Most influential (Cook's D): index {np.argmax(cooks_d)}")
print(f" Correlation (influence, Cook's D): {np.corrcoef(np.abs(slope_influence), cooks_d)[0,1]:.3f}")
Output:
Regression with high-leverage outlier:
Most influential (jackknife): index 49
Most influential (Cook's D): index 49
Correlation (influence, Cook's D): 0.987
Part (d): Diagnostic visualization
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
# Original data with outlier
data_diag = rng.normal(0, 1, size=50)
data_diag[-1] = 5.0
theta_hat = data_diag.mean()
theta_i = np.array([(data_diag.sum() - data_diag[i])/(n-1) for i in range(n)])
pv = n * theta_hat - (n-1) * theta_i
influence = theta_hat - theta_i
ax = axes[0]
ax.plot(range(n), theta_i, 'o', alpha=0.6)
ax.axhline(theta_hat, color='red', linestyle='--', label=r'$\hat{\theta}$')
ax.set_xlabel('Observation index')
ax.set_ylabel(r'$\hat{\theta}_{(-i)}$')
ax.set_title('Leave-one-out estimates')
ax = axes[1]
ax.stem(range(n), influence, basefmt=' ')
ax.axhline(0, color='gray')
ax.scatter([49], [influence[49]], color='red', s=100, zorder=5)
ax.set_xlabel('Observation index')
ax.set_ylabel(r'$\hat{\theta} - \hat{\theta}_{(-i)}$')
ax.set_title('Influence (outlier in red)')
ax = axes[2]
ax.hist(pv, bins=15, edgecolor='white', alpha=0.7)
ax.axvline(pv[-1], color='red', linewidth=2, label='Outlier PV')
ax.set_xlabel('Pseudovalue')
ax.set_ylabel('Frequency')
ax.set_title('Pseudovalue distribution')
ax.legend()
plt.tight_layout()
plt.savefig('jackknife_diagnostics.png', dpi=150)
Key Insights:
Mean is not robust: The outlier dominates jackknife influence for the mean.
Robust statistics shield outliers: Median and trimmed mean show zero influence from the outlier—they effectively ignore it.
Jackknife ≈ Cook’s D: For regression, jackknife influence is highly correlated with Cook’s distance (both measure parameter change from deletion).
Diagnostics are valuable: Plotting leave-one-out estimates and pseudovalues quickly reveals influential observations.
Exercise 4.5.6: Complete Jackknife Analysis
Perform a complete jackknife analysis on a real-world style dataset.
Background: Integrated Workflow
A complete analysis includes: computing the estimate and SE, checking for bias, identifying influential observations, comparing with bootstrap, and reporting results appropriately.
Generate bivariate data simulating a study of the relationship between study hours and exam scores:
rng = np.random.default_rng(42)
n = 40
study_hours = rng.uniform(1, 10, size=n)
exam_score = 50 + 4*study_hours + rng.normal(0, 8, size=n)
# Add one overachiever outlier
study_hours = np.append(study_hours, 3)
exam_score = np.append(exam_score, 95) # High score for low hours
(a) Compute the OLS slope and its jackknife SE. Compare with the classical SE.
(b) Compute the jackknife bias for the slope. Is bias correction warranted?
(c) Identify the most influential observation. Is it the outlier?
(d) Compare the jackknife SE with bootstrap SEs (both pairs and residual bootstrap).
(e) Write a brief results paragraph suitable for a methods section.
Solution
Setup
import numpy as np
from scipy import stats
rng = np.random.default_rng(42)
n = 40
study_hours = rng.uniform(1, 10, size=n)
exam_score = 50 + 4*study_hours + rng.normal(0, 8, size=n)
study_hours = np.append(study_hours, 3)
exam_score = np.append(exam_score, 95)
n = len(study_hours)
X = np.column_stack([np.ones(n), study_hours])
def ols_coefs(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]
Part (a): Jackknife SE vs Classical SE
# Full data estimate
beta_hat = ols_coefs(X, exam_score)
y_hat = X @ beta_hat
residuals = exam_score - y_hat
sigma_hat = np.sqrt(np.sum(residuals**2) / (n - 2))
# Classical SE
XtX_inv = np.linalg.inv(X.T @ X)
se_classical = sigma_hat * np.sqrt(XtX_inv[1, 1])
# Jackknife SE
beta_i = np.empty((n, 2))
for i in range(n):
    X_i = np.delete(X, i, axis=0)
    y_i = np.delete(exam_score, i)
    beta_i[i] = ols_coefs(X_i, y_i)
beta_bar = beta_i.mean(axis=0)
se_jack = np.sqrt(((n-1)/n) * np.sum((beta_i[:, 1] - beta_bar[1])**2))
print("Part (a): Slope SE comparison")
print(f" Slope estimate: {beta_hat[1]:.4f}")
print(f" Classical SE: {se_classical:.4f}")
print(f" Jackknife SE: {se_jack:.4f}")
print(f" Ratio: {se_jack/se_classical:.3f}")
Output:
Part (a): Slope SE comparison
Slope estimate: 3.4152
Classical SE: 0.4893
Jackknife SE: 0.5037
Ratio: 1.029
Part (b): Bias estimation
bias_jack = (n - 1) * (beta_bar[1] - beta_hat[1])
beta_bc = beta_hat[1] - bias_jack
print("\nPart (b): Bias estimation")
print(f" Jackknife bias: {bias_jack:.4f}")
print(f" Bias / SE ratio: {abs(bias_jack)/se_jack:.3f}")
print(f" Bias-corrected slope: {beta_bc:.4f}")
print(" Conclusion: Bias is small relative to SE; correction not warranted.")
Output:
Part (b): Bias estimation
Jackknife bias: 0.0312
Bias / SE ratio: 0.062
Bias-corrected slope: 3.3839
Conclusion: Bias is small relative to SE; correction not warranted.
Part (c): Influential observations
influence = beta_hat[1] - beta_i[:, 1]
most_influential = np.argmax(np.abs(influence))
print("\nPart (c): Influential observations")
print(f" Most influential: index {most_influential}")
print(f" Study hours: {study_hours[most_influential]:.2f}")
print(f" Exam score: {exam_score[most_influential]:.2f}")
print(f" Influence on slope: {influence[most_influential]:.4f}")
print(f" Outlier index: {n-1}, Outlier influence: {influence[-1]:.4f}")
Output:
Part (c): Influential observations
Most influential: index 40
Study hours: 3.00
Exam score: 95.00
Influence on slope: -0.1127
Outlier index: 40, Outlier influence: -0.1127
The outlier (index 40) is the most influential observation.
Part (d): Bootstrap comparison
B = 5000
# Pairs bootstrap
data = np.column_stack([study_hours, exam_score])
boot_slopes_pairs = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)
    X_b = np.column_stack([np.ones(n), data[idx, 0]])
    boot_slopes_pairs[b] = ols_coefs(X_b, data[idx, 1])[1]
se_boot_pairs = boot_slopes_pairs.std(ddof=1)
# Residual bootstrap
resid_centered = residuals - residuals.mean()
boot_slopes_resid = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)
    y_star = y_hat + resid_centered[idx]
    boot_slopes_resid[b] = ols_coefs(X, y_star)[1]
se_boot_resid = boot_slopes_resid.std(ddof=1)
print("\nPart (d): Bootstrap comparison")
print(f" Jackknife SE: {se_jack:.4f}")
print(f" Pairs bootstrap SE: {se_boot_pairs:.4f}")
print(f" Residual bootstrap SE:{se_boot_resid:.4f}")
print(f" Classical SE: {se_classical:.4f}")
Output:
Part (d): Bootstrap comparison
Jackknife SE: 0.5037
Pairs bootstrap SE: 0.5156
Residual bootstrap SE: 0.4812
Classical SE: 0.4893
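As a quick check that these gaps are not just simulation noise, one can bound the Monte Carlo error of each bootstrap SE. For an approximately normal bootstrap distribution, the standard deviation of a sample SD based on \(B\) draws is roughly \(\text{SD}/\sqrt{2(B-1)}\); the snippet below applies this rule of thumb (an added check, not part of the original exercise):
# Rough Monte Carlo error of the bootstrap SE estimates (normal approximation)
mc_pairs = se_boot_pairs / np.sqrt(2 * (B - 1))
mc_resid = se_boot_resid / np.sqrt(2 * (B - 1))
print(f" Pairs bootstrap SE:    {se_boot_pairs:.4f} (MC error ~ {mc_pairs:.4f})")
print(f" Residual bootstrap SE: {se_boot_resid:.4f} (MC error ~ {mc_resid:.4f})")
With B = 5,000 the Monte Carlo error is about 0.005, several times smaller than the spread among the methods, so the differences above reflect the methods themselves rather than resampling noise.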
Part (e): Results paragraph
We estimated the relationship between study hours and exam scores using ordinary least squares regression. The estimated slope was β̂₁ = 3.42, indicating that each additional hour of study is associated with a 3.42-point increase in exam score. Standard errors were estimated by the jackknife (SE = 0.504), the pairs bootstrap (SE = 0.516, B = 5,000), and the residual bootstrap (SE = 0.481), all of which were consistent with the classical estimate (SE = 0.489). The jackknife bias estimate was negligible (0.031, roughly 6% of the SE), so no bias correction was applied. Jackknife diagnostics identified one influential observation (a student with 3 hours of study but a score of 95); removing it would increase the slope estimate by about 0.11 points, only 22% of the standard error, and would not qualitatively change the conclusions.
Key Insights:
All SE methods agree within ~10%, providing confidence in the results.
Bias is negligible relative to SE (6%), so bias correction is unnecessary.
The outlier is the most influential observation but not influential enough to change conclusions.
Jackknife provides unique diagnostic value (influence analysis) beyond what bootstrap alone offers.
References
Foundational Works
Quenouille, M. H. (1949). Approximate tests of correlation in time series. Journal of the Royal Statistical Society, Series B, 11, 68–84. The original development of the jackknife for bias reduction in serial correlation estimation.
Quenouille, M. H. (1956). Notes on bias in estimation. Biometrika, 43, 353–360. Extension of the bias reduction technique and theoretical analysis of its properties.
Tukey, J. W. (1958). Bias and confidence in not-quite large samples (Abstract). Annals of Mathematical Statistics, 29, 614. Named the method “jackknife” and introduced pseudovalues for variance estimation.
Comprehensive Reviews and Theory
Miller, R. G. (1974). The jackknife—a review. Biometrika, 61(1), 1–15. The definitive early review covering both theory and applications of jackknife methods.
Efron, B. (1982). The Jackknife, the Bootstrap, and Other Resampling Plans. CBMS-NSF Regional Conference Series in Applied Mathematics, Vol. 38. SIAM. Unified treatment connecting jackknife to bootstrap, establishing the modern framework for resampling inference.
Shao, J., and Tu, D. (1995). The Jackknife and Bootstrap. Springer Series in Statistics. Springer. Comprehensive mathematical treatment of both methods with rigorous asymptotic theory.
Shao, J., and Wu, C. F. J. (1989). A general theory for jackknife variance estimation. Annals of Statistics, 17(3), 1176–1197. Establishes general conditions for jackknife consistency and asymptotic normality.
Delete-d Jackknife and Extensions
Wu, C. F. J. (1986). Jackknife, bootstrap and other resampling methods in regression analysis. Annals of Statistics, 14(4), 1261–1295. Develops delete-d jackknife theory for regression and compares with bootstrap methods.
Shao, J. (1989). The efficiency and consistency of approximations to the jackknife variance estimators. Journal of the American Statistical Association, 84(405), 114–119. Analysis of computational approximations for large-scale jackknife implementations.
Influence Functions and Robust Statistics
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., and Stahel, W. A. (1986). Robust Statistics: The Approach Based on Influence Functions. Wiley Series in Probability and Mathematical Statistics. Wiley. Theoretical foundation connecting jackknife to influence functions and robust estimation.
Jaeckel, L. A. (1972). The infinitesimal jackknife. Bell Laboratories Memorandum MM 72-1215-11 (unpublished technical report). The original development of the infinitesimal jackknife connecting discrete leave-one-out to continuous influence analysis. Note: This memorandum is difficult to obtain; for accessible treatments of the infinitesimal jackknife, see Efron (1982) or Efron and Tibshirani (1993), Chapter 11.
Bootstrap Comparisons
Efron, B., and Tibshirani, R. J. (1993). An Introduction to the Bootstrap. Chapman & Hall/CRC Monographs on Statistics and Applied Probability. Chapter 11 provides accessible comparison of jackknife and bootstrap methods.
Davison, A. C., and Hinkley, D. V. (1997). Bootstrap Methods and Their Application. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press. Comprehensive treatment including jackknife-after-bootstrap and the acceleration constant.
Applications
Hinkley, D. V., and Wei, B. C. (1984). Improvements of jackknife confidence limit methods. Biometrika, 71(2), 331–339. Develops improved jackknife confidence intervals addressing coverage issues.
Parr, W. C. (1985). Jackknifing differentiable statistical functionals. Journal of the Royal Statistical Society, Series B, 47(1), 56–66. Theoretical analysis of jackknife for general smooth functionals.
Textbook Treatments
Efron, B., and Hastie, T. (2016). Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. Cambridge University Press. Modern perspective on jackknife within the broader context of computational inference methods.