Part II: Frequentist Inference
What would happen if we repeated this procedure many times? The frequentist paradigm interprets probability as long-run frequency: if we could repeat an experiment infinitely many times under identical conditions, the probability of an event equals the proportion of times it occurs. This deceptively simple idea, that probability lives in the repetition, not in our beliefs, generates a powerful and coherent framework for statistical inference.
The frequentist approach answers inferential questions through hypothetical repetition. A 95% confidence interval is not 95% likely to contain the true parameter; rather, the procedure that generated it captures the truth in 95% of repeated applications. A p-value of 0.03 does not mean there is a 3% chance the null hypothesis is true; it means that if the null were true, we would see evidence this extreme only 3% of the time. These subtle but crucial distinctions pervade everything that follows.
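A minimal sketch makes the coverage interpretation concrete: simulate many experiments from a known normal model (the settings below are illustrative) and check how often the interval procedure traps the fixed true mean.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 30, 10_000        # true parameters (illustrative)

samples = rng.normal(mu, sigma, size=(reps, n))  # reps independent experiments
means = samples.mean(axis=1)
ses = samples.std(axis=1, ddof=1) / np.sqrt(n)   # estimated standard errors
half = 1.96 * ses                                # normal-approximation half-width

covered = (means - half <= mu) & (mu <= means + half)
# Roughly 0.95 (slightly below, since 1.96 ignores the t correction at n = 30).
print(f"empirical coverage: {covered.mean():.3f}")
```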
Sampling Distributions as the Central Object
Frequentist inference treats parameters as fixed but unknown, and treats the data, and any randomness introduced by resampling, as random. The central object is the sampling distribution, the distribution of an estimator or test statistic induced by repeated sampling from a data generating process. Confidence intervals, hypothesis tests, and model diagnostics are all statements about this distribution, either derived analytically in special cases or approximated computationally.
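A minimal sketch of this central object, under illustrative settings: simulate many datasets from a known data-generating process and record the estimator each time. The empirical distribution of those estimates approximates the sampling distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 20_000

data = rng.exponential(scale=1.0, size=(reps, n))  # reps datasets of size n
medians = np.median(data, axis=1)                  # one estimate per dataset

# The 20,000 medians approximate the sampling distribution of the sample
# median for n = 50 under this data-generating process.
print(f"mean of sampling distribution: {medians.mean():.3f}")
print(f"standard error of the median:  {medians.std(ddof=1):.3f}")
```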
Procedures and Operating Characteristics
Frequentist methods are judged by long-run performance. We will repeatedly ask whether a procedure achieves nominal coverage, controls the type I error rate at level \(\alpha\), attains useful power against relevant alternatives, and maintains acceptable bias and mean squared error. Computationally, these are expectations and tail probabilities under the sampling distribution, and we estimate them using analytic derivations where possible and simulation where necessary.
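As a sketch of how such operating characteristics are estimated by simulation, the snippet below approximates the type I error rate and power of a two-sample t-test at \(\alpha = 0.05\). It assumes SciPy is available for `stats.ttest_ind`; the sample sizes and effect size are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps, alpha = 25, 5_000, 0.05

def rejection_rate(delta):
    """Fraction of simulated experiments in which H0: equal means is rejected."""
    rejections = 0
    for _ in range(reps):
        x = rng.normal(0.0, 1.0, n)
        y = rng.normal(delta, 1.0, n)
        _, p = stats.ttest_ind(x, y)
        rejections += (p < alpha)
    return rejections / reps

print(f"type I error (delta = 0.0): {rejection_rate(0.0):.3f}")  # close to 0.05
print(f"power        (delta = 0.8): {rejection_rate(0.8):.3f}")
```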
What makes modern frequentist inference computational is not the existence of simulation itself, but the fact that it lets us operationalize the frequentist thought experiment. Having developed Monte Carlo estimation as a general tool in Part I, we now apply it to inferential targets: approximating sampling distributions, calibrating procedures in finite samples, and diagnosing departures from idealized assumptions. When analytic sampling distributions are unavailable, we approximate them by sampling from a fitted parametric model, by resampling the empirical distribution via the bootstrap, or by exploiting the randomization implied by a null hypothesis via permutation tests. The computer becomes a laboratory for frequentist procedure design and validation.
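As one example of exploiting the randomization implied by a null hypothesis, here is a minimal permutation-test sketch for a difference in group means, using illustrative simulated data. Under the null of no group difference the labels are exchangeable, so relabeling the pooled observations generates the null distribution of the statistic.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, 40)          # group A (illustrative data)
y = rng.normal(0.5, 1.0, 35)          # group B
observed = y.mean() - x.mean()

pooled = np.concatenate([x, y])
B = 10_000
perm_stats = np.empty(B)
for b in range(B):
    perm = rng.permutation(pooled)    # relabel under the null
    perm_stats[b] = perm[len(x):].mean() - perm[:len(x)].mean()

# Two-sided p-value: how often a random relabeling is at least as extreme.
p_value = (np.abs(perm_stats) >= np.abs(observed)).mean()
print(f"observed difference: {observed:.3f}, permutation p-value: {p_value:.3f}")
```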
Chapters
- Chapter 3: Parametric Inference and Likelihood Methods
- Chapter 4: Resampling Methods
  - Section 4.1: The Sampling Distribution Problem
  - Section 4.2: The Empirical Distribution and Plug-in Principle
  - Section 4.3: The Nonparametric Bootstrap
  - Section 4.4: The Parametric Bootstrap
  - Section 4.5: Jackknife Methods
  - Section 4.6: Bootstrap Hypothesis Testing and Permutation Tests
  - Section 4.7: Bootstrap Confidence Intervals: Advanced Methods
The Arc of Part II
Chapter 3: Parametric Inference develops likelihood-based estimation and testing in structured models. Exponential families provide the mathematical scaffolding, and maximum likelihood estimation supplies a unified approach to learning parameters from data. We develop sampling variability through exact and asymptotic approximations, connect likelihood theory to confidence intervals and hypothesis tests, and build regression methodology through linear models and generalized linear models. Throughout, we emphasize both theory and implementation: numerical optimization, stable likelihood computation, and robust inference when idealized assumptions fail.
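To preview the computational side of Chapter 3, here is a minimal sketch of maximum likelihood estimation by numerical optimization: minimize the negative log-likelihood of a Gamma model using SciPy. The model, data, and starting values are illustrative assumptions, not an example from the chapter itself.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(4)
data = rng.gamma(shape=3.0, scale=2.0, size=200)     # simulated data, true (3, 2)

def neg_log_lik(params):
    shape, scale = params
    if shape <= 0 or scale <= 0:                     # keep the optimizer in-bounds
        return np.inf
    return -np.sum(stats.gamma.logpdf(data, a=shape, scale=scale))

result = optimize.minimize(neg_log_lik, x0=[1.0, 1.0], method="Nelder-Mead")
print("MLE (shape, scale):", result.x)               # near the true (3.0, 2.0)
```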
Chapter 4: Resampling Methods completes the frequentist toolkit by constructing sampling distributions computationally when analytic forms are unavailable or unreliable. The plug-in principle motivates replacing the unknown population distribution \(F\) with the empirical distribution \(\hat{F}_n\) and propagating this estimate through the statistic of interest. We develop the nonparametric bootstrap, parametric bootstrap variants, jackknife methods, permutation tests under exchangeability, modern confidence interval constructions (including BCa), and resampling-based prediction assessment through cross-validation.
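A minimal nonparametric bootstrap sketch illustrates the plug-in idea: resample the observed data with replacement (sampling from \(\hat{F}_n\)) and recompute the statistic to approximate its sampling distribution. The skewed data and resampling budget below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.lognormal(mean=0.0, sigma=1.0, size=80)      # illustrative skewed sample
theta_hat = np.median(data)

B = 5_000
idx = rng.integers(0, len(data), size=(B, len(data)))   # B resamples drawn at once
boot_stats = np.median(data[idx], axis=1)               # statistic on each resample

se_boot = boot_stats.std(ddof=1)                        # bootstrap standard error
lo, hi = np.percentile(boot_stats, [2.5, 97.5])         # percentile interval
print(f"median = {theta_hat:.3f}, SE = {se_boot:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```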
Computational Themes
Several computational motifs recur throughout Part II:
Simulation as calibration. Simulation does not replace inference; it calibrates procedures by approximating their sampling distributions and operating characteristics.
The bias-variance tradeoff. Estimators balance accuracy (low bias) against stability (low variance). We see this in shrinkage, resampling-based bias correction, and cross-validation.
Numerical stability. Likelihood calculations and variance formulas can fail computationally. We use log-likelihoods instead of likelihoods, stable one-pass algorithms for moments, and careful conditioning to avoid catastrophic cancellation.
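A quick sketch of why the log scale matters: with even a moderate sample, the product of densities underflows to exactly zero, while the sum of log-densities remains finite. (The standard normal data below are illustrative.)

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=2_000)

dens = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)   # standard normal densities
print(np.prod(dens))                              # 0.0 -- the product underflows
print(np.sum(np.log(dens)))                       # finite log-likelihood
# Better still, compute log-densities directly: -0.5*x**2 - 0.5*np.log(2*np.pi)
```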
Vectorization and efficiency. NumPy enables fast simulation and resampling through vectorized operations. This is the difference between procedures that can be stress-tested thoroughly and those that cannot.
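A small sketch of the payoff, with illustrative sizes and machine-dependent timings: the same Monte Carlo estimate computed with a Python loop and with one vectorized NumPy call.

```python
import time
import numpy as np

rng = np.random.default_rng(7)
reps, n = 20_000, 100

t0 = time.perf_counter()
means_loop = np.array([rng.normal(size=n).mean() for _ in range(reps)])  # Python loop
t1 = time.perf_counter()
means_vec = rng.normal(size=(reps, n)).mean(axis=1)                      # vectorized
t2 = time.perf_counter()

print(f"loop:       {t1 - t0:.3f} s")
print(f"vectorized: {t2 - t1:.3f} s")   # typically far faster; exact ratio varies
```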
Connections
Part I: Foundations supplies probability, distributions, and Monte Carlo primitives. Part II uses those tools to construct and validate frequentist procedures through sampling distributions and operating characteristics.
Part III: Bayesian Inference uses many of the same computational primitives but targets different distributions. Part II approximates sampling distributions; Part III approximates posterior and posterior predictive distributions. Likelihood functions unify both paradigms.
Part IV: LLMs in Data Science extends the validation mindset. Calibration, uncertainty quantification, and out-of-sample assessment connect directly to model evaluation and reliability in modern AI workflows.
Prerequisites
Part II assumes mastery of Part I material: random variables, distributions, expectation, the law of large numbers, the central limit theorem, and core Monte Carlo estimation ideas. We also assume comfort with Python, NumPy, and basic calculus.
By Part II’s end, you will command a complete frequentist toolkit, from likelihood-based modeling through sophisticated resampling, ready to tackle inference problems that resist analytical treatment.