Chapter 3: Parametric Inference and Likelihood Methods
This chapter marks a fundamental shift in perspective: from generating random samples (Chapter 2) to learning from observed data. Where Monte Carlo methods asked “given a distribution, how do we simulate from it?”, parametric inference asks the inverse question: “given data, what distribution generated it?” This reversal—from known model to unknown parameters—is the central problem of statistical inference.
We adopt the frequentist framework, treating parameters as fixed but unknown quantities with randomness arising solely from the sampling process. An estimator’s quality is judged by its behavior across hypothetical repeated samples from the same population. This perspective leads to concepts like bias, variance, consistency, and efficiency—properties that characterize how estimators perform “on average” over many realizations of the sampling mechanism.
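To make the repeated-sampling perspective concrete, the short simulation below (a minimal sketch; the normal population, sample size, and number of repetitions are arbitrary choices) draws many samples from one fixed population and compares two variance estimators, showing that bias and variance are statements about averages over realizations rather than about any single sample.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma2_true = 4.0           # the fixed (but "unknown") population variance
n, n_reps = 20, 10_000      # small samples, many hypothetical repetitions

mle_vars, unbiased_vars = [], []
for _ in range(n_reps):
    x = rng.normal(loc=0.0, scale=np.sqrt(sigma2_true), size=n)
    mle_vars.append(np.var(x, ddof=0))       # divides by n: biased downward
    unbiased_vars.append(np.var(x, ddof=1))  # divides by n - 1: unbiased

# Bias and variance are averages over repeated samples, not properties of one sample.
print("mean of MLE variance:     ", np.mean(mle_vars))       # about (n-1)/n * 4 = 3.8
print("mean of unbiased variance:", np.mean(unbiased_vars))  # about 4.0
```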
The chapter opens with exponential families, a unifying framework that encompasses most distributions encountered in practice. Understanding this structure reveals why certain distributions admit elegant sufficient statistics, why maximum likelihood estimation takes a particularly simple form, and why generalized linear models can be fit by a single common algorithm. We then develop maximum likelihood estimation—the workhorse of parametric inference—with both analytical solutions and numerical optimization algorithms. The theory of sampling variability formalizes how estimators behave across repeated samples, leading to standard errors, confidence intervals, and hypothesis tests.
The plug-in principle provides a simple but powerful approach: estimate population quantities by substituting sample analogs. This idea, formalized through empirical distribution functions and influence functions, connects classical estimation to the bootstrap methods of Chapter 4. We then turn to linear models, developing least squares estimation, the Gauss-Markov theorem, and comprehensive diagnostics. The chapter culminates with generalized linear models, which extend regression to non-normal responses—logistic regression for binary outcomes, Poisson regression for counts, and beyond.
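To preview the plug-in idea mentioned above, the sketch below (illustrative only; the exponential sample and the particular functionals are arbitrary choices) estimates each population quantity T(F) by applying the same formula to the empirical distribution, which places probability 1/n on each observation.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=500)    # the observed sample

# Each population functional T(F) is estimated by T(F_hat), where F_hat is the
# empirical distribution placing probability 1/n on each observation.
mean_plugin   = np.mean(x)                               # T(F) = E[X]
sd_plugin     = np.sqrt(np.mean((x - mean_plugin)**2))   # plug-in SD (divides by n)
tail_plugin   = np.mean(x > 5.0)                         # T(F) = P(X > 5)
median_plugin = np.quantile(x, 0.5)                      # T(F) = F^{-1}(1/2)

print(mean_plugin, sd_plugin, tail_plugin, median_plugin)
```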
Throughout, we emphasize the computational aspects of inference: numerical optimization for MLEs, matrix computations for linear models, and iteratively reweighted least squares for GLMs. The theory developed here provides the foundation for Bayesian inference (Chapter 5), where parameters themselves become random variables with prior distributions updated by data.
Learning Objectives: Upon completion of this chapter, students will be able to:
Exponential Families and Likelihood
- Recognize exponential family distributions and convert standard parameterizations to canonical form
- Extract moments from the log-partition function using differentiation rather than integration (see the sketch after this list)
- Apply the Neyman-Fisher factorization theorem to identify sufficient statistics
- Construct conjugate priors for exponential family likelihoods
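As a concrete instance of reading moments off the log-partition function, consider the Poisson(λ) family in canonical form, with natural parameter θ = log λ and log-partition function A(θ) = e^θ. The sketch below (illustrative only; the rate and the finite-difference step are arbitrary) differentiates A numerically and recovers the mean and variance without any integration.

```python
import numpy as np

# Poisson(lam) in canonical form: natural parameter theta = log(lam),
# log-partition function A(theta) = exp(theta).
lam = 3.5
theta = np.log(lam)
A = np.exp

# Differentiate A numerically (central differences) instead of integrating.
h = 1e-4
mean_from_A     = (A(theta + h) - A(theta - h)) / (2 * h)              # A'(theta)
variance_from_A = (A(theta + h) - 2 * A(theta) + A(theta - h)) / h**2  # A''(theta)

print(mean_from_A, variance_from_A)   # both close to 3.5, since E[X] = Var[X] = lam
```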
Maximum Likelihood Estimation
- Derive maximum likelihood estimators analytically for common distributions
- Implement numerical MLE via Newton-Raphson, Fisher scoring, and gradient-based optimization (a Newton-Raphson sketch follows this list)
- State and apply asymptotic properties: consistency, normality, efficiency, and invariance
- Construct likelihood ratio, Wald, and score tests for parametric hypotheses
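The following sketch shows Newton-Raphson in its simplest setting, the Poisson rate, chosen because the closed-form MLE (the sample mean) makes the iteration easy to verify; the simulated data and starting value are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.poisson(lam=4.2, size=100)
n, s = x.size, x.sum()

# Poisson log-likelihood: l(lam) = s*log(lam) - n*lam + const
# score l'(lam) = s/lam - n,  second derivative l''(lam) = -s/lam**2
lam = 1.0                    # arbitrary starting value
for _ in range(25):
    score = s / lam - n
    hess = -s / lam**2
    step = score / hess
    lam -= step              # Newton-Raphson update
    if abs(step) < 1e-10:
        break

print(lam, x.mean())   # the iterate converges to the closed-form MLE, the sample mean
```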
Sampling Variability and Estimator Properties
- Analyze estimator properties including bias, variance, mean squared error, and consistency
- Apply the delta method to derive standard errors for transformed parameters (illustrated in the sketch after this list)
- Distinguish exact sampling distributions from asymptotic approximations
- Implement robust standard errors when model assumptions may be violated
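The sketch below applies the delta method to a transformed parameter, the log of a mean (a toy example with simulated gamma data; all numbers are arbitrary): if the estimator has standard error SE, then g of the estimator has approximate standard error |g'| times SE, which a small Monte Carlo check over repeated samples confirms.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.gamma(shape=2.0, scale=1.5, size=400)      # one observed sample

# Parameter of interest: g(mu) = log(mu), estimated by the log of the sample mean.
xbar = x.mean()
se_xbar = x.std(ddof=1) / np.sqrt(x.size)
se_log_mean = se_xbar / xbar                       # delta method: |g'(mu)| = 1/mu

# Monte Carlo check: standard deviation of log(xbar) over repeated samples
# drawn from the same population.
sims = np.array([np.log(rng.gamma(2.0, 1.5, size=400).mean()) for _ in range(2000)])
print(se_log_mean, sims.std(ddof=1))               # the two should be close
```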
Plug-in Methods
- Apply the plug-in principle to estimate population functionals
- State the Glivenko-Cantelli theorem and its implications for empirical distribution functions
- Compute influence functions for common statistics and use them for variance estimation (see the sketch after this list)
- Connect plug-in estimation to bootstrap methodology (developed in Chapter 4)
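For the sample mean the influence function is IF(x; F) = x - μ, and the plug-in variance estimate (1/n²) Σ IF(x_i)² reproduces the familiar σ̂²/n. The sketch below (simulated lognormal data, chosen arbitrarily) makes this identity explicit.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.lognormal(mean=0.0, sigma=0.8, size=300)

# Influence function of the mean, evaluated at the empirical distribution.
IF = x - x.mean()

# Nonparametric variance estimate: Var(T_hat) ~ (1/n^2) * sum of IF(x_i)^2
var_if = np.sum(IF**2) / x.size**2
var_classic = x.var(ddof=0) / x.size     # identical by construction

print(var_if, var_classic, np.sqrt(var_if))   # last value: standard error of the mean
```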
Linear Models
- Derive ordinary least squares estimators via calculus and projection geometry
- State and prove the Gauss-Markov theorem establishing OLS as BLUE
- Implement comprehensive residual diagnostics and influence measures
- Apply robust standard errors (HC0-HC3) when homoskedasticity fails (see the sketch after this list)
- Use regularization methods (ridge, LASSO, elastic net) for high-dimensional settings
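The sketch below (simulated heteroskedastic data; all numbers are arbitrary) fits OLS through the QR decomposition rather than the normal equations, then compares classical standard errors with HC0 sandwich standard errors, previewing the diagnostics and robust-inference material in the linear models section.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=np.abs(X[:, 1]) + 0.5)   # heteroskedastic errors

# OLS via the QR decomposition: solve R @ beta = Q.T @ y (more stable than X'X).
Q, R = np.linalg.qr(X)
beta_hat = np.linalg.solve(R, Q.T @ y)
resid = y - X @ beta_hat

XtX_inv = np.linalg.inv(R.T @ R)                 # equals (X'X)^{-1}

# Classical standard errors assume constant error variance.
sigma2 = resid @ resid / (n - X.shape[1])
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC0 sandwich standard errors use the squared residuals instead.
meat = X.T @ (resid[:, None]**2 * X)
se_hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print(beta_hat, se_classical, se_hc0)
```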
Generalized Linear Models
- Specify GLMs through their three components: random, systematic, and link function
- Implement the IRLS algorithm as Fisher scoring for exponential family responses (a minimal sketch follows this list)
- Diagnose model fit using deviance residuals, overdispersion tests, and goodness-of-fit measures
- Handle special cases including separation in logistic regression and overdispersion in count models
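The following is a minimal IRLS sketch for logistic regression (the simulated design and coefficients are arbitrary): each iteration solves a weighted least squares problem with Bernoulli variance weights and a working response, which for this model is exactly Fisher scoring.

```python
import numpy as np

def irls_logistic(X, y, n_iter=25, tol=1e-10):
    """Logistic regression via IRLS (equivalently, Fisher scoring); a minimal sketch."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))      # mean via the inverse logit link
        w = mu * (1.0 - mu)                  # Bernoulli variance function = IRLS weights
        z = eta + (y - mu) / w               # working (adjusted) response
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Quick check on simulated data with known coefficients.
rng = np.random.default_rng(11)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
p = 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.2]))))
y = rng.binomial(1, p)
print(irls_logistic(X, y))   # should land near (-0.5, 1.2)
```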
Sections
- Exponential Families
  - Historical Origins: From Scattered Results to Unified Theory
  - The Canonical Exponential Family
  - Converting Familiar Distributions
  - The Log-Partition Function: A Moment-Generating Machine
  - Sufficiency: Capturing All Parameter Information
  - Minimal Sufficiency and Completeness
  - Conjugate Priors and Bayesian Inference
  - Exponential Dispersion Models and GLMs
  - Python Implementation
  - Practical Considerations
  - Chapter 3.1 Exercises: Exponential Families Mastery
  - Bringing It All Together
  - References
- Maximum Likelihood Estimation
  - The Likelihood Function
  - The Score Function
  - Fisher Information
  - Closed-Form Maximum Likelihood Estimators
  - Numerical Optimization for MLE
  - Asymptotic Properties of MLEs
  - The Cramér-Rao Lower Bound
  - The Invariance Property
  - Likelihood-Based Hypothesis Testing
  - Confidence Intervals from Likelihood
  - Practical Considerations
  - Connection to Bayesian Inference
  - Chapter 3.2 Exercises: Maximum Likelihood Estimation Mastery
  - Bringing It All Together
  - References
- Sampling Variability and Variance Estimation
- Linear Models
  - Matrix Calculus Foundations
  - The Linear Model
  - Ordinary Least Squares: The Calculus Approach
  - Ordinary Least Squares: The Geometric Approach
  - Properties of the OLS Estimator
  - The Gauss-Markov Theorem
  - Estimating the Error Variance
  - Distributional Results Under Normality
  - Diagnostics and Model Checking
  - Bringing It All Together
  - Numerical Stability: QR Decomposition
  - Model Selection and Information Criteria
  - Regularization: Ridge and LASSO
  - Chapter 3.4 Exercises: Linear Models Mastery
  - References
- Generalized Linear Models
  - Historical Context: Unification of Regression Methods
  - The GLM Framework: Three Components
  - Score Equations and Fisher Information
  - Iteratively Reweighted Least Squares
  - Logistic Regression: Binary Outcomes
  - Poisson Regression: Count Data
  - Gamma Regression: Positive Continuous Data
  - Inference in GLMs: The Testing Triad
  - Model Diagnostics
  - Model Comparison and Selection
  - Quasi-Likelihood and Robust Inference
  - Practical Considerations
  - Bringing It All Together
  - Further Reading
  - Chapter 3.5 Exercises: Generalized Linear Models Mastery
  - References
- Chapter Summary