Chapter 5: Bayesian Inference
This chapter marks the second fundamental shift in the course: from asking frequentist questions about procedures to asking Bayesian questions about beliefs. Where Chapters 3–4 treated parameters as fixed unknowns and judged estimators by their repeated-sampling behavior, Bayesian inference treats parameters as uncertain quantities described by probability distributions. The central question changes from “What would happen if we repeated this experiment?” to “What should we believe given this evidence?” This shift—from the sampling distribution to the posterior distribution—transforms both the computational target and the interpretation of results.
The transition is not a rejection of what came before. Likelihood functions—the engine of maximum likelihood estimation in Chapter 3—play an equally central role in Bayesian updating. Monte Carlo simulation—developed in Chapter 2 and applied throughout Chapter 4—provides the computational backbone of Markov chain Monte Carlo. Bootstrap resampling from Chapter 4 finds its Bayesian counterpart in posterior predictive simulation. The exponential family structure from Section 3.1, which revealed why certain distributions admit elegant sufficient statistics, now reveals why those same distributions admit elegant conjugate priors. Bayesian inference does not replace the frequentist toolkit; it complements it with a framework that answers different questions using much of the same mathematical and computational machinery.
The chapter follows a deliberate arc. We begin with Bayesian foundations, establishing the philosophical framework and the three-step workflow: specify a model (prior × likelihood), condition on data to obtain the posterior, and check that the model is adequate. Prior specification develops strategies for choosing priors, from conjugate families that yield analytical posteriors to weakly informative defaults that regularize without dominating. Conjugate posteriors work through the major analytical cases—Beta-Binomial, Normal-Normal, Gamma-Poisson, Normal-Inverse-Gamma—building intuition about posterior means as weighted averages and shrinkage toward prior beliefs. Credible intervals extract actionable summaries from the posterior and contrast them carefully with the frequentist confidence intervals and bootstrap intervals from earlier chapters.
The chapter then confronts the computational challenge. For most models of practical interest, the posterior has no closed form—the normalizing constant requires an intractable integral over the entire parameter space. Markov chain theory develops the mathematical machinery—transition kernels, stationary distributions, detailed balance, ergodicity—that justifies a remarkable solution: construct a random walk whose long-run behavior is the posterior distribution. MCMC algorithms implement this idea through Metropolis-Hastings, Gibbs sampling, and Hamiltonian Monte Carlo, first from scratch and then through the PyMC probabilistic programming framework. Convergence diagnostics provide the tools to determine whether the chain has actually converged—visual inspection, effective sample size, and the Gelman-Rubin \(\hat{R}\) statistic—using the ArviZ diagnostic library.
The chapter concludes with methodology that showcases Bayesian inference at its most practical. Model comparison develops posterior predictive checks, information criteria (WAIC, LOO-CV), and Bayes factors for choosing among competing models. Hierarchical models demonstrate the power of partial pooling: when data arrive in groups, hierarchical structure lets small groups borrow strength from the population, automatically balancing individual evidence against collective regularization. The Eight Schools dataset provides the canonical example.
Learning Objectives: Upon completing this chapter, you will be able to:
Bayesian Foundations
Articulate the Bayesian interpretation of probability and contrast it with frequentist reasoning
Apply Bayes’ theorem to update prior beliefs given observed data in discrete and continuous settings
Implement grid approximation for low-dimensional posterior computation
Explain the roles of prior, likelihood, posterior, and posterior predictive distribution
Distinguish subjective, objective, and empirical Bayes approaches to prior specification
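As a preview of the grid-approximation objective above, here is a minimal NumPy sketch. The data (7 successes in 10 trials) and the flat prior are illustrative, not from the chapter's examples:

```python
import numpy as np

# Grid approximation for a Binomial proportion (illustrative data).
k, n = 7, 10                                # observed successes / trials
grid = np.linspace(0, 1, 1001)              # candidate values of theta
prior = np.ones_like(grid)                  # flat prior evaluated on the grid
likelihood = grid**k * (1 - grid)**(n - k)  # Binomial kernel (constant dropped)
unnorm = prior * likelihood
posterior = unnorm / unnorm.sum()           # normalize over the grid

# Posterior mean; with a flat prior this should approach (k+1)/(n+2) = 8/12.
post_mean = np.sum(grid * posterior)
```

The same three-step pattern (prior × likelihood, normalize, summarize) recurs throughout the chapter; only the normalization step changes when we move to MCMC.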
Prior Specification and Conjugate Analysis
Select appropriate priors (conjugate, weakly informative, informative) for common models
Derive conjugate posteriors for exponential family likelihoods (Beta-Binomial, Normal-Normal, Gamma-Poisson, Normal-Inverse-Gamma, Dirichlet-Multinomial)
Interpret posterior means as precision-weighted averages and quantify shrinkage
Implement prior predictive simulation and sensitivity analysis
Compare Bayesian posterior summaries with frequentist MLE-based inference
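The "precision-weighted average" objective has a particularly clean form in the Beta-Binomial case. A sketch with illustrative numbers (a Beta(2, 2) prior and 7 successes in 10 trials):

```python
import numpy as np

# Beta-Binomial conjugate update: Beta(a, b) prior + k successes in n trials
# yields a Beta(a + k, b + n - k) posterior. All numbers are illustrative.
a, b = 2.0, 2.0        # prior pseudo-counts
k, n = 7, 10           # observed successes / trials

a_post, b_post = a + k, b + (n - k)
post_mean = a_post / (a_post + b_post)

# The posterior mean is a weighted average of the prior mean and the sample
# proportion, weighted by the prior "sample size" (a + b) versus n.
prior_mean = a / (a + b)
w = (a + b) / (a + b + n)
weighted = w * prior_mean + (1 - w) * (k / n)   # identical to post_mean
```

As n grows, the weight w shrinks toward zero and the data dominate the prior, which is the shrinkage behavior Section 5.2 develops in general.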
Credible Intervals
Construct equal-tailed and highest posterior density (HPD) credible intervals
Interpret credible intervals as direct probability statements about parameters
Contrast Bayesian credible intervals with frequentist confidence intervals and bootstrap intervals
Assess hypotheses via posterior probabilities and regions of practical equivalence
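Because a credible interval is read directly off the posterior, computing one from posterior draws is a one-liner. A sketch, assuming an illustrative Beta(9, 5) posterior sampled directly (in practice the draws would come from grid approximation or MCMC):

```python
import numpy as np

rng = np.random.default_rng(0)

# Equal-tailed 95% credible interval from posterior draws (illustrative posterior).
draws = rng.beta(9, 5, size=100_000)
lo, hi = np.quantile(draws, [0.025, 0.975])

# The interval supports a direct probability statement:
# P(lo < theta < hi | data) ≈ 0.95.
coverage = np.mean((draws > lo) & (draws < hi))

# Posterior probability of a hypothesis, e.g. theta > 0.5, is just a mean:
p_gt_half = np.mean(draws > 0.5)
```

Contrast this with a frequentist confidence interval, where the 95% refers to the procedure's long-run coverage, not to the probability that this interval contains the parameter.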
Markov Chain Theory
Define Markov chains through states, transition kernels, and the Markov property
Derive stationary distributions and verify detailed balance conditions
State ergodic theorems establishing MCMC convergence and connect them to the Monte Carlo LLN from Chapter 2
Explain mixing behavior, burn-in, autocorrelation, and effective sample size
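The stationary-distribution and detailed-balance objectives can be made concrete with a two-state chain small enough to verify by hand. A sketch with an illustrative transition matrix:

```python
import numpy as np

# Two-state Markov chain with an illustrative transition matrix P
# (rows sum to 1: P[i, j] is the probability of moving from state i to j).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# The stationary distribution solves pi P = pi, i.e. it is the left
# eigenvector of P with eigenvalue 1; for this P, pi = [0.8, 0.2].
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi = pi / pi.sum()

# Detailed balance check: pi_i P[i, j] == pi_j P[j, i] for a reversible chain.
balanced = np.isclose(pi[0] * P[0, 1], pi[1] * P[1, 0])
```

MCMC runs this logic in reverse: rather than finding the stationary distribution of a given kernel, it constructs a kernel whose stationary distribution is the posterior.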
MCMC Algorithms
Implement the Metropolis-Hastings algorithm with symmetric and asymmetric proposals
Implement Gibbs sampling by cycling through full conditional distributions
Design proposal distributions that balance acceptance rate and mixing efficiency
Specify Bayesian models in PyMC and interpret sampling output via ArviZ
Compare Metropolis-Hastings, Gibbs, and Hamiltonian Monte Carlo in terms of efficiency and applicability
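The from-scratch Metropolis-Hastings objective can be previewed in a few lines. A sketch targeting an illustrative unnormalized Beta(8, 4) posterior, i.e. p(θ) ∝ θ⁷(1−θ)³ on (0, 1), with a symmetric random-walk proposal:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(theta):
    """Unnormalized log posterior: theta^7 (1 - theta)^3 on (0, 1)."""
    if theta <= 0 or theta >= 1:
        return -np.inf                      # zero density outside the support
    return 7 * np.log(theta) + 3 * np.log(1 - theta)

theta, step = 0.5, 0.2
draws = np.empty(50_000)
for i in range(draws.size):
    prop = theta + step * rng.normal()      # symmetric random-walk proposal
    # Accept with probability min(1, target(prop) / target(theta));
    # the normalizing constant cancels in the ratio.
    if np.log(rng.uniform()) < log_target(prop) - log_target(theta):
        theta = prop
    draws[i] = theta

post_mean = draws[5_000:].mean()            # discard burn-in; true mean is 8/12
```

The key point is the ratio in the acceptance step: the intractable normalizing constant cancels, which is exactly why MCMC sidesteps the integral that defeats closed-form analysis.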
Convergence Diagnostics
Assess convergence visually using trace plots, rank plots, and running mean plots
Compute autocorrelation, bulk and tail effective sample size, and the split \(\hat{R}\) statistic
Apply a systematic diagnostic workflow to identify and remedy convergence failures
Distinguish between convergence verification and convergence proof
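In the chapter, these diagnostics are computed with ArviZ; as a preview of what the split \(\hat{R}\) statistic measures, here is a hand-rolled NumPy sketch of the classical between/within-chain variance comparison, applied to simulated "good" and "stuck" chains:

```python
import numpy as np

rng = np.random.default_rng(2)

def split_rhat(chains):
    """Split-R-hat from an array of shape (n_chains, n_draws)."""
    m, n = chains.shape
    half = n // 2
    h = chains[:, : 2 * half].reshape(2 * m, half)   # split each chain in two
    W = h.var(axis=1, ddof=1).mean()                 # within-chain variance
    B = half * h.mean(axis=1).var(ddof=1)            # between-chain variance
    var_hat = (half - 1) / half * W + B / half       # pooled variance estimate
    return np.sqrt(var_hat / W)

# Four well-mixed chains from the same distribution: R-hat near 1.
good = rng.normal(size=(4, 1000))
r_good = split_rhat(good)

# One chain stuck at a shifted location: R-hat well above the 1.01 threshold.
bad = good.copy()
bad[0] += 3.0
r_bad = split_rhat(bad)
```

When the chains agree, between- and within-chain variance match and \(\hat{R} \approx 1\); any disagreement inflates the ratio, which is why values above roughly 1.01 signal a convergence problem.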
Model Comparison
Perform posterior predictive checks to assess whether a model reproduces key features of observed data
Apply information criteria (WAIC, PSIS-LOO) for predictive model selection and connect to cross-validation from Chapter 4
Compute Bayes factors for nested and non-nested model comparison, recognizing their sensitivity to prior specification
Interpret model comparison results with appropriate uncertainty and practical judgment
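The posterior predictive check objective has a simple computational core: simulate replicated data from the fitted model and ask whether the observed data look typical. A sketch with an illustrative Binomial model and Beta(8, 4) posterior:

```python
import numpy as np

rng = np.random.default_rng(3)

# Posterior predictive check for a Binomial model (illustrative numbers).
k_obs, n = 7, 10
theta_draws = rng.beta(8, 4, size=20_000)    # draws from the posterior
k_rep = rng.binomial(n, theta_draws)         # one replicated dataset per draw

# Posterior predictive p-value: fraction of replicates at least as extreme
# as the observed statistic. Values near 0 or 1 flag model misfit.
ppp = np.mean(k_rep >= k_obs)
```

The same pattern scales to richer models: replace the test statistic with any feature of the data the model ought to reproduce, and compare its replicated distribution to the observed value.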
Hierarchical Models
Specify hierarchical models for grouped or clustered data using exchangeability assumptions
Explain shrinkage and borrowing strength as automatic partial pooling between no-pooling and complete-pooling extremes
Implement hierarchical models in PyMC with appropriate parameterizations to avoid sampling pathologies
Assess when hierarchical structure improves inference and connect shrinkage to regularization in Chapter 3
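Shrinkage under partial pooling can be previewed in closed form for the Normal case. A sketch using the Eight Schools estimates and standard errors, with the population mean μ and between-group scale τ fixed at assumed values for illustration (in the chapter they are estimated as part of the hierarchical model):

```python
import numpy as np

# Eight Schools: treatment-effect estimates and their standard errors.
y     = np.array([28., 8., -3., 7., -1., 1., 18., 12.])
sigma = np.array([15., 10., 16., 11., 9., 11., 10., 18.])
mu, tau = 8.0, 5.0      # assumed hyperparameters, fixed for illustration

# Precision-weighted compromise between each group's data and the population:
w = (1 / sigma**2) / (1 / sigma**2 + 1 / tau**2)   # weight on the group's data
theta_post = w * y + (1 - w) * mu

# Noisier groups (large sigma) get small w and are pulled hardest toward mu.
```

School 1's extreme estimate of 28 shrinks most of the way back toward μ, while precisely measured groups keep more of their own signal; this is the "borrowing strength" behavior that Section 5.8 develops in full.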
Sections
- Section 5.1 Foundations of Bayesian Inference
- Historical Development: From Bayes’ Essay to the MCMC Revolution
- The Bayesian Workflow
- Bayes’ Theorem for Parameters
- Discrete Parameter Spaces
- Continuous Parameters: Grid Approximation
- The Posterior as Complete Inference
- The Likelihood Principle Revisited
- Practical Considerations
- Bringing It All Together
- Chapter 5.1 Exercises
- References
- Section 5.2 Prior Specification and Conjugate Analysis
- The Role of the Prior
- Beta-Binomial Conjugate Analysis
- Normal-Normal Model: Known Variance
- Normal-Inverse-Gamma: Unknown Mean and Variance
- Poisson-Gamma Model
- Multinomial-Dirichlet Model
- Beyond Conjugacy: The Full Landscape of Prior Specification
- Bayesian vs. Frequentist Synthesis and the Limits of Conjugacy
- Practical Considerations
- Bringing It All Together
- Exercises
- References
- Section 5.3 Posterior Inference: Credible Intervals and Hypothesis Assessment
- Section 5.4 Markov Chains: The Mathematical Foundation of MCMC
- From Grid Approximation to Markov Chains
- Markov Chains
- Stationary Distributions and Detailed Balance
- The Ergodic Theorem: Why MCMC Averages Converge
- The MCMC Estimator, Effective Sample Size, and \(\hat{R}\)
- Python: Simulating Convergence and Diagnosing Chains
- Mixing, Thinning, and Practical Considerations
- Bringing It All Together
- Exercises
- References
- Section 5.5 MCMC Algorithms: Metropolis-Hastings and Gibbs Sampling
- Section 5.6 Probabilistic Programming with PyMC
- How PyMC Works: From Model Spec to Gradient
- The Diagnostic Toolkit
- Full Worked Example 1: Logistic Regression
- Full Worked Example 2: Poisson Regression
- Scale Parameters: Half-Normal and Half-Cauchy Priors
    - Derived Quantities with pm.Deterministic
    - Practical Workflow Checklist
- Bringing It All Together
- Exercises
- References
- Section 5.7 Bayesian Model Comparison
- Section 5.8 Hierarchical Models and Partial Pooling
- The Pooling Problem
- The Mathematical Structure
- Worked Example 1: Eight Schools (Normal Likelihood)
- Worked Example 2: Dirichlet-Multinomial (Categorical Likelihood)
- LOO Model Comparison: Three Pooling Strategies
- When to Use Hierarchical Models
- Practical Considerations
- Bringing It All Together
- Exercises
- References
- Section 5.9 Chapter 5 Summary