Chapter 5: Bayesian Inference

This chapter marks the second fundamental shift in the course: from asking frequentist questions about procedures to asking Bayesian questions about beliefs. Where Chapters 3–4 treated parameters as fixed unknowns and judged estimators by their repeated-sampling behavior, Bayesian inference treats parameters as uncertain quantities described by probability distributions. The central question changes from “What would happen if we repeated this experiment?” to “What should we believe given this evidence?” This shift—from the sampling distribution to the posterior distribution—transforms both the computational target and the interpretation of results.

The transition is not a rejection of what came before. Likelihood functions—the engine of maximum likelihood estimation in Chapter 3—play an equally central role in Bayesian updating. Monte Carlo simulation—developed in Chapter 2 and applied throughout Chapter 4—provides the computational backbone of Markov chain Monte Carlo. Bootstrap resampling from Chapter 4 finds its Bayesian counterpart in posterior predictive simulation. The exponential family structure from Section 3.1, which revealed why certain distributions admit elegant sufficient statistics, now reveals why those same distributions admit elegant conjugate priors. Bayesian inference does not replace the frequentist toolkit; it complements it with a framework that answers different questions using much of the same mathematical and computational machinery.

The chapter follows a deliberate arc. We begin with Bayesian foundations, establishing the philosophical framework and the three-step workflow: specify a model (prior × likelihood), condition on data to obtain the posterior, and check that the model is adequate. Prior specification develops strategies for choosing priors, from conjugate families that yield analytical posteriors to weakly informative defaults that regularize without dominating. Conjugate posteriors work through the major analytical cases—Beta-Binomial, Normal-Normal, Gamma-Poisson, Normal-Inverse-Gamma—building intuition about posterior means as weighted averages and shrinkage toward prior beliefs. Credible intervals extract actionable summaries from the posterior and contrast them carefully with the frequentist confidence intervals and bootstrap intervals from earlier chapters.
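As a taste of the conjugate analysis ahead, here is a minimal Beta-Binomial sketch of the three-step workflow: a Beta prior times a binomial likelihood yields a Beta posterior, and the posterior mean is a weighted average of the prior mean and the sample proportion. The counts are illustrative; any Beta(a, b) prior works the same way.

```python
# Beta-Binomial conjugate update (illustrative counts).
a, b = 2.0, 2.0            # Beta prior pseudo-counts
successes, trials = 7, 10  # observed binomial data

# Conditioning on the data just adds counts: Beta(a + s, b + n - s)
a_post = a + successes
b_post = b + trials - successes

# The posterior mean is a weighted average of prior mean and sample proportion
prior_mean = a / (a + b)
mle = successes / trials
weight = (a + b) / (a + b + trials)   # weight on the prior
post_mean = a_post / (a_post + b_post)
assert abs(post_mean - (weight * prior_mean + (1 - weight) * mle)) < 1e-12
print(post_mean)  # 9/14 ≈ 0.643
```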

The chapter then confronts the computational challenge. For most models of practical interest, the posterior has no closed form—the normalizing constant requires an intractable integral over the entire parameter space. Markov chain theory develops the mathematical machinery—transition kernels, stationary distributions, detailed balance, ergodicity—that justifies a remarkable solution: construct a random walk whose long-run behavior is the posterior distribution. MCMC algorithms implement this idea through Metropolis-Hastings, Gibbs sampling, and Hamiltonian Monte Carlo, first from scratch and then through the PyMC probabilistic programming framework. Convergence diagnostics provide the tools to determine whether the chain has actually converged—visual inspection, effective sample size, and the Gelman-Rubin \(\hat{R}\) statistic—using the ArviZ diagnostic library.
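The "remarkable solution" can be previewed in a few lines. Below is a minimal random-walk Metropolis sampler targeting a standard normal; the target and proposal scale are illustrative stand-ins for the posteriors the chapter will sample.

```python
import numpy as np

# Random-walk Metropolis for a standard normal target: a sketch of the idea
# that a Markov chain's long-run behavior can match a distribution known
# only up to its normalizing constant.
rng = np.random.default_rng(0)

def log_target(x):
    return -0.5 * x**2  # unnormalized log density of N(0, 1)

x, draws = 0.0, []
for _ in range(20_000):
    prop = x + rng.normal(scale=1.0)  # symmetric proposal
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        x = prop                       # accept; otherwise keep current state
    draws.append(x)

samples = np.array(draws[2_000:])      # discard burn-in
print(samples.mean(), samples.std())   # ≈ 0 and ≈ 1
```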

The chapter concludes with methodology that showcases Bayesian inference at its most practical. Model comparison develops posterior predictive checks, information criteria (WAIC, LOO-CV), and Bayes factors for choosing among competing models. Hierarchical models demonstrate the power of partial pooling: when data arrive in groups, hierarchical structure lets small groups borrow strength from the population, automatically balancing individual evidence against collective regularization. The Eight Schools dataset provides the canonical example.
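The partial-pooling idea can be previewed numerically on the Eight Schools data (Rubin 1981). For illustration the between-school standard deviation tau is fixed by hand; the full hierarchical model in this chapter will infer it from the data.

```python
import numpy as np

# Partial pooling on the Eight Schools data: each school's estimate is pulled
# toward the overall mean, more strongly when its own standard error is large.
y = np.array([28., 8., -3., 7., -1., 1., 18., 12.])       # treatment effects
sigma = np.array([15., 10., 16., 11., 9., 11., 10., 18.])  # standard errors
tau = 5.0                                                   # assumed between-school sd
mu = np.average(y, weights=1 / (sigma**2 + tau**2))         # pooled mean

# Precision-weighted compromise between each school's data and the pooled mean
w = tau**2 / (tau**2 + sigma**2)   # weight on the school's own estimate
theta = w * y + (1 - w) * mu
print(np.round(theta, 1))          # every estimate lies between y_j and mu
```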

Learning Objectives: Upon completing this chapter, you will be able to:

Bayesian Foundations

  • Articulate the Bayesian interpretation of probability and contrast it with frequentist reasoning

  • Apply Bayes’ theorem to update prior beliefs given observed data in discrete and continuous settings

  • Implement grid approximation for low-dimensional posterior computation

  • Explain the roles of prior, likelihood, posterior, and posterior predictive distribution

  • Distinguish subjective, objective, and empirical Bayes approaches to prior specification
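The grid approximation objective above can be sketched in a few lines: evaluate prior times likelihood on a grid, normalize numerically, and compare with the known analytical answer (a flat prior with 7 successes in 10 trials, so the exact posterior is Beta(8, 4)).

```python
import numpy as np

# Grid approximation of a Beta-Binomial posterior.
grid = np.linspace(0, 1, 1001)
prior = np.ones_like(grid)            # flat Beta(1, 1) prior
likelihood = grid**7 * (1 - grid)**3  # 7 successes in 10 trials
unnorm = prior * likelihood
posterior = unnorm / unnorm.sum()     # normalize to probability masses

post_mean = (grid * posterior).sum()
print(post_mean)  # ≈ 8/12 ≈ 0.667, the exact Beta(8, 4) mean
```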

Prior Specification and Conjugate Analysis

  • Select appropriate priors (conjugate, weakly informative, informative) for common models

  • Derive conjugate posteriors for exponential family likelihoods (Beta-Binomial, Normal-Normal, Gamma-Poisson, Normal-Inverse-Gamma, Dirichlet-Multinomial)

  • Interpret posterior means as precision-weighted averages and quantify shrinkage

  • Implement prior predictive simulation and sensitivity analysis

  • Compare Bayesian posterior summaries with frequentist MLE-based inference
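The "precision-weighted average" objective has a compact worked example in the Normal-Normal model with known variance; the prior and data below are illustrative.

```python
import numpy as np

# Normal-Normal update with known data variance: the posterior mean is a
# precision-weighted average of the prior mean and the sample mean.
mu0, tau0 = 0.0, 2.0   # prior: N(mu0, tau0^2)
sigma = 1.0            # known data sd
y = np.array([1.2, 0.8, 1.5, 1.1])

prec_prior = 1 / tau0**2
prec_data = len(y) / sigma**2
post_mean = (prec_prior * mu0 + prec_data * y.mean()) / (prec_prior + prec_data)
post_sd = (prec_prior + prec_data) ** -0.5
print(post_mean, post_sd)  # shrunk slightly from ybar = 1.15 toward mu0 = 0
```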

Credible Intervals

  • Construct equal-tailed and highest posterior density (HPD) credible intervals

  • Interpret credible intervals as direct probability statements about parameters

  • Contrast Bayesian credible intervals with frequentist confidence intervals and bootstrap intervals

  • Assess hypotheses via posterior probabilities and regions of practical equivalence
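The contrast between equal-tailed and HPD intervals is easiest to see on a skewed posterior. The sketch below uses Gamma draws as a stand-in posterior; the HPD interval is computed as the shortest window covering 95% of the sorted draws.

```python
import numpy as np

# Equal-tailed vs. HPD 95% intervals from posterior draws.
rng = np.random.default_rng(1)
draws = rng.gamma(shape=2.0, scale=1.0, size=100_000)  # skewed stand-in posterior

# Equal-tailed interval: central quantiles
eq = np.quantile(draws, [0.025, 0.975])

# HPD interval: shortest window containing 95% of sorted draws
s = np.sort(draws)
k = int(0.95 * len(s))
widths = s[k:] - s[:len(s) - k]
i = np.argmin(widths)
hpd = (s[i], s[i + k])

print(eq, hpd)  # the HPD interval is shorter and starts closer to zero
```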

Markov Chain Theory

  • Define Markov chains through states, transition kernels, and the Markov property

  • Derive stationary distributions and verify detailed balance conditions

  • State ergodic theorems establishing MCMC convergence and connect them to the Monte Carlo LLN from Chapter 2

  • Explain mixing behavior, burn-in, autocorrelation, and effective sample size
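Stationarity and detailed balance can be verified concretely on a two-state chain; the transition probabilities below are illustrative.

```python
import numpy as np

# A two-state Markov chain: find the stationary distribution and check
# detailed balance.
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])  # P[i, j] = Pr(next = j | current = i)

# The stationary distribution solves pi P = pi (left eigenvector for eigenvalue 1)
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

assert np.allclose(pi @ P, pi)                       # stationarity
assert np.isclose(pi[0] * P[0, 1], pi[1] * P[1, 0])  # detailed balance
print(pi)  # [0.75 0.25]
```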

MCMC Algorithms

  • Implement the Metropolis-Hastings algorithm with symmetric and asymmetric proposals

  • Implement Gibbs sampling by cycling through full conditional distributions

  • Design proposal distributions that balance acceptance rate and mixing efficiency

  • Specify Bayesian models in PyMC and interpret sampling output via ArviZ

  • Compare Metropolis-Hastings, Gibbs, and Hamiltonian Monte Carlo in terms of efficiency and applicability
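Gibbs sampling, in particular, reduces to alternating exact draws when the full conditionals are standard distributions. A minimal sketch for a bivariate normal target with correlation rho = 0.8 (an illustrative choice):

```python
import numpy as np

# Gibbs sampler for a standard bivariate normal with correlation rho:
# each full conditional is univariate normal, so we alternate exact draws.
rng = np.random.default_rng(2)
rho = 0.8
x = y = 0.0
draws = []
for _ in range(50_000):
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))  # x | y ~ N(rho*y, 1 - rho^2)
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))  # y | x ~ N(rho*x, 1 - rho^2)
    draws.append((x, y))

samples = np.array(draws[5_000:])      # discard burn-in
print(np.corrcoef(samples.T)[0, 1])    # ≈ 0.8
```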

Convergence Diagnostics

  • Assess convergence visually using trace plots, rank plots, and running mean plots

  • Compute autocorrelation, bulk and tail effective sample size, and the split \(\hat{R}\) statistic

  • Apply a systematic diagnostic workflow to identify and remedy convergence failures

  • Explain why diagnostics can detect failures of convergence but can never prove that a chain has converged
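The split \(\hat{R}\) statistic can be computed from scratch in a few lines: split each chain in half, then compare within-chain and between-chain variance. The two synthetic cases below (well-mixed chains vs. chains stuck at different offsets) are illustrative.

```python
import numpy as np

# Split R-hat: values near 1 are consistent with convergence.
def split_rhat(chains):  # chains: array of shape (n_chains, n_draws)
    half = chains.shape[1] // 2
    splits = np.vstack([chains[:, :half], chains[:, half:2 * half]])
    m, n = splits.shape
    W = splits.var(axis=1, ddof=1).mean()    # within-chain variance
    B = n * splits.mean(axis=1).var(ddof=1)  # between-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(3)
good = rng.normal(size=(4, 1000))            # four well-mixed chains
bad = good + np.arange(4)[:, None]           # chains stuck at different offsets
print(split_rhat(good), split_rhat(bad))     # ≈ 1.0 vs. well above 1
```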

Model Comparison

  • Perform posterior predictive checks to assess whether a model reproduces key features of observed data

  • Apply information criteria (WAIC, PSIS-LOO) for predictive model selection and connect to cross-validation from Chapter 4

  • Compute Bayes factors for nested and non-nested model comparison, recognizing their sensitivity to prior specification

  • Interpret model comparison results with appropriate uncertainty and practical judgment
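A posterior predictive check, at its simplest, draws replicated datasets from the posterior and compares a test statistic against its observed value. The sketch below uses a conjugate Normal model with known sd and a flat prior purely for illustration, with the maximum as the test statistic.

```python
import numpy as np

# Posterior predictive check sketch (Normal model, known sd = 1, flat prior).
rng = np.random.default_rng(4)
y_obs = rng.normal(loc=2.0, scale=1.0, size=50)

n = len(y_obs)
post_mu = rng.normal(y_obs.mean(), 1 / np.sqrt(n), size=4_000)  # posterior draws
y_rep = rng.normal(post_mu[:, None], 1.0, size=(4_000, n))      # replicated data

# Tail-area probability for the maximum: extreme values flag misfit
p = np.mean(y_rep.max(axis=1) >= y_obs.max())
print(p)  # mid-range values suggest the model reproduces this feature
```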

Hierarchical Models

  • Specify hierarchical models for grouped or clustered data using exchangeability assumptions

  • Explain shrinkage and borrowing strength as automatic partial pooling between no-pooling and complete-pooling extremes

  • Implement hierarchical models in PyMC with appropriate parameterizations to avoid sampling pathologies

  • Assess when hierarchical structure improves inference and connect shrinkage to regularization in Chapter 3

Sections