Chapter 5: Bayesian Inference
This chapter marks the second fundamental shift in the course: from asking frequentist questions about procedures to asking Bayesian questions about beliefs. Where Chapters 3–4 treated parameters as fixed unknowns and judged estimators by their repeated-sampling behavior, Bayesian inference treats parameters as uncertain quantities described by probability distributions. The central question changes from “What would happen if we repeated this experiment?” to “What should we believe given this evidence?” This shift—from the sampling distribution to the posterior distribution—transforms both the computational target and the interpretation of results.
The transition is not a rejection of what came before. Likelihood functions—the engine of maximum likelihood estimation in Chapter 3—play an equally central role in Bayesian updating. Monte Carlo simulation—developed in Chapter 2 and applied throughout Chapter 4—provides the computational backbone of Markov chain Monte Carlo. Bootstrap resampling from Chapter 4 finds its Bayesian counterpart in posterior predictive simulation. The exponential family structure from Section 3.1, which revealed why certain distributions admit elegant sufficient statistics, now reveals why those same distributions admit elegant conjugate priors. Bayesian inference does not replace the frequentist toolkit; it complements it with a framework that answers different questions using much of the same mathematical and computational machinery.
The chapter follows a deliberate arc. We begin with Bayesian foundations, establishing the philosophical framework and the three-step workflow: specify a model (prior × likelihood), condition on data to obtain the posterior, and check that the model is adequate. Prior specification develops strategies for choosing priors, from conjugate families that yield analytical posteriors to weakly informative defaults that regularize without dominating. Conjugate posteriors work through the major analytical cases—Beta-Binomial, Normal-Normal, Gamma-Poisson, Normal-Inverse-Gamma—building intuition about posterior means as weighted averages and shrinkage toward prior beliefs. Credible intervals extract actionable summaries from the posterior and contrast them carefully with the frequentist confidence intervals and bootstrap intervals from earlier chapters.
The chapter then confronts the computational challenge. For most models of practical interest, the posterior has no closed form—the normalizing constant requires an intractable integral over the entire parameter space. Markov chain theory develops the mathematical machinery—transition kernels, stationary distributions, detailed balance, ergodicity—that justifies a remarkable solution: construct a random walk whose long-run behavior is the posterior distribution. MCMC algorithms implement this idea through Metropolis-Hastings, Gibbs sampling, and Hamiltonian Monte Carlo, first from scratch and then through the PyMC probabilistic programming framework. Convergence diagnostics provide the tools to determine whether the chain has actually converged—visual inspection, effective sample size, and the Gelman-Rubin \(\hat{R}\) statistic—using the ArviZ diagnostic library.
The chapter concludes with methodology that showcases Bayesian inference at its most practical. Model comparison develops posterior predictive checks, information criteria (WAIC, LOO-CV), and Bayes factors for choosing among competing models. Hierarchical models demonstrate the power of partial pooling: when data arrive in groups, hierarchical structure lets small groups borrow strength from the population, automatically balancing individual evidence against collective regularization. The Eight Schools dataset provides the canonical example.
Learning Objectives: Upon completing this chapter, you will be able to:
Bayesian Foundations
Articulate the Bayesian interpretation of probability and contrast it with frequentist reasoning
Apply Bayes’ theorem to update prior beliefs given observed data in discrete and continuous settings
Implement grid approximation for low-dimensional posterior computation
Explain the roles of prior, likelihood, posterior, and posterior predictive distribution
Distinguish subjective, objective, and empirical Bayes approaches to prior specification
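As a preview of the grid-approximation objective above, here is a minimal NumPy sketch. The data (7 successes in 10 trials) and the flat prior are illustrative, not from the chapter's examples:

```python
import numpy as np

# Grid approximation for a Binomial proportion (illustrative data).
k, n = 7, 10                                # observed successes / trials
grid = np.linspace(0, 1, 1001)              # candidate values of theta
prior = np.ones_like(grid)                  # flat prior evaluated on the grid
likelihood = grid**k * (1 - grid)**(n - k)  # Binomial kernel (constant dropped)
unnorm = prior * likelihood
posterior = unnorm / unnorm.sum()           # normalize over the grid

# Posterior mean; with a flat prior this should approach (k+1)/(n+2) = 8/12.
post_mean = np.sum(grid * posterior)
```

The same three-step pattern (prior × likelihood, normalize, summarize) recurs throughout the chapter; only the normalization step changes when we move to MCMC.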
Prior Specification and Conjugate Analysis
Select appropriate priors (conjugate, weakly informative, informative) for common models
Derive conjugate posteriors for exponential family likelihoods (Beta-Binomial, Normal-Normal, Gamma-Poisson, Normal-Inverse-Gamma, Dirichlet-Multinomial)
Interpret posterior means as precision-weighted averages and quantify shrinkage
Implement prior predictive simulation and sensitivity analysis
Compare Bayesian posterior summaries with frequentist MLE-based inference
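The "precision-weighted average" objective has a particularly clean form in the Beta-Binomial case. A sketch with illustrative numbers (a Beta(2, 2) prior and 7 successes in 10 trials):

```python
import numpy as np

# Beta-Binomial conjugate update: Beta(a, b) prior + k successes in n trials
# yields a Beta(a + k, b + n - k) posterior. All numbers are illustrative.
a, b = 2.0, 2.0        # prior pseudo-counts
k, n = 7, 10           # observed successes / trials

a_post, b_post = a + k, b + (n - k)
post_mean = a_post / (a_post + b_post)

# The posterior mean is a weighted average of the prior mean and the sample
# proportion, weighted by the prior "sample size" (a + b) versus n.
prior_mean = a / (a + b)
w = (a + b) / (a + b + n)
weighted = w * prior_mean + (1 - w) * (k / n)   # identical to post_mean
```

As n grows, the weight w shrinks toward zero and the data dominate the prior, which is the shrinkage behavior Section 5.2 develops in general.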
Credible Intervals
Construct equal-tailed and highest posterior density (HPD) credible intervals
Interpret credible intervals as direct probability statements about parameters
Contrast Bayesian credible intervals with frequentist confidence intervals and bootstrap intervals
Assess hypotheses via posterior probabilities and regions of practical equivalence
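Because a credible interval is read directly off the posterior, computing one from posterior draws is a one-liner. A sketch, assuming an illustrative Beta(9, 5) posterior sampled directly (in practice the draws would come from grid approximation or MCMC):

```python
import numpy as np

rng = np.random.default_rng(0)

# Equal-tailed 95% credible interval from posterior draws (illustrative posterior).
draws = rng.beta(9, 5, size=100_000)
lo, hi = np.quantile(draws, [0.025, 0.975])

# The interval supports a direct probability statement:
# P(lo < theta < hi | data) ≈ 0.95.
coverage = np.mean((draws > lo) & (draws < hi))

# Posterior probability of a hypothesis, e.g. theta > 0.5, is just a mean:
p_gt_half = np.mean(draws > 0.5)
```

Contrast this with a frequentist confidence interval, where the 95% refers to the procedure's long-run coverage, not to the probability that this interval contains the parameter.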
Markov Chain Theory
Define Markov chains through states, transition kernels, and the Markov property
Derive stationary distributions and verify detailed balance conditions
State ergodic theorems establishing MCMC convergence and connect them to the Monte Carlo LLN from Chapter 2
Explain mixing behavior, burn-in, autocorrelation, and effective sample size
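The stationary-distribution and detailed-balance objectives can be made concrete with a two-state chain small enough to verify by hand. A sketch with an illustrative transition matrix:

```python
import numpy as np

# Two-state Markov chain with an illustrative transition matrix P
# (rows sum to 1: P[i, j] is the probability of moving from state i to j).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# The stationary distribution solves pi P = pi, i.e. it is the left
# eigenvector of P with eigenvalue 1; for this P, pi = [0.8, 0.2].
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi = pi / pi.sum()

# Detailed balance check: pi_i P[i, j] == pi_j P[j, i] for a reversible chain.
balanced = np.isclose(pi[0] * P[0, 1], pi[1] * P[1, 0])
```

MCMC runs this logic in reverse: rather than finding the stationary distribution of a given kernel, it constructs a kernel whose stationary distribution is the posterior.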
MCMC Algorithms
Implement the Metropolis-Hastings algorithm with symmetric and asymmetric proposals
Implement Gibbs sampling by cycling through full conditional distributions
Design proposal distributions that balance acceptance rate and mixing efficiency
Specify Bayesian models in PyMC and interpret sampling output via ArviZ
Compare Metropolis-Hastings, Gibbs, and Hamiltonian Monte Carlo in terms of efficiency and applicability
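The from-scratch Metropolis-Hastings objective can be previewed in a few lines. A sketch targeting an illustrative unnormalized Beta(8, 4) posterior, i.e. p(θ) ∝ θ⁷(1−θ)³ on (0, 1), with a symmetric random-walk proposal:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(theta):
    """Unnormalized log posterior: theta^7 (1 - theta)^3 on (0, 1)."""
    if theta <= 0 or theta >= 1:
        return -np.inf                      # zero density outside the support
    return 7 * np.log(theta) + 3 * np.log(1 - theta)

theta, step = 0.5, 0.2
draws = np.empty(50_000)
for i in range(draws.size):
    prop = theta + step * rng.normal()      # symmetric random-walk proposal
    # Accept with probability min(1, target(prop) / target(theta));
    # the normalizing constant cancels in the ratio.
    if np.log(rng.uniform()) < log_target(prop) - log_target(theta):
        theta = prop
    draws[i] = theta

post_mean = draws[5_000:].mean()            # discard burn-in; true mean is 8/12
```

The key point is the ratio in the acceptance step: the intractable normalizing constant cancels, which is exactly why MCMC sidesteps the integral that defeats closed-form analysis.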
Convergence Diagnostics
Assess convergence visually using trace plots, rank plots, and running mean plots
Compute autocorrelation, bulk and tail effective sample size, and the split \(\hat{R}\) statistic
Apply a systematic diagnostic workflow to identify and remedy convergence failures
Distinguish between convergence verification and convergence proof
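In the chapter, these diagnostics are computed with ArviZ; as a preview of what the split \(\hat{R}\) statistic measures, here is a hand-rolled NumPy sketch of the classical between/within-chain variance comparison, applied to simulated "good" and "stuck" chains:

```python
import numpy as np

rng = np.random.default_rng(2)

def split_rhat(chains):
    """Split-R-hat from an array of shape (n_chains, n_draws)."""
    m, n = chains.shape
    half = n // 2
    h = chains[:, : 2 * half].reshape(2 * m, half)   # split each chain in two
    W = h.var(axis=1, ddof=1).mean()                 # within-chain variance
    B = half * h.mean(axis=1).var(ddof=1)            # between-chain variance
    var_hat = (half - 1) / half * W + B / half       # pooled variance estimate
    return np.sqrt(var_hat / W)

# Four well-mixed chains from the same distribution: R-hat near 1.
good = rng.normal(size=(4, 1000))
r_good = split_rhat(good)

# One chain stuck at a shifted location: R-hat well above the 1.01 threshold.
bad = good.copy()
bad[0] += 3.0
r_bad = split_rhat(bad)
```

When the chains agree, between- and within-chain variance match and \(\hat{R} \approx 1\); any disagreement inflates the ratio, which is why values above roughly 1.01 signal a convergence problem.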
Model Comparison
Perform posterior predictive checks to assess whether a model reproduces key features of observed data
Apply information criteria (WAIC, PSIS-LOO) for predictive model selection and connect to cross-validation from Chapter 4
Compute Bayes factors for nested and non-nested model comparison, recognizing their sensitivity to prior specification
Interpret model comparison results with appropriate uncertainty and practical judgment
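The posterior predictive check objective has a simple computational core: simulate replicated data from the fitted model and ask whether the observed data look typical. A sketch with an illustrative Binomial model and Beta(8, 4) posterior:

```python
import numpy as np

rng = np.random.default_rng(3)

# Posterior predictive check for a Binomial model (illustrative numbers).
k_obs, n = 7, 10
theta_draws = rng.beta(8, 4, size=20_000)    # draws from the posterior
k_rep = rng.binomial(n, theta_draws)         # one replicated dataset per draw

# Posterior predictive p-value: fraction of replicates at least as extreme
# as the observed statistic. Values near 0 or 1 flag model misfit.
ppp = np.mean(k_rep >= k_obs)
```

The same pattern scales to richer models: replace the test statistic with any feature of the data the model ought to reproduce, and compare its replicated distribution to the observed value.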
Hierarchical Models
Specify hierarchical models for grouped or clustered data using exchangeability assumptions
Explain shrinkage and borrowing strength as automatic partial pooling between no-pooling and complete-pooling extremes
Implement hierarchical models in PyMC with appropriate parameterizations to avoid sampling pathologies
Assess when hierarchical structure improves inference and connect shrinkage to regularization in Chapter 3
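Shrinkage under partial pooling can be previewed in closed form for the Normal case. A sketch using the Eight Schools estimates and standard errors, with the population mean μ and between-group scale τ fixed at assumed values for illustration (in the chapter they are estimated as part of the hierarchical model):

```python
import numpy as np

# Eight Schools: treatment-effect estimates and their standard errors.
y     = np.array([28., 8., -3., 7., -1., 1., 18., 12.])
sigma = np.array([15., 10., 16., 11., 9., 11., 10., 18.])
mu, tau = 8.0, 5.0      # assumed hyperparameters, fixed for illustration

# Precision-weighted compromise between each group's data and the population:
w = (1 / sigma**2) / (1 / sigma**2 + 1 / tau**2)   # weight on the group's data
theta_post = w * y + (1 - w) * mu

# Noisier groups (large sigma) get small w and are pulled hardest toward mu.
```

School 1's extreme estimate of 28 shrinks most of the way back toward μ, while precisely measured groups keep more of their own signal; this is the "borrowing strength" behavior that Section 5.8 develops in full.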
Sections
- Section 5.1 Foundations of Bayesian Inference
- Historical Development: From Bayes’ Essay to the MCMC Revolution
- The Bayesian Workflow
- Bayes’ Theorem for Parameters
- Discrete Parameter Spaces
- Continuous Parameters: Grid Approximation
- The Posterior as Complete Inference
- The Likelihood Principle Revisited
- Practical Considerations
- Bringing It All Together
- Chapter 5.1 Exercises
- References
- Section 5.2 Prior Specification and Conjugate Analysis
- The Role of the Prior
- Beta-Binomial Conjugate Analysis
- Normal-Normal Model: Known Variance
- Normal-Inverse-Gamma: Unknown Mean and Variance
- Poisson-Gamma Model
- Multinomial-Dirichlet Model
- Beyond Conjugacy: The Full Landscape of Prior Specification
- Bayesian vs. Frequentist Synthesis and the Limits of Conjugacy
- Practical Considerations
- Bringing It All Together
- Exercises
- References
- Section 5.3 Posterior Inference: Credible Intervals and Hypothesis Assessment
- Section 5.4 Markov Chains: The Mathematical Foundation of MCMC
- From Grid Approximation to Markov Chains
- Markov Chains
- Stationary Distributions and Detailed Balance
- The Ergodic Theorem: Why MCMC Averages Converge
- The MCMC Estimator, Effective Sample Size, and \(\hat{R}\)
- Python: Simulating Convergence and Diagnosing Chains
- Mixing, Thinning, and Practical Considerations
- Bringing It All Together
- Exercises
- References
- Section 5.5 MCMC Algorithms: Metropolis-Hastings and Gibbs Sampling
- Section 5.6 Probabilistic Programming with PyMC
- How PyMC Works: From Model Spec to Gradient
- The Diagnostic Toolkit
- Full Worked Example 1: Logistic Regression
- Full Worked Example 2: Poisson Regression
- Scale Parameters: Half-Normal and Half-Cauchy Priors
    - Derived Quantities with pm.Deterministic
    - Practical Workflow Checklist
- Bringing It All Together
- Exercises
- References
- Section 5.7 Bayesian Model Comparison
- Section 5.8 Hierarchical Models and Partial Pooling
- The Pooling Problem
- The Mathematical Structure
- Worked Example 1: Eight Schools (Normal Likelihood)
- Worked Example 2: Dirichlet-Multinomial (Categorical Likelihood)
- LOO Model Comparison: Three Pooling Strategies
- When to Use Hierarchical Models
- Practical Considerations
- Bringing It All Together
- Exercises
- References
- Section 5.9 Chapter 5 Summary