Rejection Sampling
When the inverse CDF method fails—either because the CDF lacks a closed form or its inverse is computationally intractable—rejection sampling offers a powerful alternative. This method, also known as the accept-reject algorithm, can generate samples from virtually any distribution by strategically accepting or rejecting candidate samples from a simpler proposal distribution.
The elegance of rejection sampling lies in its geometric interpretation: we uniformly sample points under a scaled envelope function and keep only those that fall under the target density. This approach trades computational efficiency for universality, making it indispensable in modern computational statistics, particularly in Bayesian inference and statistical physics.
Road Map 🧭
Master the fundamental theorem of simulation and its geometric interpretation
Implement the accept-reject algorithm for univariate and multivariate distributions
Optimize the choice of proposal distribution and scaling constant
Analyze acceptance rates and computational efficiency trade-offs
Fundamental Theorem of Simulation
The theoretical foundation of rejection sampling rests on a beautiful relationship between probability densities and uniform distributions over regions.
Mathematical Foundation
For any probability density function f_X on an arbitrary space, we can express it as:
This seemingly trivial identity has profound implications: f_X appears as the marginal density of the joint distribution:
Key Insight: If we can sample uniformly from the region under the density curve, the x-coordinates of these points follow the target distribution.
Computational Perspective
The challenge is sampling uniformly from the region {(x, u) : 0 < u < f_X(x)}. The accept-reject method accomplishes this by:
Sampling from a larger, simpler region that encloses our target
Rejecting points that fall outside the target region
Algorithm: Basic Rejection Sampling
Input: Target density f(x), proposal density g(x), bound M where f(x) ≤ M·g(x)
Output: Sample from f(x)
Repeat:
1. Generate Y ~ g(x) [candidate sample]
2. Generate U ~ Uniform(0, 1)
3. If U ≤ f(Y)/(M·g(Y)):
Accept: return X = Y
4. Else:
Reject: go to step 1
Acceptance Probability: P(accept) = 1/M
Python Implementation
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
def rejection_sampling(target_pdf, proposal_sampler, proposal_pdf, M, size=1):
"""
Generic rejection sampling implementation.
Parameters
----------
target_pdf : callable
Target probability density function f(x)
proposal_sampler : callable
Function that generates samples from proposal g(x)
proposal_pdf : callable
Proposal probability density function g(x)
M : float
Bound constant such that f(x) ≤ M·g(x) for all x
size : int
Number of samples to generate
Returns
-------
samples : array
Accepted samples from target distribution
acceptance_rate : float
Empirical acceptance rate
"""
samples = []
n_proposed = 0
while len(samples) < size:
# Step 1: Generate candidate
y = proposal_sampler()
n_proposed += 1
# Step 2: Generate uniform for acceptance test
u = np.random.uniform(0, 1)
# Step 3: Accept/reject test
acceptance_prob = target_pdf(y) / (M * proposal_pdf(y))
if u <= acceptance_prob:
samples.append(y)
acceptance_rate = size / n_proposed
return np.array(samples), acceptance_rate
Classic Example: Sampling from Beta Distribution
Let’s implement rejection sampling for Beta(α, β) when α > 1 and β > 1, where the density has a unique mode.
Beta Distribution via Rejection
The Beta(α, β) distribution has PDF:
where B(α, β) is the beta function. We’ll use Uniform(0, 1) as our proposal distribution.
Example 💡: Beta(2.5, 2.5) Sampling
Given: Target is Beta(2.5, 2.5), symmetric around 0.5
Find: Generate samples using uniform proposal
Solution:
First, find the maximum of the target density:
Python implementation:
from scipy.special import beta as beta_function
def beta_rejection_sampling(alpha, beta_param, size=1000):
"""Sample from Beta distribution using rejection sampling."""
# Target PDF
def target_pdf(x):
if 0 < x < 1:
return (x**(alpha-1) * (1-x)**(beta_param-1) /
beta_function(alpha, beta_param))
return 0
# Proposal: Uniform(0,1)
proposal_sampler = lambda: np.random.uniform(0, 1)
proposal_pdf = lambda x: 1.0 if 0 < x < 1 else 0
# Find bound M (use mode for symmetric case)
mode = (alpha - 1) / (alpha + beta_param - 2)
M = target_pdf(mode) / proposal_pdf(mode)
# Generate samples
samples, rate = rejection_sampling(
target_pdf, proposal_sampler, proposal_pdf, M, size
)
return samples, rate, M
# Generate and visualize
np.random.seed(42)
samples, acceptance_rate, M = beta_rejection_sampling(2.5, 2.5, 5000)
print(f"Bound M: {M:.3f}")
print(f"Acceptance rate: {acceptance_rate:.3f}")
print(f"Theoretical rate: {1/M:.3f}")
Result: Acceptance rate ≈ 53.3%, meaning we reject about 47% of proposals.
Visualizing the Accept-Reject Process
def visualize_rejection_sampling(alpha, beta_param, n_points=1000):
"""Visualize the rejection sampling process."""
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
# Left plot: Show envelope and target
x = np.linspace(0.001, 0.999, 1000)
mode = (alpha - 1) / (alpha + beta_param - 2)
beta_pdf = stats.beta(alpha, beta_param).pdf
M = beta_pdf(mode)
axes[0].plot(x, beta_pdf(x), 'b-', label='Target: Beta(α,β)', lw=2)
axes[0].fill_between(x, 0, beta_pdf(x), alpha=0.3)
axes[0].axhline(y=M, color='r', linestyle='--', label=f'Envelope: M={M:.2f}')
axes[0].fill_between(x, 0, M, alpha=0.1, color='red')
axes[0].set_xlabel('x')
axes[0].set_ylabel('Density')
axes[0].set_title('Target Density and Envelope')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
# Right plot: Show accepted/rejected points
accepted_x, accepted_y = [], []
rejected_x, rejected_y = [], []
for _ in range(n_points):
x_prop = np.random.uniform(0, 1)
y_prop = np.random.uniform(0, M)
if y_prop <= beta_pdf(x_prop):
accepted_x.append(x_prop)
accepted_y.append(y_prop)
else:
rejected_x.append(x_prop)
rejected_y.append(y_prop)
axes[1].scatter(rejected_x, rejected_y, c='red', alpha=0.3, s=1, label='Rejected')
axes[1].scatter(accepted_x, accepted_y, c='blue', alpha=0.5, s=1, label='Accepted')
axes[1].plot(np.linspace(0, 1, 1000),
beta_pdf(np.linspace(0, 1, 1000)),
'black', lw=2)
axes[1].set_xlabel('x')
axes[1].set_ylabel('u')
axes[1].set_title('Accept-Reject Visualization')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Visualize Beta(4, 2) - asymmetric case
visualize_rejection_sampling(4, 2)
Optimizing Proposal Distributions
The efficiency of rejection sampling critically depends on the choice of proposal distribution and the tightness of the bound M.
Choosing Good Proposals
The ideal proposal distribution g(x) should: 1. Be easy to sample from 2. Have a similar shape to the target f(x) 3. Have heavier tails than f(x) to ensure coverage
Efficiency Metric: The acceptance rate 1/M directly measures efficiency. Smaller M means higher acceptance.
Adaptive Rejection Sampling
For log-concave densities, we can construct piecewise linear envelopes that adapt during sampling:
class AdaptiveRejectionSampler:
"""
Adaptive Rejection Sampling for log-concave densities.
"""
def __init__(self, log_pdf, initial_points):
"""
Initialize with log-pdf and starting abscissae.
Parameters
----------
log_pdf : callable
Natural log of target density
initial_points : array
Initial support points for envelope
"""
self.log_pdf = log_pdf
self.points = np.array(initial_points)
self.log_values = log_pdf(self.points)
self.build_envelope()
def build_envelope(self):
"""Construct piecewise linear upper envelope."""
# Compute slopes between adjacent points
self.slopes = np.diff(self.log_values) / np.diff(self.points)
# Build piecewise exponential envelope
# (simplified implementation)
pass
def sample(self, size=1):
"""Generate samples with adaptive envelope."""
samples = []
while len(samples) < size:
# Sample from current envelope
y = self.sample_from_envelope()
# Accept/reject test
u = np.random.uniform()
if u <= np.exp(self.log_pdf(y) - self.envelope_log_pdf(y)):
samples.append(y)
# Adapt: add point to envelope
self.add_point(y)
self.build_envelope()
return np.array(samples)
Truncated Distributions
For sampling from truncated distributions, rejection sampling is often the method of choice:
def truncated_normal(mu, sigma, a, b, size=1):
"""
Sample from truncated normal distribution on [a, b].
Uses rejection sampling with uniform proposal.
"""
from scipy.stats import norm
# Find maximum of truncated normal PDF
mode = np.clip(mu, a, b) # Mode is at mu if in [a, b]
max_pdf = norm.pdf(mode, mu, sigma)
# Normalizing constant for truncated distribution
Z = norm.cdf(b, mu, sigma) - norm.cdf(a, mu, sigma)
samples = []
n_attempts = 0
while len(samples) < size:
# Propose from Uniform(a, b)
x = np.random.uniform(a, b)
n_attempts += 1
# Accept with probability proportional to PDF
u = np.random.uniform(0, max_pdf / Z)
if u <= norm.pdf(x, mu, sigma) / Z:
samples.append(x)
acceptance_rate = size / n_attempts
return np.array(samples), acceptance_rate
Multivariate Rejection Sampling
Rejection sampling extends naturally to multiple dimensions, though efficiency decreases with dimensionality.
Sampling from Arbitrary 2D Densities
def rejection_sampling_2d(target_pdf, x_bounds, y_bounds, M, size=1000):
"""
Rejection sampling for 2D distributions.
Parameters
----------
target_pdf : callable
Target PDF f(x, y)
x_bounds, y_bounds : tuple
Bounds (min, max) for each dimension
M : float
Bound on target PDF
"""
samples = []
while len(samples) < size:
# Propose uniformly in rectangle
x = np.random.uniform(*x_bounds)
y = np.random.uniform(*y_bounds)
# Height for acceptance
u = np.random.uniform(0, M)
# Accept if under surface
if u <= target_pdf(x, y):
samples.append([x, y])
return np.array(samples)
# Example: Sample from 2D mixture of Gaussians
def mixture_2d_pdf(x, y):
"""Bimodal 2D distribution."""
from scipy.stats import multivariate_normal
# Two components
mv1 = multivariate_normal([-1, -1], [[0.5, 0.2], [0.2, 0.5]])
mv2 = multivariate_normal([1, 1], [[0.5, -0.2], [-0.2, 0.5]])
return 0.5 * mv1.pdf([x, y]) + 0.5 * mv2.pdf([x, y])
# Generate samples
samples_2d = rejection_sampling_2d(
mixture_2d_pdf,
x_bounds=(-4, 4),
y_bounds=(-4, 4),
M=0.2, # Upper bound on density
size=2000
)
Practical Considerations
Implementing rejection sampling in production requires careful attention to numerical stability and efficiency.
Numerical Stability Issues
Common Pitfall ⚠️
Computing acceptance probabilities f(x)/(M·g(x)) can cause numerical overflow or underflow, especially with extreme parameter values.
Solution: Work in log space when possible:
log_accept_prob = log_f(x) - (log(M) + log_g(x))
if np.log(u) <= log_accept_prob:
accept
Curse of Dimensionality
Rejection sampling efficiency decreases exponentially with dimension. For a d-dimensional hypercube with acceptance region volume ε:
Example: Sampling from a d-dimensional sphere using a cube envelope:
def sphere_volume_ratio(d):
"""Volume ratio of d-sphere to d-cube."""
from scipy.special import gamma
# Volume of unit d-sphere / Volume of [-1,1]^d cube
sphere_vol = np.pi**(d/2) / gamma(d/2 + 1)
cube_vol = 2**d
return sphere_vol / cube_vol
# Show exponential decay
dimensions = range(1, 21)
ratios = [sphere_volume_ratio(d) for d in dimensions]
plt.semilogy(dimensions, ratios, 'o-')
plt.xlabel('Dimension')
plt.ylabel('Acceptance Rate')
plt.title('Curse of Dimensionality in Rejection Sampling')
plt.grid(True)
Monitoring and Diagnostics
class RejectionSamplerWithDiagnostics:
"""Rejection sampler with performance monitoring."""
def __init__(self, target_pdf, proposal_sampler, proposal_pdf, M):
self.target_pdf = target_pdf
self.proposal_sampler = proposal_sampler
self.proposal_pdf = proposal_pdf
self.M = M
# Diagnostics
self.n_proposed = 0
self.n_accepted = 0
self.acceptance_history = []
def sample(self, size=1, batch_size=100):
"""Sample with batch monitoring."""
samples = []
while len(samples) < size:
batch_accepted = 0
for _ in range(batch_size):
y = self.proposal_sampler()
self.n_proposed += 1
u = np.random.uniform()
accept_prob = self.target_pdf(y) / (self.M * self.proposal_pdf(y))
if u <= accept_prob:
samples.append(y)
self.n_accepted += 1
batch_accepted += 1
if len(samples) >= size:
break
# Record batch acceptance rate
self.acceptance_history.append(batch_accepted / batch_size)
return np.array(samples[:size])
def get_diagnostics(self):
"""Return sampling diagnostics."""
return {
'overall_acceptance_rate': self.n_accepted / self.n_proposed,
'theoretical_rate': 1 / self.M,
'efficiency': (self.n_accepted / self.n_proposed) * self.M,
'acceptance_history': self.acceptance_history
}
Bringing It All Together
Rejection sampling transforms the problem of sampling from complex distributions into a geometric task of uniformly sampling regions. While less efficient than the inverse CDF method when available, its universality makes it essential for modern computational statistics.
Key Takeaways 📝
Core concept: Sample uniformly under the density curve by accepting/rejecting points under an envelope
Computational insight: Trade efficiency (acceptance rate = 1/M) for universality
Practical application: Essential for truncated distributions and when inverse CDF is unavailable
Connection: Foundation for more sophisticated methods like importance sampling and MCMC
Exercises
Conceptual Understanding: Prove that the accept-reject algorithm produces samples from the target distribution. Start with the conditional probability P(Y ∈ dy | Y accepted).
Implementation: Implement rejection sampling for the truncated exponential distribution on [a, b]:
def truncated_exponential(lam, a, b, size=1000): """ Sample from exponential distribution truncated to [a, b]. Compare two proposal distributions: 1. Uniform(a, b) 2. Exponential(lam) with rejection of values outside [a, b] """ # Your implementation here pass
Compare the efficiency of both approaches for different values of a and b.
Analysis: The von Mises distribution (circular normal) has PDF:
\[f(\theta) = \frac{e^{\kappa \cos(\theta - \mu)}}{2\pi I_0(\kappa)}, \quad \theta \in [0, 2\pi]\]where I_0 is the modified Bessel function. Implement rejection sampling using: a) Uniform proposal on [0, 2π] b) Wrapped Cauchy proposal
Plot acceptance rates as a function of concentration parameter κ.
Extension: Implement the “squeeze” technique that adds a lower bound to avoid evaluating the target PDF when possible:
\[s(x) \leq f(x) \leq M \cdot g(x)\]Apply this to Beta(5, 2) sampling and measure the reduction in target PDF evaluations.