Rejection Sampling

When the inverse CDF method fails—either because the CDF lacks a closed form or its inverse is computationally intractable—rejection sampling offers a powerful alternative. This method, also known as the accept-reject algorithm, can generate samples from virtually any distribution by strategically accepting or rejecting candidate samples from a simpler proposal distribution.

The elegance of rejection sampling lies in its geometric interpretation: we uniformly sample points under a scaled envelope function and keep only those that fall under the target density. This approach trades computational efficiency for universality, making it indispensable in modern computational statistics, particularly in Bayesian inference and statistical physics.

Road Map 🧭

  • Master the fundamental theorem of simulation and its geometric interpretation

  • Implement the accept-reject algorithm for univariate and multivariate distributions

  • Optimize the choice of proposal distribution and scaling constant

  • Analyze acceptance rates and computational efficiency trade-offs

Fundamental Theorem of Simulation

The theoretical foundation of rejection sampling rests on a beautiful relationship between probability densities and uniform distributions over regions.

Mathematical Foundation

For any probability density function f_X on an arbitrary space, we can express it as:

\[f_X(x) = \int_0^{f_X(x)} du\]

This seemingly trivial identity has profound implications: f_X appears as the marginal density of the joint distribution:

\[(X, U) \sim \text{Uniform}\{(x, u) : 0 < u < f_X(x)\}\]

Key Insight: If we can sample uniformly from the region under the density curve, the x-coordinates of these points follow the target distribution.

Computational Perspective

The challenge is sampling uniformly from the region {(x, u) : 0 < u < f_X(x)}. The accept-reject method accomplishes this by:

  1. Sampling from a larger, simpler region that encloses our target

  2. Rejecting points that fall outside the target region

Algorithm: Basic Rejection Sampling

Input: Target density f(x), proposal density g(x), bound M where f(x) ≤ M·g(x)
Output: Sample from f(x)

Repeat:
   1. Generate Y ~ g(x)  [candidate sample]
   2. Generate U ~ Uniform(0, 1)
   3. If U ≤ f(Y)/(M·g(Y)):
      Accept: return X = Y
   4. Else:
      Reject: go to step 1

Acceptance Probability: P(accept) = 1/M

Python Implementation

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def rejection_sampling(target_pdf, proposal_sampler, proposal_pdf, M, size=1):
    """
    Generic rejection sampling implementation.

    Parameters
    ----------
    target_pdf : callable
        Target probability density function f(x)
    proposal_sampler : callable
        Function that generates samples from proposal g(x)
    proposal_pdf : callable
        Proposal probability density function g(x)
    M : float
        Bound constant such that f(x) ≤ M·g(x) for all x
    size : int
        Number of samples to generate

    Returns
    -------
    samples : array
        Accepted samples from target distribution
    acceptance_rate : float
        Empirical acceptance rate
    """
    samples = []
    n_proposed = 0

    while len(samples) < size:
        # Step 1: Generate candidate
        y = proposal_sampler()
        n_proposed += 1

        # Step 2: Generate uniform for acceptance test
        u = np.random.uniform(0, 1)

        # Step 3: Accept/reject test
        acceptance_prob = target_pdf(y) / (M * proposal_pdf(y))

        if u <= acceptance_prob:
            samples.append(y)

    acceptance_rate = size / n_proposed
    return np.array(samples), acceptance_rate

Classic Example: Sampling from Beta Distribution

Let’s implement rejection sampling for Beta(α, β) when α > 1 and β > 1, where the density has a unique mode.

Beta Distribution via Rejection

The Beta(α, β) distribution has PDF:

\[f_X(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \quad x \in (0, 1)\]

where B(α, β) is the beta function. We’ll use Uniform(0, 1) as our proposal distribution.

Example 💡: Beta(2.5, 2.5) Sampling

Given: Target is Beta(2.5, 2.5), symmetric around 0.5

Find: Generate samples using uniform proposal

Solution:

First, find the maximum of the target density:

\[ \begin{align}\begin{aligned}\text{Mode} = \frac{\alpha - 1}{\alpha + \beta - 2} = 0.5\\M = f_X(0.5) = \frac{0.5^{1.5} \cdot 0.5^{1.5}}{B(2.5, 2.5)} \approx 1.875\end{aligned}\end{align} \]

Python implementation:

from scipy.special import beta as beta_function

def beta_rejection_sampling(alpha, beta_param, size=1000):
    """Sample from Beta distribution using rejection sampling."""

    # Target PDF
    def target_pdf(x):
        if 0 < x < 1:
            return (x**(alpha-1) * (1-x)**(beta_param-1) /
                    beta_function(alpha, beta_param))
        return 0

    # Proposal: Uniform(0,1)
    proposal_sampler = lambda: np.random.uniform(0, 1)
    proposal_pdf = lambda x: 1.0 if 0 < x < 1 else 0

    # Find bound M (use mode for symmetric case)
    mode = (alpha - 1) / (alpha + beta_param - 2)
    M = target_pdf(mode) / proposal_pdf(mode)

    # Generate samples
    samples, rate = rejection_sampling(
        target_pdf, proposal_sampler, proposal_pdf, M, size
    )

    return samples, rate, M

# Generate and visualize
np.random.seed(42)
samples, acceptance_rate, M = beta_rejection_sampling(2.5, 2.5, 5000)

print(f"Bound M: {M:.3f}")
print(f"Acceptance rate: {acceptance_rate:.3f}")
print(f"Theoretical rate: {1/M:.3f}")

Result: Acceptance rate ≈ 53.3%, meaning we reject about 47% of proposals.

Visualizing the Accept-Reject Process

def visualize_rejection_sampling(alpha, beta_param, n_points=1000):
    """Visualize the rejection sampling process."""

    fig, axes = plt.subplots(1, 2, figsize=(12, 5))

    # Left plot: Show envelope and target
    x = np.linspace(0.001, 0.999, 1000)
    mode = (alpha - 1) / (alpha + beta_param - 2)
    beta_pdf = stats.beta(alpha, beta_param).pdf
    M = beta_pdf(mode)

    axes[0].plot(x, beta_pdf(x), 'b-', label='Target: Beta(α,β)', lw=2)
    axes[0].fill_between(x, 0, beta_pdf(x), alpha=0.3)
    axes[0].axhline(y=M, color='r', linestyle='--', label=f'Envelope: M={M:.2f}')
    axes[0].fill_between(x, 0, M, alpha=0.1, color='red')
    axes[0].set_xlabel('x')
    axes[0].set_ylabel('Density')
    axes[0].set_title('Target Density and Envelope')
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)

    # Right plot: Show accepted/rejected points
    accepted_x, accepted_y = [], []
    rejected_x, rejected_y = [], []

    for _ in range(n_points):
        x_prop = np.random.uniform(0, 1)
        y_prop = np.random.uniform(0, M)

        if y_prop <= beta_pdf(x_prop):
            accepted_x.append(x_prop)
            accepted_y.append(y_prop)
        else:
            rejected_x.append(x_prop)
            rejected_y.append(y_prop)

    axes[1].scatter(rejected_x, rejected_y, c='red', alpha=0.3, s=1, label='Rejected')
    axes[1].scatter(accepted_x, accepted_y, c='blue', alpha=0.5, s=1, label='Accepted')
    axes[1].plot(np.linspace(0, 1, 1000),
                 beta_pdf(np.linspace(0, 1, 1000)),
                 'black', lw=2)
    axes[1].set_xlabel('x')
    axes[1].set_ylabel('u')
    axes[1].set_title('Accept-Reject Visualization')
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

# Visualize Beta(4, 2) - asymmetric case
visualize_rejection_sampling(4, 2)

Optimizing Proposal Distributions

The efficiency of rejection sampling critically depends on the choice of proposal distribution and the tightness of the bound M.

Choosing Good Proposals

The ideal proposal distribution g(x) should: 1. Be easy to sample from 2. Have a similar shape to the target f(x) 3. Have heavier tails than f(x) to ensure coverage

Efficiency Metric: The acceptance rate 1/M directly measures efficiency. Smaller M means higher acceptance.

Adaptive Rejection Sampling

For log-concave densities, we can construct piecewise linear envelopes that adapt during sampling:

class AdaptiveRejectionSampler:
    """
    Adaptive Rejection Sampling for log-concave densities.
    """

    def __init__(self, log_pdf, initial_points):
        """
        Initialize with log-pdf and starting abscissae.

        Parameters
        ----------
        log_pdf : callable
            Natural log of target density
        initial_points : array
            Initial support points for envelope
        """
        self.log_pdf = log_pdf
        self.points = np.array(initial_points)
        self.log_values = log_pdf(self.points)
        self.build_envelope()

    def build_envelope(self):
        """Construct piecewise linear upper envelope."""
        # Compute slopes between adjacent points
        self.slopes = np.diff(self.log_values) / np.diff(self.points)

        # Build piecewise exponential envelope
        # (simplified implementation)
        pass

    def sample(self, size=1):
        """Generate samples with adaptive envelope."""
        samples = []

        while len(samples) < size:
            # Sample from current envelope
            y = self.sample_from_envelope()

            # Accept/reject test
            u = np.random.uniform()
            if u <= np.exp(self.log_pdf(y) - self.envelope_log_pdf(y)):
                samples.append(y)

                # Adapt: add point to envelope
                self.add_point(y)
                self.build_envelope()

        return np.array(samples)

Truncated Distributions

For sampling from truncated distributions, rejection sampling is often the method of choice:

def truncated_normal(mu, sigma, a, b, size=1):
    """
    Sample from truncated normal distribution on [a, b].

    Uses rejection sampling with uniform proposal.
    """
    from scipy.stats import norm

    # Find maximum of truncated normal PDF
    mode = np.clip(mu, a, b)  # Mode is at mu if in [a, b]
    max_pdf = norm.pdf(mode, mu, sigma)

    # Normalizing constant for truncated distribution
    Z = norm.cdf(b, mu, sigma) - norm.cdf(a, mu, sigma)

    samples = []
    n_attempts = 0

    while len(samples) < size:
        # Propose from Uniform(a, b)
        x = np.random.uniform(a, b)
        n_attempts += 1

        # Accept with probability proportional to PDF
        u = np.random.uniform(0, max_pdf / Z)

        if u <= norm.pdf(x, mu, sigma) / Z:
            samples.append(x)

    acceptance_rate = size / n_attempts
    return np.array(samples), acceptance_rate

Multivariate Rejection Sampling

Rejection sampling extends naturally to multiple dimensions, though efficiency decreases with dimensionality.

Sampling from Arbitrary 2D Densities

def rejection_sampling_2d(target_pdf, x_bounds, y_bounds, M, size=1000):
    """
    Rejection sampling for 2D distributions.

    Parameters
    ----------
    target_pdf : callable
        Target PDF f(x, y)
    x_bounds, y_bounds : tuple
        Bounds (min, max) for each dimension
    M : float
        Bound on target PDF
    """
    samples = []

    while len(samples) < size:
        # Propose uniformly in rectangle
        x = np.random.uniform(*x_bounds)
        y = np.random.uniform(*y_bounds)

        # Height for acceptance
        u = np.random.uniform(0, M)

        # Accept if under surface
        if u <= target_pdf(x, y):
            samples.append([x, y])

    return np.array(samples)

# Example: Sample from 2D mixture of Gaussians
def mixture_2d_pdf(x, y):
    """Bimodal 2D distribution."""
    from scipy.stats import multivariate_normal

    # Two components
    mv1 = multivariate_normal([-1, -1], [[0.5, 0.2], [0.2, 0.5]])
    mv2 = multivariate_normal([1, 1], [[0.5, -0.2], [-0.2, 0.5]])

    return 0.5 * mv1.pdf([x, y]) + 0.5 * mv2.pdf([x, y])

# Generate samples
samples_2d = rejection_sampling_2d(
    mixture_2d_pdf,
    x_bounds=(-4, 4),
    y_bounds=(-4, 4),
    M=0.2,  # Upper bound on density
    size=2000
)

Practical Considerations

Implementing rejection sampling in production requires careful attention to numerical stability and efficiency.

Numerical Stability Issues

Common Pitfall ⚠️

Computing acceptance probabilities f(x)/(M·g(x)) can cause numerical overflow or underflow, especially with extreme parameter values.

Solution: Work in log space when possible:

log_accept_prob = log_f(x) - (log(M) + log_g(x))
if np.log(u) <= log_accept_prob:
    accept

Curse of Dimensionality

Rejection sampling efficiency decreases exponentially with dimension. For a d-dimensional hypercube with acceptance region volume ε:

\[\text{Acceptance Rate} \approx \epsilon^d\]

Example: Sampling from a d-dimensional sphere using a cube envelope:

def sphere_volume_ratio(d):
    """Volume ratio of d-sphere to d-cube."""
    from scipy.special import gamma
    # Volume of unit d-sphere / Volume of [-1,1]^d cube
    sphere_vol = np.pi**(d/2) / gamma(d/2 + 1)
    cube_vol = 2**d
    return sphere_vol / cube_vol

# Show exponential decay
dimensions = range(1, 21)
ratios = [sphere_volume_ratio(d) for d in dimensions]

plt.semilogy(dimensions, ratios, 'o-')
plt.xlabel('Dimension')
plt.ylabel('Acceptance Rate')
plt.title('Curse of Dimensionality in Rejection Sampling')
plt.grid(True)

Monitoring and Diagnostics

class RejectionSamplerWithDiagnostics:
    """Rejection sampler with performance monitoring."""

    def __init__(self, target_pdf, proposal_sampler, proposal_pdf, M):
        self.target_pdf = target_pdf
        self.proposal_sampler = proposal_sampler
        self.proposal_pdf = proposal_pdf
        self.M = M

        # Diagnostics
        self.n_proposed = 0
        self.n_accepted = 0
        self.acceptance_history = []

    def sample(self, size=1, batch_size=100):
        """Sample with batch monitoring."""
        samples = []

        while len(samples) < size:
            batch_accepted = 0

            for _ in range(batch_size):
                y = self.proposal_sampler()
                self.n_proposed += 1

                u = np.random.uniform()
                accept_prob = self.target_pdf(y) / (self.M * self.proposal_pdf(y))

                if u <= accept_prob:
                    samples.append(y)
                    self.n_accepted += 1
                    batch_accepted += 1

                    if len(samples) >= size:
                        break

            # Record batch acceptance rate
            self.acceptance_history.append(batch_accepted / batch_size)

        return np.array(samples[:size])

    def get_diagnostics(self):
        """Return sampling diagnostics."""
        return {
            'overall_acceptance_rate': self.n_accepted / self.n_proposed,
            'theoretical_rate': 1 / self.M,
            'efficiency': (self.n_accepted / self.n_proposed) * self.M,
            'acceptance_history': self.acceptance_history
        }

Bringing It All Together

Rejection sampling transforms the problem of sampling from complex distributions into a geometric task of uniformly sampling regions. While less efficient than the inverse CDF method when available, its universality makes it essential for modern computational statistics.

Key Takeaways 📝

  1. Core concept: Sample uniformly under the density curve by accepting/rejecting points under an envelope

  2. Computational insight: Trade efficiency (acceptance rate = 1/M) for universality

  3. Practical application: Essential for truncated distributions and when inverse CDF is unavailable

  4. Connection: Foundation for more sophisticated methods like importance sampling and MCMC

Exercises

  1. Conceptual Understanding: Prove that the accept-reject algorithm produces samples from the target distribution. Start with the conditional probability P(Y ∈ dy | Y accepted).

  2. Implementation: Implement rejection sampling for the truncated exponential distribution on [a, b]:

    def truncated_exponential(lam, a, b, size=1000):
        """
        Sample from exponential distribution truncated to [a, b].
        Compare two proposal distributions:
        1. Uniform(a, b)
        2. Exponential(lam) with rejection of values outside [a, b]
        """
        # Your implementation here
        pass
    

    Compare the efficiency of both approaches for different values of a and b.

  3. Analysis: The von Mises distribution (circular normal) has PDF:

    \[f(\theta) = \frac{e^{\kappa \cos(\theta - \mu)}}{2\pi I_0(\kappa)}, \quad \theta \in [0, 2\pi]\]

    where I_0 is the modified Bessel function. Implement rejection sampling using: a) Uniform proposal on [0, 2π] b) Wrapped Cauchy proposal

    Plot acceptance rates as a function of concentration parameter κ.

  4. Extension: Implement the “squeeze” technique that adds a lower bound to avoid evaluating the target PDF when possible:

    \[s(x) \leq f(x) \leq M \cdot g(x)\]

    Apply this to Beta(5, 2) sampling and measure the reduction in target PDF evaluations.