Inverse CDF Method
When we need to generate random numbers from a specific probability distribution, the inverse CDF method provides an elegant solution that leverages uniform random numbers, the basic output of every standard pseudorandom number generator. This fundamental technique underlies many sophisticated sampling algorithms and connects the abstract mathematics of probability theory with practical computation.
The method builds on a remarkable property: if U follows a uniform distribution on [0,1], then F⁻¹(U) follows the distribution with CDF F. This seemingly simple observation powers random number generation across scientific computing, from Monte Carlo simulations in physics to risk modeling in finance.
Road Map 🧭
Understand why uniform random numbers are sufficient for generating any distribution
Implement the inverse CDF algorithm for continuous and discrete distributions
Analyze computational efficiency and numerical stability issues
Apply the method to generate samples from exponential, Weibull, and custom distributions
Theoretical Foundation
The inverse CDF method, also known as the inverse transform method or quantile method, is based on a fundamental relationship between uniform random variables and arbitrary distributions.
Mathematical Framework
Consider a random variable X with cumulative distribution function (CDF) F_X. The key insight is that we can generate samples from X using only uniform random numbers.
For continuous distributions with strictly increasing CDFs, we define the inverse function as the ordinary functional inverse:

\[F_X^{-1}(u) = \text{the unique } x \in \mathbb{R} \text{ such that } F_X(x) = u, \quad u \in (0, 1)\]
Fundamental Theorem: If U ~ Uniform(0,1), then X = F_X^{-1}(U) has CDF F_X.
Proof: For any x ∈ ℝ:

\[P(X \leq x) = P\left(F_X^{-1}(U) \leq x\right) = P\left(U \leq F_X(x)\right) = F_X(x)\]

The second equality holds because F_X is monotonically increasing, allowing us to apply it to both sides of the inequality; the third holds because P(U ≤ t) = t for any t ∈ [0, 1].
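Before moving on, here is a quick empirical check of the theorem (an illustrative sketch, not part of the formal development): we push uniforms through the exponential inverse CDF derived later in this section and test the result against the target distribution.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)      # U ~ Uniform(0, 1)
x = -np.log(1 - u) / 2.0           # F^{-1}(u) for Exponential(rate λ = 2)

# Kolmogorov-Smirnov test against Exponential(rate 2); a large p-value
# means the transformed samples are consistent with the target CDF
ks_stat, p_value = stats.kstest(x, stats.expon(scale=0.5).cdf)
print(f"KS statistic: {ks_stat:.4f}, p-value: {p_value:.3f}")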
Computational Perspective
The inverse CDF method transforms the problem of sampling from complex distributions into two subproblems:

1. Generating uniform random numbers (solved by PRNGs)
2. Computing or approximating the inverse CDF
Algorithm: Inverse CDF Sampling
Input: CDF F_X and its inverse F_X^{-1}
Output: Sample x from distribution with CDF F_X
1. Generate u ~ Uniform(0, 1)
2. Compute x = F_X^{-1}(u)
3. Return x
Computational Complexity: O(1) per sample for closed-form inverses; numerical inversion (e.g., bisection to tolerance ε) costs O(log(1/ε)) iterations per sample, and discrete inversion by binary search costs O(log K) for K categories.
Python Implementation
import numpy as np
from scipy import stats
def inverse_cdf_sample(inverse_cdf_func, size=1):
    """
    Generate random samples using the inverse CDF method.

    Parameters
    ----------
    inverse_cdf_func : callable
        Function that computes F^{-1}(u)
    size : int
        Number of samples to generate

    Returns
    -------
    array
        Random samples from the target distribution
    """
    # Generate uniform random numbers
    u = np.random.uniform(0, 1, size)
    # Apply inverse CDF transformation
    samples = inverse_cdf_func(u)
    return samples
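A quick usage sketch (assuming the exponential inverse CDF derived below, with λ = 2):

# Example: exponential inverse CDF F^{-1}(u) = -ln(1 - u) / λ with λ = 2
samples = inverse_cdf_sample(lambda u: -np.log(1 - u) / 2.0, size=10_000)
print(f"Sample mean: {np.mean(samples):.4f}")  # should be close to 1/λ = 0.5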
Discrete Distributions
For discrete distributions, the inverse CDF method requires a modified approach since the CDF is a step function.
Discrete Inverse CDF Algorithm
For a discrete random variable X with PMF p(k) and CDF F(k):

\[F(k) = P(X \leq k) = \sum_{j \leq k} p(j)\]

We use the generalized inverse:

\[F^{-1}(u) = \min\{k : F(k) \geq u\}\]
def discrete_inverse_cdf(probs, size=1):
    """
    Sample from discrete distribution using inverse CDF method.

    Parameters
    ----------
    probs : array-like
        Probability mass function values
    size : int
        Number of samples

    Returns
    -------
    array
        Random samples from discrete distribution
    """
    # Normalize probabilities
    probs = np.array(probs) / np.sum(probs)
    # Compute cumulative distribution
    cdf = np.cumsum(probs)
    # Generate uniform random numbers
    u = np.random.uniform(0, 1, size)
    # Find indices using searchsorted (binary search)
    samples = np.searchsorted(cdf, u)
    return samples
Example 💡: Sampling from a Custom Discrete Distribution
Consider sampling from a shorebird egg distribution where birds lay 2-5 eggs with probabilities [0.15, 0.20, 0.60, 0.05].
Given: PMF with support {2, 3, 4, 5}
Find: Generate 1000 samples
Solution:
Mathematical approach: the cumulative probabilities over the support {2, 3, 4, 5} are F = (0.15, 0.35, 0.95, 1.00); each uniform draw u maps to the first support value whose cumulative probability is at least u. For example, u = 0.50 falls in (0.35, 0.95], so the sample is 4 eggs.
Python implementation:
# Define the PMF
eggs = [2, 3, 4, 5]
probs = [0.15, 0.20, 0.60, 0.05]
# Generate samples
samples = discrete_inverse_cdf(probs, size=1000)
actual_eggs = np.array(eggs)[samples]
# Verify distribution
unique, counts = np.unique(actual_eggs, return_counts=True)
empirical_probs = counts / 1000
print(f"Empirical: {empirical_probs}")
print(f"Expected: {probs}")
Result: Empirical probabilities converge to theoretical values as sample size increases.
Continuous Distribution Examples
The inverse CDF method is particularly elegant for distributions with closed-form inverse CDFs.
Exponential Distribution
For X ~ Exponential(λ), the CDF is:

\[F(x) = 1 - e^{-\lambda x}, \quad x \geq 0\]

The inverse CDF:

\[F^{-1}(u) = -\frac{\ln(1 - u)}{\lambda}\]

Since 1 - U ~ Uniform(0,1) when U ~ Uniform(0,1), we can simplify:

\[X = -\frac{\ln(U)}{\lambda}\]
def exponential_inverse_cdf(lam, size=1):
    """Generate exponential random variables."""
    u = np.random.uniform(0, 1, size)
    return -np.log(u) / lam

# Example: Generate samples with rate λ = 2
np.random.seed(42)
samples = exponential_inverse_cdf(lam=2, size=10000)

# Verify against theoretical mean = 1/λ
print(f"Sample mean: {np.mean(samples):.4f}")
print(f"Theoretical: {1/2:.4f}")
Weibull Distribution
For X ~ Weibull(k, λ), the CDF is:

\[F(x) = 1 - e^{-(x/\lambda)^k}, \quad x \geq 0\]

The inverse CDF:

\[F^{-1}(u) = \lambda \left(-\ln(1 - u)\right)^{1/k}\]
def weibull_inverse_cdf(k, lam, size=1):
    """Generate Weibull random variables."""
    u = np.random.uniform(0, 1, size)
    return lam * (-np.log(1 - u))**(1/k)
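As a quick sanity check (an illustrative sketch using the known Weibull mean λ·Γ(1 + 1/k)):

from scipy.special import gamma

np.random.seed(42)
k, lam = 1.5, 2.0
samples = weibull_inverse_cdf(k=k, lam=lam, size=100_000)

# The mean of Weibull(k, λ) is λ·Γ(1 + 1/k)
print(f"Sample mean:      {np.mean(samples):.4f}")
print(f"Theoretical mean: {lam * gamma(1 + 1/k):.4f}")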
Advanced Search Algorithms for Discrete Sampling
When dealing with large discrete distributions, the choice of search algorithm significantly impacts performance.
Binary Search Implementation
def binary_search_sample(probs, size=1):
    """
    Optimized discrete sampling using binary search.

    Complexity: O(log K) per sample where K = len(probs)
    """
    probs = np.array(probs) / np.sum(probs)
    cdf = np.cumsum(probs)
    samples = []
    for _ in range(size):
        u = np.random.uniform()
        # Binary search
        left, right = 0, len(cdf) - 1
        while left < right:
            mid = (left + right) // 2
            if u <= cdf[mid]:
                right = mid
            else:
                left = mid + 1
        samples.append(left)
    return np.array(samples)
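This explicit loop performs the same search as np.searchsorted in discrete_inverse_cdf above; NumPy's compiled, vectorized version is almost always faster in practice, so the loop is mainly useful for understanding the algorithm. A quick consistency check on the egg-count CDF from earlier:

# Both searches return the first index whose cumulative probability >= u
cdf = np.cumsum([0.15, 0.20, 0.60, 0.05])
assert np.searchsorted(cdf, 0.5) == 2   # 0.35 < 0.5 <= 0.95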
Walker-Vose Alias Method
For repeated sampling from the same distribution, the alias method achieves O(1) sampling after O(K) preprocessing:
class AliasMethod:
    """Walker-Vose Alias Method for O(1) discrete sampling."""

    def __init__(self, probs):
        """
        Setup alias tables.

        Parameters
        ----------
        probs : array-like
            Unnormalized probabilities
        """
        n = len(probs)
        self.n = n
        # Normalize and scale
        total = sum(probs)
        scaled = [p * n / total for p in probs]
        # Initialize tables
        self.prob = [0] * n
        self.alias = [0] * n
        # Separate into small and large
        small = []
        large = []
        for i, p in enumerate(scaled):
            if p < 1:
                small.append(i)
            else:
                large.append(i)
        # Build alias table
        while small and large:
            l = small.pop()
            g = large.pop()
            self.prob[l] = scaled[l]
            self.alias[l] = g
            scaled[g] = scaled[g] + scaled[l] - 1
            if scaled[g] < 1:
                small.append(g)
            else:
                large.append(g)
        # Handle remaining
        while large:
            self.prob[large.pop()] = 1
        while small:
            self.prob[small.pop()] = 1

    def sample(self, size=1):
        """Generate samples in O(1) time each."""
        samples = []
        for _ in range(size):
            i = np.random.randint(0, self.n)
            if np.random.uniform() < self.prob[i]:
                samples.append(i)
            else:
                samples.append(self.alias[i])
        return np.array(samples)
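A short usage sketch, reusing the eggs and probs arrays from the earlier example (preprocessing happens once; each draw is then O(1)):

sampler = AliasMethod(probs)             # O(K) table construction
samples = sampler.sample(size=10_000)    # O(1) per draw
actual_eggs = np.array(eggs)[samples]
unique, counts = np.unique(actual_eggs, return_counts=True)
print(dict(zip(unique, counts / 10_000)))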
Practical Considerations
When implementing the inverse CDF method in production systems, several practical issues arise.
Numerical Stability
For distributions with extreme parameters, direct computation can lead to numerical issues:
Common Pitfall ⚠️
Computing F⁻¹(u) when u is very close to 0 or 1 can cause underflow/overflow. For example, in the exponential distribution, -log(u) → ∞ as u → 0.
Solution: Use specialized libraries that handle edge cases, or implement guards:
u = np.clip(u, np.finfo(float).eps, 1 - np.finfo(float).eps)
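For example, a guarded version of the exponential sampler (a minimal sketch; clipping to machine epsilon is one reasonable choice of bounds, not the only one):

def exponential_inverse_cdf_safe(lam, size=1):
    """Exponential sampling with guards against u = 0 and u = 1."""
    u = np.random.uniform(0, 1, size)
    # Keep u strictly inside (0, 1) so -log(u) stays finite
    u = np.clip(u, np.finfo(float).eps, 1 - np.finfo(float).eps)
    return -np.log(u) / lam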
When the Inverse CDF Is Not Available
Many distributions lack closed-form inverse CDFs (e.g., Normal distribution). Options include:
Numerical root-finding: Solve F(x) = u iteratively (see the sketch after this list)
Approximations: Use polynomial or rational approximations
Alternative methods: Consider rejection sampling or transformation methods
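To illustrate the root-finding option, here is a minimal sketch that inverts the standard normal CDF with scipy.optimize.brentq and checks the answer against SciPy's built-in quantile function (in practice you would call stats.norm.ppf directly):

from scipy import optimize, stats

def numerical_inverse_cdf(cdf, u, lo=-10.0, hi=10.0):
    """Solve F(x) = u by bracketed root-finding on [lo, hi]."""
    return optimize.brentq(lambda x: cdf(x) - u, lo, hi)

u = 0.975
print(f"Numerical inverse: {numerical_inverse_cdf(stats.norm.cdf, u):.6f}")
print(f"stats.norm.ppf:    {stats.norm.ppf(u):.6f}")  # both ≈ 1.959964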
Performance Optimization
# Vectorized implementation for better performance
def vectorized_exponential(lam, size):
    """Fully vectorized exponential sampling."""
    return -np.log(np.random.uniform(0, 1, size)) / lam
# Performance comparison
import time
size = 1_000_000
lam = 2.0
# Vectorized approach
start = time.time()
samples_vec = vectorized_exponential(lam, size)
vec_time = time.time() - start
print(f"Vectorized: {vec_time:.4f} seconds")
print(f"Samples/second: {size/vec_time:,.0f}")
Bringing It All Together
The inverse CDF method elegantly solves the fundamental problem of random number generation by transforming uniform randomness into any desired distribution. Its theoretical simplicity—just one function evaluation per sample—makes it the gold standard when the inverse CDF is available.
Key Takeaways 📝
Core concept: Transform U ~ Uniform(0,1) using X = F⁻¹(U) to get X ~ F
Computational insight: Trade complex sampling for CDF inversion
Practical application: Use for distributions with closed-form inverses; consider alternatives otherwise
Connection: Foundation for understanding more complex methods like rejection sampling
Exercises
Conceptual Understanding: Prove that if X has a continuous CDF F and Y = F(X), then Y ~ Uniform(0,1).
Implementation: Implement inverse CDF sampling for the Pareto distribution with CDF:
\[F(x) = 1 - \left(\frac{x_m}{x}\right)^\alpha, \quad x \geq x_m\]

def pareto_inverse_cdf(alpha, x_m, size=1):
    # Your implementation here
    pass
Derive the inverse CDF formula
Implement the sampling function
Verify your implementation by comparing sample moments with theoretical values
Analysis: Compare the performance of linear search, binary search, and alias method for sampling from a Zipf distribution with K = 1000 categories. Plot execution time vs number of samples.
Extension: The Box-Muller transform generates normal random variables from uniforms. Research this method and explain why it’s preferred over inverse CDF for the normal distribution. Implement both approaches and compare their speed and accuracy.