The Bernoulli distribution is the atom of probability. From this simplest random experiment—a single trial with probability $p$ of success—we can derive every major probability distribution through combinations, limits, and transformations.
This interactive guide shows how discrete distributions arise from sums and sequences of Bernoulli trials, how continuous distributions emerge as limiting cases, and the mathematical machinery (moment generating functions, limits, and transformations) that connects them all.
The Bernoulli Distribution
Parameters: $p \in [0,1]$ (probability of success)
MGF: $M_X(t) = (1-p) + pe^t = 1 + p(e^t - 1)$
Mean: $\mathbb{E}[X] = p$ | Variance: $\text{Var}(X) = p(1-p)$
# NumPy
rng.choice([0, 1], p=[1-p, p])
# SciPy
scipy.stats.bernoulli(p).rvs()
Discrete Distributions from Bernoulli
Sums, sequences, and counting arguments generate the discrete family
The Uniform Foundation
The discrete uniform distribution assigns equal probability to every outcome—the unbiased die roll that underlies all random generation.
Discrete Uniform Discrete
Construction: Equal probability on integers $\{a, \ldots, b\}$
Special case: $\text{Bernoulli}(0.5) = \text{DiscreteUniform}(0, 1)$
rng.integers(a, b+1) # single sample
rng.integers(a, b+1, size=1000) # array
# === SciPy distribution object ===
dist = scipy.stats.randint(a, b+1)
dist.rvs(size=1000) # sampling
dist.pmf(k) # P(X = k)
dist.cdf(k) # P(X ≤ k)
dist.ppf(q) # quantile (inverse CDF)
dist.mean(), dist.var() # moments
Direct Constructions from Bernoulli
The first generation of distributions arises directly from operations on independent Bernoulli trials.
Binomial Discrete
Functional form: $S_n = \sum_{i=1}^n X_i$ where $X_i \stackrel{\text{iid}}{\sim} \text{Bernoulli}(p)$
Interpretation: Count of successes in $n$ independent Bernoulli trials
rng.binomial(n, p) # single sample
rng.binomial(n, p, size=1000) # array
# === SciPy distribution object ===
dist = scipy.stats.binom(n, p)
dist.rvs(size=1000) # sampling
dist.pmf(k) # P(X = k)
dist.cdf(k) # P(X ≤ k)
dist.sf(k) # P(X > k) survival
dist.ppf(q) # quantile (inverse CDF)
dist.mean(), dist.var() # np, np(1-p)
dist.interval(0.95) # 95% interval
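The sum construction can be checked directly; a minimal sketch (the values of `n`, `p`, and the sample size are illustrative):
# === Sketch: Binomial as a sum of Bernoulli trials ===
import numpy as np
rng = np.random.default_rng(0)
n, p = 10, 0.3
trials = rng.random((100_000, n)) < p        # each entry ~ Bernoulli(p)
sums = trials.sum(axis=1)                    # row sums ~ Binomial(n, p)
print(sums.mean(), rng.binomial(n, p, size=100_000).mean())  # both ≈ np = 3.0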
Multinomial Discrete
Constraint: $\sum_{i=1}^k x_i = n$ and $\sum_{i=1}^k p_i = 1$
Generalization: Binomial$(n, p)$ is Multinomial$(n, (p, 1-p))$ with $k = 2$
# p = [p1, p2, ..., pk] probability vector
rng.multinomial(n, p) # returns [x1, ..., xk]
rng.multinomial(n, p, size=1000) # shape (1000, k)
# === SciPy distribution object ===
dist = scipy.stats.multinomial(n, p)
dist.rvs(size=1000) # sampling
dist.pmf([x1, x2, x3]) # P(X = x)
dist.mean() # [np1, np2, ..., npk]
dist.cov() # covariance matrix
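A minimal sketch of the generalization (the category probabilities are illustrative): one multinomial draw is just the tally of $n$ categorical trials.
# === Sketch: Multinomial as categorical counts ===
import numpy as np
rng = np.random.default_rng(0)
n, p = 100, [0.2, 0.5, 0.3]
cats = rng.choice(len(p), size=n, p=p)          # n categorical outcomes
counts = np.bincount(cats, minlength=len(p))    # tally per category
print(counts, rng.multinomial(n, p))            # same distribution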
Geometric Discrete
Functional form: $Y = \min\{k \geq 1 : X_k = 1\}$ where $X_k \stackrel{\text{iid}}{\sim} \text{Bernoulli}(p)$
Interpretation: First hitting time of success in Bernoulli sequence
# NumPy: returns # of trials (1,2,3,...)
rng.geometric(p, size=1000)
# === SciPy distribution object ===
dist = scipy.stats.geom(p)
dist.rvs(size=1000) # sampling
dist.pmf(k) # P(Y = k)
dist.cdf(k) # P(Y ≤ k)
dist.sf(k) # P(Y > k) = (1-p)^k
dist.ppf(q) # quantile
dist.mean(), dist.var() # 1/p, (1-p)/p²
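A minimal sketch of the hitting-time construction (the helper `first_success` is illustrative, not a library function):
# === Sketch: Geometric as first hitting time ===
import numpy as np
rng = np.random.default_rng(0)
p = 0.3
def first_success(rng, p):
    k = 1
    while rng.random() >= p:     # trial k failed
        k += 1
    return k                     # index of the first success
samples = np.array([first_success(rng, p) for _ in range(10_000)])
print(samples.mean(), 1 / p)     # both ≈ 1/p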
Second-Generation Distributions
Extending the counting: sums of geometrics and limits of binomials.
Negative Binomial Discrete
Functional form: $X = \sum_{i=1}^r Y_i$ where $Y_i = \min\{k : X_k^{(i)} = 1\} \stackrel{\text{iid}}{\sim} \text{Geom}(p)$
Interpretation: Trial index of the $r$-th success in a Bernoulli sequence; note that NumPy and SciPy count the number of failures instead, i.e. $X - r$
# NumPy: returns # failures before r successes
rng.negative_binomial(r, p, size=1000)
# === SciPy distribution object ===
# nbinom(n, p) = # failures before n successes
dist = scipy.stats.nbinom(r, p)
dist.rvs(size=1000) # sampling
dist.pmf(k) # P(X = k)
dist.cdf(k) # P(X ≤ k)
dist.ppf(q) # quantile
dist.mean(), dist.var() # r(1-p)/p, r(1-p)/p²
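A minimal sketch of the sum-of-geometrics construction, including the shift by $r$ that reconciles the trials convention above with the failures convention in NumPy/SciPy:
# === Sketch: Negative Binomial as a sum of Geometrics ===
import numpy as np
rng = np.random.default_rng(0)
r, p = 5, 0.3
trials = rng.geometric(p, size=(100_000, r)).sum(axis=1)  # trial index of r-th success
failures = trials - r                    # NumPy/SciPy convention
print(failures.mean(), r * (1 - p) / p)  # both ≈ r(1-p)/p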
Poisson Discrete
Construction: Limit of Binomial$(n, p)$ as $n \to \infty$, $p \to 0$, $np = \lambda$
Interpretation: Count of rare events in fixed interval
rng.poisson(lam, size=1000)
# === SciPy distribution object ===
dist = scipy.stats.poisson(lam)
dist.rvs(size=1000) # sampling
dist.pmf(k) # P(X = k)
dist.cdf(k) # P(X ≤ k)
dist.sf(k) # P(X > k)
dist.ppf(q) # quantile
dist.mean(), dist.var() # both equal λ
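The limit is easy to verify numerically; a minimal sketch (the values of `lam` and `n` are illustrative):
# === Sketch: Poisson as a Binomial limit ===
import numpy as np
from scipy import stats
lam, n = 3.0, 10_000
k = np.arange(10)
gap = stats.binom(n, lam / n).pmf(k) - stats.poisson(lam).pmf(k)
print(np.abs(gap).max())   # shrinks toward 0 as n grows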
💡 Key Insight: The Counting Hierarchy
Bernoulli counts success in 1 trial → Binomial counts successes in $n$ trials → Multinomial generalizes to $k$ categories → Geometric counts trials to 1st success → Negative Binomial counts trials to $r$-th success → Poisson counts rare events (infinitely many trials with vanishing probability).
The Hypergeometric Connection
Sampling without replacement modifies the Bernoulli/Binomial relationship.
Hypergeometric Discrete
Construction: $n$ draws without replacement from $N$ items ($K$ successes)
Limit: As $N \to \infty$ with $K/N \to p$: Hypergeometric $\to$ Binomial
# NumPy: ngood=K, nbad=N-K, nsample=n
rng.hypergeometric(K, N-K, n, size=1000)
# === SciPy distribution object ===
# SciPy uses different letters: hypergeom(M, n, N) with
# M = population size, n = # of successes, N = # of draws
dist = scipy.stats.hypergeom(M=N, n=K, N=n)
dist.rvs(size=1000) # sampling
dist.pmf(k) # P(X = k)
dist.cdf(k) # P(X ≤ k)
dist.ppf(q) # quantile
dist.mean(), dist.var() # nK/N, n(K/N)(1-K/N)(N-n)/(N-1)
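A minimal sketch of the limit (parameter values are illustrative): as the population grows with $K/N$ held near $p$, the hypergeometric PMF approaches the binomial PMF.
# === Sketch: Hypergeometric → Binomial as N → ∞ ===
import numpy as np
from scipy import stats
n, p = 20, 0.3
k = np.arange(n + 1)
for N in (100, 1_000, 100_000):
    K = int(p * N)                        # keep K/N ≈ p
    gap = stats.hypergeom(N, K, n).pmf(k) - stats.binom(n, p).pmf(k)
    print(N, np.abs(gap).max())           # decreases as N grows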
Continuous Distributions from Discrete
Limits and transformations bridge the discrete-continuous divide
The Continuous Uniform Foundation
The continuous uniform on $[0, 1]$ is the fundamental building block of all continuous random generation—every other distribution can be generated from it via transformations. The general Uniform$(a, b)$ extends this to any bounded interval.
Continuous Uniform Continuous
Standard case: $U \sim \text{Uniform}(0,1)$ is the foundation of all random generation
Linear transformation: If $U \sim \text{Uniform}(0,1)$, then $X = a + (b-a)U \sim \text{Uniform}(a,b)$
Key method: Inverse CDF: $X = F^{-1}(U)$ has CDF $F$
rng.random(size=1000) # U(0,1)
rng.uniform(a, b, size=1000) # U(a,b)
# === SciPy (loc=a, scale=b-a) ===
dist = scipy.stats.uniform(a, b-a)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density = 1/(b-a)
dist.cdf(x) # (x-a)/(b-a)
dist.ppf(q) # a + q(b-a)
# === Python stdlib ===
random.random() # U(0,1)
random.uniform(a, b) # U(a,b)
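A minimal sketch of the inverse-CDF method, using the exponential's closed-form quantile $F^{-1}(u) = -\ln(1-u)/\lambda$ (the value of `lam` is illustrative):
# === Sketch: inverse CDF method, U(0,1) → Exponential(λ) ===
import numpy as np
from scipy import stats
rng = np.random.default_rng(0)
lam = 2.0
u = rng.random(100_000)            # uniform inputs
x = -np.log1p(-u) / lam            # F⁻¹(u) for Exponential(λ)
res = stats.kstest(x, stats.expon(scale=1/lam).cdf)
print(res.statistic)               # small: the sample matches Exp(λ)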
🌉 The Central Limit Theorem Bridge
The CLT transforms discrete counts into continuous distributions.
Binomial$(n, p)$ with mean $np$ and variance $np(1-p)$ $\xrightarrow{\text{CLT}}$ Normal$(0, 1)$: the standardized sum $(S_n - np)/\sqrt{np(1-p)}$ converges in distribution to a standard normal.
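A quick numerical check (the values of `n`, `p`, and the sample size are illustrative): standardized binomial sums match standard normal moments and quantiles.
# === Sketch: CLT for standardized Binomial sums ===
import numpy as np
from scipy import stats
rng = np.random.default_rng(0)
n, p = 1_000, 0.3
s = rng.binomial(n, p, size=100_000)
z = (s - n * p) / np.sqrt(n * p * (1 - p))           # standardize
print(z.mean(), z.std())                             # ≈ 0, 1
print(np.quantile(z, 0.975), stats.norm.ppf(0.975))  # both ≈ 1.96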
The Normal Distribution: Universal Attractor
The normal distribution is the limit of sums—the hub of continuous probability.
Normal Continuous
From Bernoulli: CLT limit of standardized binomial sums
MGF: $M(t) = \exp(\mu t + \sigma^2 t^2/2)$
rng.normal(mu, sigma, size=1000)
rng.standard_normal(size=1000) # N(0,1)
# === SciPy distribution object ===
dist = scipy.stats.norm(mu, sigma)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density φ(x)
dist.cdf(x) # Φ(x)
dist.sf(x) # 1 - Φ(x)
dist.ppf(q) # Φ⁻¹(q) quantile
dist.isf(q) # Φ⁻¹(1-q)
# === Python stdlib ===
random.gauss(mu, sigma)
random.normalvariate(mu, sigma)
Distributions Derived from Normal
Squares, ratios, and sums of normals generate the inferential distributions.
Chi-Squared Continuous
Construction: Sum of $k$ squared standard normals
From Bernoulli: Binomial → Normal → Chi-squared
rng.chisquare(df, size=1000)
# === SciPy distribution object ===
dist = scipy.stats.chi2(df)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density
dist.cdf(x) # P(X ≤ x)
dist.ppf(q) # quantile
dist.mean(), dist.var() # df, 2·df
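A minimal sketch of the construction (the value of `df` is illustrative):
# === Sketch: Chi-squared as a sum of squared normals ===
import numpy as np
rng = np.random.default_rng(0)
df = 4
q = (rng.standard_normal((100_000, df)) ** 2).sum(axis=1)
print(q.mean(), q.var())   # ≈ df = 4, 2·df = 8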
Student's t Continuous
Construction: $T = Z / \sqrt{V/\nu}$ where $Z \sim N(0,1)$, $V \sim \chi^2_\nu$, independent
rng.standard_t(df, size=1000) # t(df)
# === SciPy distribution object ===
dist = scipy.stats.t(df, loc=mu, scale=sigma)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density
dist.cdf(x) # P(T ≤ x)
dist.ppf(q) # quantile (t-critical)
dist.interval(0.95) # 95% interval
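A minimal sketch of the $Z/\sqrt{V/\nu}$ construction (the value of `nu` is illustrative):
# === Sketch: Student's t from Normal and Chi-squared ===
import numpy as np
from scipy import stats
rng = np.random.default_rng(0)
nu = 5
z = rng.standard_normal(100_000)
v = rng.chisquare(nu, size=100_000)
t = z / np.sqrt(v / nu)
print(np.quantile(t, 0.975), stats.t(nu).ppf(0.975))  # both ≈ 2.57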
F Distribution Continuous
Construction: Ratio of two independent chi-squareds (scaled)
Connection: $T^2 \sim F_{1, \nu}$ where $T \sim t_\nu$
rng.f(dfnum, dfden, size=1000)
# === SciPy distribution object ===
dist = scipy.stats.f(dfnum, dfden)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density
dist.cdf(x) # P(F ≤ x)
dist.ppf(q) # F-critical value
dist.sf(x) # p-value = 1 - CDF
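A minimal sketch of the scaled chi-squared ratio (the degrees of freedom are illustrative):
# === Sketch: F as a ratio of scaled Chi-squareds ===
import numpy as np
from scipy import stats
rng = np.random.default_rng(0)
d1, d2 = 3, 12
u = rng.chisquare(d1, size=100_000)
v = rng.chisquare(d2, size=100_000)
f = (u / d1) / (v / d2)
print(np.quantile(f, 0.95), stats.f(d1, d2).ppf(0.95))  # both ≈ 3.49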
Cauchy Continuous
Construction: $t_1 = $ Student's $t$ with $\nu = 1$ degree of freedom
Key property: No mean or variance (moments do not exist!)
rng.standard_cauchy(size=1000) # Cauchy(0,1)
# === SciPy (loc=x₀, scale=γ) ===
dist = scipy.stats.cauchy(loc=x0, scale=gamma)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density
dist.cdf(x) # P(X ≤ x)
dist.ppf(q) # quantile
dist.median() # = x₀ (mean undefined!)
# === Via t-distribution (ν=1) ===
scipy.stats.t(df=1) # equivalent!
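A minimal sketch of the ratio-of-normals construction; the check uses quantiles because the mean does not exist:
# === Sketch: Cauchy as a ratio of iid normals ===
import numpy as np
from scipy import stats
rng = np.random.default_rng(0)
z1, z2 = rng.standard_normal((2, 100_000))
c = z1 / z2                                          # standard Cauchy
print(np.median(c))                                  # ≈ 0
print(np.quantile(c, 0.75), stats.cauchy.ppf(0.75))  # both ≈ 1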
The Waiting Time Path: Geometric → Exponential → Gamma
Continuous waiting times emerge as limits of discrete counts.
🌉 Geometric to Exponential Limit
Geometric$(p)$ (discrete waiting time, counted in trials) $\xrightarrow{p = \lambda/n,\ n \to \infty}$ Exponential$(\lambda)$ (continuous waiting time): with $n$ trials per unit of time and success probability $p = \lambda/n$ (so $np = \lambda$), the rescaled waiting time $Y/n$ converges in distribution to Exponential$(\lambda)$.
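A minimal sketch of the limit (the values of `lam` and `n` are illustrative):
# === Sketch: Geometric → Exponential under rescaling ===
import numpy as np
from scipy import stats
rng = np.random.default_rng(0)
lam, n = 2.0, 10_000                              # n trials per unit of time
y = rng.geometric(lam / n, size=100_000) / n      # waiting time in real time
print(y.mean(), 1 / lam)                          # both ≈ 1/λ
print(np.median(y), stats.expon(scale=1/lam).median())  # both ≈ ln(2)/λ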
Gamma Continuous
Construction: Sum of $\alpha$ iid Exponential$(\beta)$ when $\alpha \in \mathbb{N}$
Special cases: Exp$(\lambda) = $ Gamma$(1, \lambda)$; $\chi^2_k = $ Gamma$(k/2, 1/2)$
# === Exponential (scale = 1/λ) ===
rng.exponential(scale=1/lam, size=1000)
scipy.stats.expon(scale=1/lam) # full dist object
random.expovariate(lam) # stdlib
# === General Gamma (shape=α, scale=1/β) ===
rng.gamma(shape=alpha, scale=1/beta, size=1000)
# === SciPy (a=shape, scale=1/rate) ===
dist = scipy.stats.gamma(alpha, scale=1/beta)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density
dist.cdf(x) # P(X ≤ x)
dist.ppf(q) # quantile
dist.mean(), dist.var() # α/β, α/β²
# === Python stdlib ===
random.gammavariate(alpha, 1/beta)
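A minimal sketch of the sum-of-exponentials construction for integer shape (the values of `alpha` and `beta` are illustrative):
# === Sketch: Gamma(α, β) as a sum of α Exponentials(β) ===
import numpy as np
rng = np.random.default_rng(0)
alpha, beta = 4, 2.0
g = rng.exponential(scale=1/beta, size=(100_000, alpha)).sum(axis=1)
print(g.mean(), g.var())   # ≈ α/β = 2.0, α/β² = 1.0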
The Beta Distribution: Continuous Analog of Bernoulli
The Bernoulli distribution models a binary outcome $\{0, 1\}$ with probability $p$. The Beta distribution is its continuous analog: it lives on $[0, 1]$ and models the probability parameter $p$ itself. Just as Bernoulli is the atom of discrete probability, Beta is the atom of continuous probability on bounded intervals—and they are intimately connected through Bayesian conjugacy.
Beta Continuous
Bernoulli parallel: Bernoulli PMF $p^k(1-p)^{1-k}$ ↔ Beta PDF $\propto x^{\alpha-1}(1-x)^{\beta-1}$
Construction: $X/(X+Y)$ where $X \sim \text{Gamma}(\alpha, 1)$, $Y \sim \text{Gamma}(\beta, 1)$ independent
Special case: Beta$(1,1) = $ Uniform$(0,1)$
rng.beta(alpha, beta, size=1000)
# === SciPy distribution object ===
dist = scipy.stats.beta(alpha, beta)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density
dist.cdf(x) # P(X ≤ x)
dist.ppf(q) # quantile
dist.mean(), dist.var() # α/(α+β), ...
dist.interval(0.95) # credible interval
# === Python stdlib ===
random.betavariate(alpha, beta)
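A minimal sketch of the Gamma-ratio construction (the values of `a` and `b` are illustrative):
# === Sketch: Beta(α, β) from two Gammas ===
import numpy as np
rng = np.random.default_rng(0)
a, b = 2.0, 5.0
x = rng.gamma(a, size=100_000)     # Gamma(α, 1)
y = rng.gamma(b, size=100_000)     # Gamma(β, 1)
w = x / (x + y)                    # ~ Beta(α, β)
print(w.mean(), a / (a + b))       # both ≈ α/(α+β)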
Complete Distribution Hierarchy
| Distribution | Type | Connection to Bernoulli | Key Transform |
|---|---|---|---|
| Discrete Uniform$(a,b)$ | Discrete | Binary expansion of Bernoulli$(0.5)$ | $\sum_{i=1}^m B_i \cdot 2^{i-1} \sim \text{Unif}\{0, \ldots, 2^m - 1\}$ |
| Bernoulli$(p)$ | Discrete | Foundation (binary) | — |
| Binomial$(n,p)$ | Discrete | Sum of $n$ Bernoullis | $S = \sum_{i=1}^n X_i$ |
| Multinomial$(n, \mathbf{p})$ | Discrete | $k$-category generalization of Binomial | Count each of $k$ outcomes in $n$ trials |
| Geometric$(p)$ | Discrete | First success in Bernoulli seq. | $Y = \min\{k: X_k = 1\}$ |
| Negative Binomial$(r,p)$ | Discrete | Sum of $r$ Geometrics | $S = \sum_{i=1}^r Y_i$ |
| Poisson$(\lambda)$ | Discrete | Binomial limit: $n\to\infty$, $np\to\lambda$ | Rare events limit |
| Uniform$(a,b)$ | Continuous | Foundation for random generation | Inverse CDF: $X = F^{-1}(U)$ |
| Normal$(0,1)$ | Continuous | CLT limit of standardized Binomial | $(S_n - np)/\sqrt{np(1-p)}$ |
| Exponential$(\lambda)$ | Continuous | Geometric limit (continuous time) | $p \to 0$, $np \to \lambda$ |
| Gamma$(\alpha, \beta)$ | Continuous | Sum of Exponentials; NegBin limit | $S = \sum \text{Exp}_i$ |
| Chi-squared$_k$ | Continuous | Sum of squared Normals | $Q = \sum Z_i^2$ |
| Student's $t_\nu$ | Continuous | Normal/Chi-squared ratio | $Z/\sqrt{V/\nu}$ |
| $F_{d_1, d_2}$ | Continuous | Chi-squared ratio | $(U/d_1)/(V/d_2)$ |
| Cauchy$(x_0, \gamma)$ | Continuous | $t_1$ (Student's $t$ with $\nu=1$) | $Z_1/Z_2$ for iid Normals |
| Beta$(\alpha, \beta)$ | Continuous | Gamma ratio; Bernoulli conjugate | $X/(X+Y)$; continuous analog of Bernoulli |
| Hypergeometric$(N,K,n)$ | Discrete | Conditional Binomial; sampling w/o replacement | $P(X=k \mid X+Y=K)$ |
💡 The Grand Unified View
Every distribution in classical probability can be traced back to the Bernoulli trial and the Uniform distribution. The Uniform$(0,1)$ is the foundation of all random generation via the inverse CDF method. The key operations connecting distributions are: summation (Binomial, Negative Binomial, Gamma, Chi-squared), generalization (Multinomial extends Binomial to $k$ categories), waiting/counting (Geometric, Exponential), limits (Poisson, Normal, Exponential), ratios (t, F, Beta, Cauchy), and conditioning (Hypergeometric). The CLT is the great bridge between the discrete and continuous worlds, making the Normal distribution the universal attractor for sums. The Beta distribution plays a special role as the continuous analog of Bernoulli—living on $[0,1]$ just as Bernoulli probabilities do, while Dirichlet generalizes Beta to probability vectors just as Multinomial generalizes Binomial.
Reproducibility & Parallel Random Generation
Proper seeding ensures reproducible results. For parallel computing, use SeedSequence.spawn() to create independent streams—never use sequential seeds like 1, 2, 3.
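A minimal sketch of the spawning pattern (the worker count is illustrative):
# === Sketch: independent streams via SeedSequence.spawn ===
import numpy as np
root = np.random.SeedSequence(42)                   # one root seed
children = root.spawn(4)                            # e.g. one per worker
rngs = [np.random.default_rng(s) for s in children]
print([r.random() for r in rngs])                   # distinct yet reproducible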