The Bernoulli Family Tree

How One Coin Flip Generates All of Probability Theory

The Bernoulli distribution is the atom of probability. From this simplest random experiment—a single trial with probability $p$ of success—we can derive every major probability distribution through combinations, limits, and transformations.

This interactive guide shows how discrete distributions arise from sums and sequences of Bernoulli trials, how continuous distributions emerge as limiting cases, and the mathematical machinery (moment generating functions, limits, and transformations) that connects them all.

The Bernoulli Distribution

$$X \sim \text{Bernoulli}(p), \quad P(X = k) = p^k(1-p)^{1-k}, \quad k \in \{0, 1\}$$

Parameters: $p \in [0,1]$ (probability of success)

MGF: $M_X(t) = (1-p) + pe^t = 1 + p(e^t - 1)$

Mean: $\mathbb{E}[X] = p$  |  Variance: $\text{Var}(X) = p(1-p)$

import numpy as np, scipy.stats
rng = np.random.default_rng(42)  # generator assumed by every snippet below

# NumPy
rng.choice([0, 1], p=[1-p, p])   # or equivalently rng.binomial(1, p)
# SciPy
scipy.stats.bernoulli(p).rvs()

Discrete Distributions from Bernoulli

Sums, sequences, and counting arguments generate the discrete family

🎲 The Uniform Foundation

The discrete uniform distribution assigns equal probability to each outcome—the unbiased die roll that underlies all random generation.

Discrete Uniform Discrete

$$X \sim \text{DiscreteUniform}(a, b)$$ $$P(X = k) = \frac{1}{b - a + 1}, \quad k \in \{a, a+1, \ldots, b\}$$

Construction: Equal probability on integers $\{a, \ldots, b\}$

Special case: $\text{Bernoulli}(0.5) = \text{DiscreteUniform}(0, 1)$

Python
# === Sampling ===
rng.integers(a, b+1) # single sample
rng.integers(a, b+1, size=1000) # array

# === SciPy distribution object ===
dist = scipy.stats.randint(a, b+1)
dist.rvs(size=1000) # sampling
dist.pmf(k) # P(X = k)
dist.cdf(k) # P(X ≤ k)
dist.ppf(q) # quantile (inverse CDF)
dist.mean(), dist.var() # moments
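
A quick check of how fair Bernoulli bits build a discrete uniform (the binary-expansion construction listed in the hierarchy table below); a minimal sketch, where `m` and the sample size are arbitrary illustrative choices:

import numpy as np
rng = np.random.default_rng(0)

m = 3                                         # number of fair coin flips
bits = rng.integers(0, 2, size=(100_000, m))  # iid Bernoulli(0.5) bits
u = bits @ (2 ** np.arange(m))                # sum_i B_i * 2^(i-1)
print(np.bincount(u) / len(u))                # each of {0,...,7} ≈ 1/8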

🔗 Direct Constructions from Bernoulli

The first generation of distributions arises directly from operations on independent Bernoulli trials.

Binomial Discrete

$$S_n \sim \text{Binomial}(n, p)$$ $$P(S_n = k) = \binom{n}{k}p^k(1-p)^{n-k}, \quad k = 0, 1, \ldots, n$$

Functional form: $S_n = \sum_{i=1}^n X_i$ where $X_i \stackrel{\text{iid}}{\sim} \text{Bernoulli}(p)$

Interpretation: Count of successes in $n$ independent Bernoulli trials
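
Why this works is one line of MGF algebra: independence turns the MGF of the sum into a product of identical Bernoulli MGFs,

$$M_{S_n}(t) = \prod_{i=1}^n M_{X_i}(t) = \left(1 + p(e^t - 1)\right)^n$$

and differentiating at $t = 0$ recovers $\mathbb{E}[S_n] = np$ and $\text{Var}(S_n) = np(1-p)$.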

Python
# === Sampling ===
rng.binomial(n, p) # single sample
rng.binomial(n, p, size=1000) # array

# === SciPy distribution object ===
dist = scipy.stats.binom(n, p)
dist.rvs(size=1000) # sampling
dist.pmf(k) # P(X = k)
dist.cdf(k) # P(X ≤ k)
dist.sf(k) # P(X > k) survival
dist.ppf(q) # quantile (inverse CDF)
dist.mean(), dist.var() # np, np(1-p)
dist.interval(0.95) # 95% interval

Multinomial Discrete

$$\mathbf{X} \sim \text{Multinomial}(n, \mathbf{p})$$ $$P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{x_1! \cdots x_k!} p_1^{x_1} \cdots p_k^{x_k}$$

Constraint: $\sum_{i=1}^k x_i = n$ and $\sum_{i=1}^k p_i = 1$

Generalization: Binomial$(n, p)$ is Multinomial$(n, (p, 1-p))$ with $k = 2$

Python
# === Sampling ===
# p = [p1, p2, ..., pk] probability vector
rng.multinomial(n, p) # returns [x1, ..., xk]
rng.multinomial(n, p, size=1000) # shape (1000, k)

# === SciPy distribution object ===
dist = scipy.stats.multinomial(n, p)
dist.rvs(size=1000) # sampling
dist.pmf([x1, x2, x3]) # P(X = x)
dist.mean() # [np1, np2, ..., npk]
dist.cov() # covariance matrix

Geometric Discrete

$$Y \sim \text{Geometric}(p)$$ $$P(Y = k) = (1-p)^{k-1}p, \quad k = 1, 2, \ldots$$

Functional form: $Y = \min\{k \geq 1 : X_k = 1\}$ where $X_k \stackrel{\text{iid}}{\sim} \text{Bernoulli}(p)$

Interpretation: First hitting time of success in Bernoulli sequence

Python
# === Sampling ===
# NumPy: returns # of trials (1,2,3,...)
rng.geometric(p, size=1000)

# === SciPy distribution object ===
dist = scipy.stats.geom(p)
dist.rvs(size=1000) # sampling
dist.pmf(k) # P(Y = k)
dist.cdf(k) # P(Y ≤ k)
dist.sf(k) # P(Y > k) = (1-p)^k
dist.ppf(q) # quantile
dist.mean(), dist.var() # 1/p, (1-p)/p²
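
To see the first-hitting-time construction directly, simulate raw Bernoulli trials and record the index of the first success; a minimal sketch (with $p = 0.3$, a stream of 1000 trials is effectively always long enough):

import numpy as np
rng = np.random.default_rng(1)

p = 0.3
# first index k >= 1 with X_k = 1, one Bernoulli stream per sample
samples = [np.argmax(rng.random(1000) < p) + 1 for _ in range(10_000)]
print(np.mean(samples), 1 / p)                # both ≈ 3.33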

🔗 Second-Generation Distributions

Extending the counting: sums of geometrics and limits of binomials.

Negative Binomial Discrete

$$X \sim \text{NegBin}(r, p)$$ $$P(X = k) = \binom{k-1}{r-1}p^r(1-p)^{k-r}, \quad k = r, r+1, \ldots$$

Functional form: $X = \sum_{i=1}^r Y_i$ where $Y_i = \min\{k : X_k^{(i)} = 1\} \stackrel{\text{iid}}{\sim} \text{Geom}(p)$

Interpretation: Trial index of the $r$-th success in Bernoulli sequence

Python
# === Sampling ===
# NumPy: returns # failures before r successes
rng.negative_binomial(r, p, size=1000)

# === SciPy distribution object ===
# nbinom(n, p) = # failures before n successes
dist = scipy.stats.nbinom(r, p)
dist.rvs(size=1000) # sampling
dist.pmf(k) # P(X = k)
dist.cdf(k) # P(X ≤ k)
dist.ppf(q) # quantile
dist.mean(), dist.var() # r(1-p)/p, r(1-p)/p² (failure count; trial-index mean is r/p)
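
Because both NumPy and SciPy count failures while the PMF above counts trials, converting between the conventions is just a shift by $r$; a minimal sketch with illustrative parameters:

import numpy as np
rng = np.random.default_rng(2)

r, p = 5, 0.4
failures = rng.negative_binomial(r, p, size=100_000)
trials = failures + r                         # trial index of the r-th success
print(trials.mean(), r / p)                   # both ≈ 12.5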

Poisson Discrete

$$X \sim \text{Poisson}(\lambda)$$ $$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots$$

Construction: Limit of Binomial$(n, p)$ as $n \to \infty$, $p \to 0$, $np = \lambda$

Interpretation: Count of rare events in fixed interval

Python
# === Sampling ===
rng.poisson(lam, size=1000)

# === SciPy distribution object ===
dist = scipy.stats.poisson(lam)
dist.rvs(size=1000) # sampling
dist.pmf(k) # P(X = k)
dist.cdf(k) # P(X ≤ k)
dist.sf(k) # P(X > k)
dist.ppf(q) # quantile
dist.mean(), dist.var() # both equal λ
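
The binomial-to-Poisson limit is easy to check numerically: fix $\lambda = np$ and let $n$ grow. A minimal sketch (parameters are illustrative):

import numpy as np
from scipy import stats

lam, n = 3.0, 10_000
k = np.arange(10)
diff = stats.binom(n, lam / n).pmf(k) - stats.poisson(lam).pmf(k)
print(np.max(np.abs(diff)))                   # shrinks toward 0 as n grows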

💡 Key Insight: The Counting Hierarchy

Bernoulli counts success in 1 trial → Binomial counts successes in $n$ trials → Multinomial generalizes to $k$ categories → Geometric counts trials to 1st success → Negative Binomial counts trials to $r$-th success → Poisson counts rare events (infinitely many trials with vanishing probability).

🔗 The Hypergeometric Connection

Sampling without replacement modifies the Bernoulli/Binomial relationship.

Hypergeometric Discrete

$$X \sim \text{Hypergeometric}(N, K, n)$$ $$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

Construction: $n$ draws without replacement from $N$ items ($K$ successes)

Limit: As $N \to \infty$ with $K/N \to p$: Hypergeometric $\to$ Binomial

Python
# === Sampling ===
# NumPy: ngood=K, nbad=N-K, nsample=n
rng.hypergeometric(K, N-K, n, size=1000)

# === SciPy distribution object ===
# hypergeom(M, n, N) = pop M, success n, draws N
dist = scipy.stats.hypergeom(N, K, n)
dist.rvs(size=1000) # sampling
dist.pmf(k) # P(X = k)
dist.cdf(k) # P(X ≤ k)
dist.ppf(q) # quantile
dist.mean(), dist.var() # nK/N, n(K/N)(1-K/N)(N-n)/(N-1)
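
The binomial limit can be checked the same way: hold $K/N = p$ fixed and grow the population. A minimal sketch (parameters are illustrative):

import numpy as np
from scipy import stats

n, p = 20, 0.3
k = np.arange(n + 1)
for N in (100, 1_000, 100_000):
    K = int(p * N)
    diff = stats.hypergeom(N, K, n).pmf(k) - stats.binom(n, p).pmf(k)
    print(N, np.max(np.abs(diff)))            # shrinks as N grows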

Continuous Distributions from Discrete

Limits and transformations bridge the discrete-continuous divide

🎲 The Continuous Uniform Foundation

The continuous uniform on $[0, 1]$ is the fundamental building block of all continuous random generation—every other distribution can be generated from it via transformations. The general Uniform$(a, b)$ extends this to any bounded interval.

Continuous Uniform Continuous

$$U \sim \text{Uniform}(a, b)$$ $$f(x) = \frac{1}{b-a}, \quad F(x) = \frac{x - a}{b - a}, \quad x \in [a, b]$$

Standard case: $U \sim \text{Uniform}(0,1)$ is the foundation of all random generation

Linear transformation: If $U \sim \text{Uniform}(0,1)$, then $X = a + (b-a)U \sim \text{Uniform}(a,b)$

Key method: Inverse CDF: $X = F^{-1}(U)$ has CDF $F$

Python
# === Sampling ===
rng.random(size=1000) # U(0,1)
rng.uniform(a, b, size=1000) # U(a,b)

# === SciPy (loc=a, scale=b-a) ===
dist = scipy.stats.uniform(a, b-a)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density = 1/(b-a)
dist.cdf(x) # (x-a)/(b-a)
dist.ppf(q) # a + q(b-a)

# === Python stdlib ===
random.random() # U(0,1)
random.uniform(a, b) # U(a,b)
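
The inverse-CDF method deserves a concrete demonstration: for the exponential, $F^{-1}(u) = -\ln(1-u)/\lambda$, so uniforms transform directly into exponential samples. A minimal sketch:

import numpy as np
rng = np.random.default_rng(3)

lam = 2.0
u = rng.random(100_000)                       # U(0,1)
x = -np.log(1 - u) / lam                      # F⁻¹(U) ~ Exponential(λ)
print(x.mean(), 1 / lam)                      # both ≈ 0.5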

🌉 The Central Limit Theorem Bridge

The CLT transforms discrete counts into continuous distributions.

Binomial$(n, p)$: $S_n = \sum_{i=1}^n X_i$, where $X_i \stackrel{\text{iid}}{\sim} \text{Bernoulli}(p)$, with mean $np$ and variance $np(1-p)$.

As $n \to \infty$, the CLT carries this discrete sum into the continuous world: the standardized sum converges in distribution to Normal$(0, 1)$,

$$Z_n = \frac{S_n - np}{\sqrt{np(1-p)}} \xrightarrow{d} N(0,1)$$
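
A quick simulation makes the convergence visible; a minimal sketch (sample sizes are illustrative, and the KS statistic is small but not zero because the binomial lattice is discrete):

import numpy as np
from scipy import stats
rng = np.random.default_rng(4)

n, p = 10_000, 0.3
s = rng.binomial(n, p, size=100_000)
z = (s - n * p) / np.sqrt(n * p * (1 - p))    # standardized sums
print(stats.kstest(z, "norm").statistic)      # small ⇒ close to N(0,1)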

🔗 The Normal Distribution: Universal Attractor

The normal distribution is the limit of sums—the hub of continuous probability.

Normal Continuous

$$X \sim N(\mu, \sigma^2)$$ $$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

From Bernoulli: CLT limit of standardized binomial sums

MGF: $M(t) = \exp(\mu t + \sigma^2 t^2/2)$

Python
# === Sampling ===
rng.normal(mu, sigma, size=1000)
rng.standard_normal(size=1000) # N(0,1)

# === SciPy distribution object ===
dist = scipy.stats.norm(mu, sigma)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density φ(x)
dist.cdf(x) # Φ(x)
dist.sf(x) # 1 - Φ(x)
dist.ppf(q) # Φ⁻¹(q) quantile
dist.isf(q) # Φ⁻¹(1-q)

# === Python stdlib ===
random.gauss(mu, sigma)
random.normalvariate(mu, sigma)

🔗 Distributions Derived from Normal

Squares, ratios, and sums of normals generate the inferential distributions.

Chi-Squared Continuous

$$X \sim \chi^2_k$$ $$f(x) = \frac{x^{k/2-1}e^{-x/2}}{2^{k/2}\Gamma(k/2)}, \quad x > 0$$

Construction: Sum of $k$ squared standard normals

From Bernoulli: Binomial → Normal → Chi-squared

Python
# === Sampling ===
rng.chisquare(df, size=1000)

# === SciPy distribution object ===
dist = scipy.stats.chi2(df)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density
dist.cdf(x) # P(X ≤ x)
dist.ppf(q) # quantile
dist.mean(), dist.var() # df, 2·df
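
Verifying the construction by summing squared standard normals; a minimal sketch:

import numpy as np
rng = np.random.default_rng(5)

k = 4
q = (rng.standard_normal((100_000, k)) ** 2).sum(axis=1)
print(q.mean(), q.var())                      # ≈ k and 2k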

Student's t Continuous

$$T \sim t_\nu$$ $$f(t) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}$$

Construction: $T = Z / \sqrt{V/\nu}$ where $Z \sim N(0,1)$, $V \sim \chi^2_\nu$, independent

Python
# === Sampling ===
rng.standard_t(df, size=1000) # t(df)

# === SciPy distribution object ===
dist = scipy.stats.t(df, loc=mu, scale=sigma)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density
dist.cdf(x) # P(T ≤ x)
dist.ppf(q) # quantile (t-critical)
dist.interval(0.95) # 95% interval
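
Building $T = Z/\sqrt{V/\nu}$ from its ingredients and testing it against SciPy's t distribution; a minimal sketch:

import numpy as np
from scipy import stats
rng = np.random.default_rng(6)

nu = 5
z = rng.standard_normal(100_000)
v = rng.chisquare(nu, size=100_000)
t = z / np.sqrt(v / nu)                       # Z / sqrt(V/ν)
print(stats.kstest(t, stats.t(nu).cdf).pvalue)  # large p ⇒ consistent with t(ν)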

F Distribution Continuous

$$X \sim F_{d_1, d_2}$$ $$X = \frac{U/d_1}{V/d_2}, \quad U \sim \chi^2_{d_1}, \; V \sim \chi^2_{d_2} \text{ independent}$$

Construction: Ratio of two independent chi-squareds, each scaled by its degrees of freedom

Connection: $T^2 \sim F_{1, \nu}$ where $T \sim t_\nu$

Python
# === Sampling ===
rng.f(dfnum, dfden, size=1000)

# === SciPy distribution object ===
dist = scipy.stats.f(dfnum, dfden)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density
dist.cdf(x) # P(F ≤ x)
dist.ppf(q) # F-critical value
dist.sf(x) # p-value = 1 - CDF

Cauchy Continuous

$$X \sim \text{Cauchy}(x_0, \gamma)$$ $$f(x) = \frac{1}{\pi\gamma\left[1 + \left(\frac{x - x_0}{\gamma}\right)^2\right]}$$

Construction: Student's $t$ with $\nu = 1$ degree of freedom; equivalently, the ratio $Z_1/Z_2$ of two independent standard normals

Key property: No mean or variance (moments do not exist!)

Python
# === Sampling ===
rng.standard_cauchy(size=1000) # Cauchy(0,1)

# === SciPy (loc=x₀, scale=γ) ===
dist = scipy.stats.cauchy(loc=x0, scale=gamma)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density
dist.cdf(x) # P(X ≤ x)
dist.ppf(q) # quantile
dist.median() # = x₀ (mean undefined!)

# === Via t-distribution (ν=1) ===
scipy.stats.t(df=1) # equivalent!
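
The ratio-of-normals construction (listed in the hierarchy table below) is a one-liner to check; note that the sample mean never settles, since the mean does not exist:

import numpy as np
rng = np.random.default_rng(7)

z1, z2 = rng.standard_normal((2, 100_000))
c = z1 / z2                                   # same law as Cauchy(0,1) = t(1)
print(np.median(c))                           # ≈ 0; the mean is meaningless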

🔗 The Waiting Time Path: Geometric → Exponential → Gamma

Continuous waiting times emerge as limits of discrete counts.

🌉 Geometric to Exponential Limit

Geometric$(p)$: $P(Y > k) = (1-p)^k$, a waiting time measured in discrete trials.

Speed up the clock: run $n$ trials per unit of time with success probability $p = \lambda/n$. The rescaled waiting time $T = Y/n$ then satisfies $P(T > t) = (1 - \lambda/n)^{\lceil nt \rceil} \to e^{-\lambda t}$ as $n \to \infty$.

Exponential$(\lambda)$: $P(T > t) = e^{-\lambda t}$, a waiting time in continuous time.
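
Numerically: take $p = \lambda/n$ and rescale the geometric waiting time by $1/n$; a minimal sketch with illustrative parameters:

import numpy as np
rng = np.random.default_rng(8)

lam, n = 2.0, 10_000
t = rng.geometric(lam / n, size=100_000) / n  # rescaled discrete waiting times
print(t.mean(), 1 / lam)                      # both ≈ 0.5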

Gamma Continuous

$$X \sim \text{Gamma}(\alpha, \beta)$$ $$f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}, \quad x > 0$$

Construction: Sum of $\alpha$ iid Exponential$(\beta)$ when $\alpha \in \mathbb{N}$

Special cases: Exp$(\lambda) = $ Gamma$(1, \lambda)$; $\chi^2_k = $ Gamma$(k/2, 1/2)$

Python
# === Exponential (Gamma with α=1) ===
rng.exponential(scale=1/lam, size=1000)
scipy.stats.expon(scale=1/lam) # full dist object
random.expovariate(lam) # stdlib

# === General Gamma (shape=α, scale=1/β) ===
rng.gamma(shape=alpha, scale=1/beta, size=1000)

# === SciPy (a=shape, scale=1/rate) ===
dist = scipy.stats.gamma(alpha, scale=1/beta)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density
dist.cdf(x) # P(X ≤ x)
dist.ppf(q) # quantile
dist.mean(), dist.var() # α/β, α/β²

# === Python stdlib ===
random.gammavariate(alpha, 1/beta)
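
Checking the sum-of-exponentials construction for integer shape; a minimal sketch:

import numpy as np
rng = np.random.default_rng(9)

alpha, beta = 3, 2.0                          # integer shape α, rate β
x = rng.exponential(scale=1/beta, size=(100_000, alpha)).sum(axis=1)
print(x.mean(), alpha / beta)                 # both ≈ 1.5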

🔗 The Beta Distribution: Continuous Analog of Bernoulli

The Bernoulli distribution models a binary outcome $\{0, 1\}$ with probability $p$. The Beta distribution is its continuous analog: it lives on $[0, 1]$ and models the probability parameter $p$ itself. Just as Bernoulli is the atom of discrete probability, Beta is the atom of continuous probability on bounded intervals—and they are intimately connected through Bayesian conjugacy.

Beta Continuous

$$X \sim \text{Beta}(\alpha, \beta)$$ $$f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \quad x \in [0,1]$$

Bernoulli parallel: Bernoulli PMF $p^k(1-p)^{1-k}$ ↔ Beta PDF $\propto x^{\alpha-1}(1-x)^{\beta-1}$

Construction: $X/(X+Y)$ where $X \sim \text{Gamma}(\alpha, 1)$, $Y \sim \text{Gamma}(\beta, 1)$ independent

Special case: Beta$(1,1) = $ Uniform$(0,1)$

Python
# === Sampling ===
rng.beta(alpha, beta, size=1000)

# === SciPy distribution object ===
dist = scipy.stats.beta(alpha, beta)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density
dist.cdf(x) # P(X ≤ x)
dist.ppf(q) # quantile
dist.mean(), dist.var() # α/(α+β), αβ/((α+β)²(α+β+1))
dist.interval(0.95) # credible interval

# === Python stdlib ===
random.betavariate(alpha, beta)
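
The conjugacy is one line of arithmetic: a Beta$(\alpha, \beta)$ prior on $p$ updated with Bernoulli data stays Beta, with the successes added to $\alpha$ and the failures to $\beta$. A minimal sketch (prior and data are illustrative):

import numpy as np
from scipy import stats
rng = np.random.default_rng(10)

data = rng.binomial(1, 0.7, size=50)          # 50 Bernoulli(0.7) observations
alpha, beta = 1, 1                            # Beta(1,1) = Uniform(0,1) prior
post = stats.beta(alpha + data.sum(), beta + len(data) - data.sum())
print(post.mean(), post.interval(0.95))       # posterior mean and 95% interval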

📊 Complete Distribution Hierarchy

Distribution | Type | Connection to Bernoulli | Key Transform
Discrete Uniform$(a,b)$ | Discrete | Binary expansion of Bernoulli$(0.5)$ bits | $\sum_i B_i \cdot 2^{i-1}$
Bernoulli$(p)$ | Discrete | Foundation (binary) | -
Binomial$(n,p)$ | Discrete | Sum of $n$ Bernoullis | $S = \sum_{i=1}^n X_i$
Multinomial$(n, \mathbf{p})$ | Discrete | $k$-category generalization of Binomial | Count each of $k$ outcomes in $n$ trials
Geometric$(p)$ | Discrete | First success in Bernoulli sequence | $Y = \min\{k: X_k = 1\}$
Negative Binomial$(r,p)$ | Discrete | Sum of $r$ Geometrics | $S = \sum_{i=1}^r Y_i$
Poisson$(\lambda)$ | Discrete | Binomial limit: $n\to\infty$, $np\to\lambda$ | Rare-events limit
Hypergeometric$(N,K,n)$ | Discrete | Conditional Binomial; sampling w/o replacement | $P(X=k \mid X+Y=K)$
Uniform$(a,b)$ | Continuous | Foundation for random generation | Inverse CDF: $X = F^{-1}(U)$
Normal$(0,1)$ | Continuous | CLT limit of standardized Binomial | $(S_n - np)/\sqrt{np(1-p)}$
Exponential$(\lambda)$ | Continuous | Geometric limit (continuous time) | $p = \lambda/n$, $n \to \infty$
Gamma$(\alpha, \beta)$ | Continuous | Sum of Exponentials; NegBin limit | $S = \sum_i \text{Exp}_i$
Chi-squared$_k$ | Continuous | Sum of squared Normals | $Q = \sum_i Z_i^2$
Student's $t_\nu$ | Continuous | Normal/Chi-squared ratio | $Z/\sqrt{V/\nu}$
$F_{d_1, d_2}$ | Continuous | Chi-squared ratio | $(U/d_1)/(V/d_2)$
Cauchy$(x_0, \gamma)$ | Continuous | $t_1$ (Student's $t$ with $\nu=1$) | $Z_1/Z_2$ for iid Normals
Beta$(\alpha, \beta)$ | Continuous | Gamma ratio; Bernoulli conjugate | $X/(X+Y)$; continuous analog of Bernoulli

💡 The Grand Unified View

Every distribution in classical probability can be traced back to the Bernoulli trial and the Uniform distribution; Uniform$(0,1)$ is the foundation of all random generation via the inverse CDF method. The key operations connecting distributions:

Summation: Binomial, Negative Binomial, Gamma, Chi-squared
Generalization: Multinomial extends Binomial to $k$ categories
Waiting/counting: Geometric, Exponential
Limits: Poisson, Normal, Exponential
Ratios: t, F, Beta, Cauchy
Conditioning: Hypergeometric

The CLT is the great bridge between the discrete and continuous worlds, making the Normal distribution the universal attractor for sums. The Beta distribution plays a special role as the continuous analog of Bernoulli—living on $[0,1]$ just as Bernoulli probabilities do—while Dirichlet generalizes Beta to probability vectors just as Multinomial generalizes Binomial.

Reproducibility & Parallel Random Generation

Proper seeding ensures reproducible results. For parallel computing, use SeedSequence.spawn() to create independent streams—never use sequential seeds like 1, 2, 3.
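
A minimal sketch of both patterns:

import numpy as np

# Reproducible single stream
rng = np.random.default_rng(12345)

# Independent streams for parallel workers
ss = np.random.SeedSequence(12345)
streams = [np.random.default_rng(s) for s in ss.spawn(4)]
print([r.random() for r in streams])          # distinct, reproducible values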

📚 Library Documentation & Resources

NumPy Random

SciPy Statistics

Python Standard Library