The Bernoulli Family Tree

How One Coin Flip Generates All of Probability Theory

The Bernoulli distribution is the atom of probability. From this simplest random experiment—a single trial with probability $p$ of success—we can derive every major probability distribution through combinations, limits, and transformations.

This interactive guide shows how discrete distributions arise from sums and sequences of Bernoulli trials, how continuous distributions emerge as limiting cases, and the mathematical machinery (moment generating functions, limits, and transformations) that connects them all.

The Bernoulli Distribution

$$X \sim \text{Bernoulli}(p), \quad P(X = k) = p^k(1-p)^{1-k}, \quad k \in \{0, 1\}$$

Parameters: $p \in [0,1]$ (probability of success)

MGF: $M_X(t) = (1-p) + pe^t = 1 + p(e^t - 1)$

Mean: $\mathbb{E}[X] = p$  |  Variance: $\text{Var}(X) = p(1-p)$

import numpy as np, scipy.stats
rng = np.random.default_rng(42)  # generator assumed by every snippet below

# NumPy
rng.choice([0, 1], p=[1-p, p])   # or equivalently rng.binomial(1, p)
# SciPy
scipy.stats.bernoulli(p).rvs()

Discrete Distributions from Bernoulli

Sums, sequences, and counting arguments generate the discrete family

🎲 The Uniform Foundation

The discrete uniform distribution assigns equal probability to each outcome—the unbiased die roll that underlies all random generation.

Discrete Uniform Discrete

$$X \sim \text{DiscreteUniform}(a, b)$$ $$P(X = k) = \frac{1}{b - a + 1}, \quad k \in \{a, a+1, \ldots, b\}$$

Construction: Equal probability on integers $\{a, \ldots, b\}$

Special case: $\text{Bernoulli}(0.5) = \text{DiscreteUniform}(0, 1)$

Python
# === Sampling ===
rng.integers(a, b+1) # single sample
rng.integers(a, b+1, size=1000) # array

# === SciPy distribution object ===
dist = scipy.stats.randint(a, b+1)
dist.rvs(size=1000) # sampling
dist.pmf(k) # P(X = k)
dist.cdf(k) # P(X ≤ k)
dist.ppf(q) # quantile (inverse CDF)
dist.mean(), dist.var() # moments
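
A quick check of how fair Bernoulli bits build a discrete uniform (the binary-expansion construction listed in the hierarchy table below); a minimal sketch, where `m` and the sample size are arbitrary illustrative choices:

import numpy as np
rng = np.random.default_rng(0)

m = 3                                         # number of fair coin flips
bits = rng.integers(0, 2, size=(100_000, m))  # iid Bernoulli(0.5) bits
u = bits @ (2 ** np.arange(m))                # sum_i B_i * 2^(i-1)
print(np.bincount(u) / len(u))                # each of {0,...,7} ≈ 1/8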

🔗 Direct Constructions from Bernoulli

The first generation of distributions arises directly from operations on independent Bernoulli trials.

Binomial Discrete

$$S_n \sim \text{Binomial}(n, p)$$ $$P(S_n = k) = \binom{n}{k}p^k(1-p)^{n-k}, \quad k = 0, 1, \ldots, n$$

Functional form: $S_n = \sum_{i=1}^n X_i$ where $X_i \stackrel{\text{iid}}{\sim} \text{Bernoulli}(p)$

Interpretation: Count of successes in $n$ independent Bernoulli trials
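
Why this works is one line of MGF algebra: independence turns the MGF of the sum into a product of identical Bernoulli MGFs,

$$M_{S_n}(t) = \prod_{i=1}^n M_{X_i}(t) = \left(1 + p(e^t - 1)\right)^n$$

and differentiating at $t = 0$ recovers $\mathbb{E}[S_n] = np$ and $\text{Var}(S_n) = np(1-p)$.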

Python
# === Sampling ===
rng.binomial(n, p) # single sample
rng.binomial(n, p, size=1000) # array

# === SciPy distribution object ===
dist = scipy.stats.binom(n, p)
dist.rvs(size=1000) # sampling
dist.pmf(k) # P(X = k)
dist.cdf(k) # P(X ≤ k)
dist.sf(k) # P(X > k) survival
dist.ppf(q) # quantile (inverse CDF)
dist.mean(), dist.var() # np, np(1-p)
dist.interval(0.95) # 95% interval

Multinomial Discrete

$$\mathbf{X} \sim \text{Multinomial}(n, \mathbf{p})$$ $$P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{x_1! \cdots x_k!} p_1^{x_1} \cdots p_k^{x_k}$$

Constraint: $\sum_{i=1}^k x_i = n$ and $\sum_{i=1}^k p_i = 1$

Generalization: Binomial$(n, p)$ is Multinomial$(n, (p, 1-p))$ with $k = 2$

Python
# === Sampling ===
# p = [p1, p2, ..., pk] probability vector
rng.multinomial(n, p) # returns [x1, ..., xk]
rng.multinomial(n, p, size=1000) # shape (1000, k)

# === SciPy distribution object ===
dist = scipy.stats.multinomial(n, p)
dist.rvs(size=1000) # sampling
dist.pmf([x1, x2, x3]) # P(X = x)
dist.mean() # [np1, np2, ..., npk]
dist.cov() # covariance matrix

Geometric Discrete

$$Y \sim \text{Geometric}(p)$$ $$P(Y = k) = (1-p)^{k-1}p, \quad k = 1, 2, \ldots$$

Functional form: $Y = \min\{k \geq 1 : X_k = 1\}$ where $X_k \stackrel{\text{iid}}{\sim} \text{Bernoulli}(p)$

Interpretation: First hitting time of success in Bernoulli sequence

Python
# === Sampling ===
# NumPy: returns # of trials (1,2,3,...)
rng.geometric(p, size=1000)

# === SciPy distribution object ===
dist = scipy.stats.geom(p)
dist.rvs(size=1000) # sampling
dist.pmf(k) # P(Y = k)
dist.cdf(k) # P(Y ≤ k)
dist.sf(k) # P(Y > k) = (1-p)^k
dist.ppf(q) # quantile
dist.mean(), dist.var() # 1/p, (1-p)/p²
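
To see the first-hitting-time construction directly, simulate raw Bernoulli trials and record the index of the first success; a minimal sketch (with $p = 0.3$, a stream of 1000 trials is effectively always long enough):

import numpy as np
rng = np.random.default_rng(1)

p = 0.3
# first index k >= 1 with X_k = 1, one Bernoulli stream per sample
samples = [np.argmax(rng.random(1000) < p) + 1 for _ in range(10_000)]
print(np.mean(samples), 1 / p)                # both ≈ 3.33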

🔗 Second-Generation Distributions

Extending the counting: sums of geometrics and limits of binomials.

Negative Binomial Discrete

$$X \sim \text{NegBin}(r, p)$$ $$P(X = k) = \binom{k-1}{r-1}p^r(1-p)^{k-r}, \quad k = r, r+1, \ldots$$

Functional form: $X = \sum_{i=1}^r Y_i$ where $Y_i = \min\{k : X_k^{(i)} = 1\} \stackrel{\text{iid}}{\sim} \text{Geom}(p)$

Interpretation: Trial index of the $r$-th success in Bernoulli sequence

Python
# === Sampling ===
# NumPy: returns # failures before r successes
rng.negative_binomial(r, p, size=1000)

# === SciPy distribution object ===
# nbinom(n, p) = # failures before n successes
dist = scipy.stats.nbinom(r, p)
dist.rvs(size=1000) # sampling
dist.pmf(k) # P(X = k)
dist.cdf(k) # P(X ≤ k)
dist.ppf(q) # quantile
dist.mean(), dist.var() # r(1-p)/p, r(1-p)/p² (failure count; trial-index mean is r/p)
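
Because both NumPy and SciPy count failures while the PMF above counts trials, converting between the conventions is just a shift by $r$; a minimal sketch with illustrative parameters:

import numpy as np
rng = np.random.default_rng(2)

r, p = 5, 0.4
failures = rng.negative_binomial(r, p, size=100_000)
trials = failures + r                         # trial index of the r-th success
print(trials.mean(), r / p)                   # both ≈ 12.5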

Poisson Discrete

$$X \sim \text{Poisson}(\lambda)$$ $$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots$$

Construction: Limit of Binomial$(n, p)$ as $n \to \infty$, $p \to 0$, $np = \lambda$

Interpretation: Count of rare events in fixed interval

Python
# === Sampling ===
rng.poisson(lam, size=1000)

# === SciPy distribution object ===
dist = scipy.stats.poisson(lam)
dist.rvs(size=1000) # sampling
dist.pmf(k) # P(X = k)
dist.cdf(k) # P(X ≤ k)
dist.sf(k) # P(X > k)
dist.ppf(q) # quantile
dist.mean(), dist.var() # both equal λ
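
The binomial-to-Poisson limit is easy to check numerically: fix $\lambda = np$ and let $n$ grow. A minimal sketch (parameters are illustrative):

import numpy as np
from scipy import stats

lam, n = 3.0, 10_000
k = np.arange(10)
diff = stats.binom(n, lam / n).pmf(k) - stats.poisson(lam).pmf(k)
print(np.max(np.abs(diff)))                   # shrinks toward 0 as n grows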

💡 Key Insight: The Counting Hierarchy

Bernoulli counts success in 1 trial → Binomial counts successes in $n$ trials → Multinomial generalizes to $k$ categories → Geometric counts trials to 1st success → Negative Binomial counts trials to $r$-th success → Poisson counts rare events (infinitely many trials with vanishing probability).

🔗 The Hypergeometric Connection

Sampling without replacement modifies the Bernoulli/Binomial relationship.

Hypergeometric Discrete

$$X \sim \text{Hypergeometric}(N, K, n)$$ $$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

Construction: $n$ draws without replacement from $N$ items ($K$ successes)

Limit: As $N \to \infty$ with $K/N \to p$: Hypergeometric $\to$ Binomial

Python
# === Sampling ===
# NumPy: ngood=K, nbad=N-K, nsample=n
rng.hypergeometric(K, N-K, n, size=1000)

# === SciPy distribution object ===
# hypergeom(M, n, N) = pop M, success n, draws N
dist = scipy.stats.hypergeom(N, K, n)
dist.rvs(size=1000) # sampling
dist.pmf(k) # P(X = k)
dist.cdf(k) # P(X ≤ k)
dist.ppf(q) # quantile
dist.mean(), dist.var() # nK/N, n(K/N)(1-K/N)(N-n)/(N-1)
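
The binomial limit can be checked the same way: hold $K/N = p$ fixed and grow the population. A minimal sketch (parameters are illustrative):

import numpy as np
from scipy import stats

n, p = 20, 0.3
k = np.arange(n + 1)
for N in (100, 1_000, 100_000):
    K = int(p * N)
    diff = stats.hypergeom(N, K, n).pmf(k) - stats.binom(n, p).pmf(k)
    print(N, np.max(np.abs(diff)))            # shrinks as N grows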

Continuous Distributions from Discrete

Limits and transformations bridge the discrete-continuous divide

🎲 The Continuous Uniform Foundation

The continuous uniform on $[0, 1]$ is the fundamental building block of all continuous random generation—every other distribution can be generated from it via transformations. The general Uniform$(a, b)$ extends this to any bounded interval.

Continuous Uniform Continuous

$$U \sim \text{Uniform}(a, b)$$ $$f(x) = \frac{1}{b-a}, \quad F(x) = \frac{x - a}{b - a}, \quad x \in [a, b]$$

Standard case: $U \sim \text{Uniform}(0,1)$ is the foundation of all random generation

Linear transformation: If $U \sim \text{Uniform}(0,1)$, then $X = a + (b-a)U \sim \text{Uniform}(a,b)$

Key method: Inverse CDF: $X = F^{-1}(U)$ has CDF $F$

Python
# === Sampling ===
rng.random(size=1000) # U(0,1)
rng.uniform(a, b, size=1000) # U(a,b)

# === SciPy (loc=a, scale=b-a) ===
dist = scipy.stats.uniform(a, b-a)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density = 1/(b-a)
dist.cdf(x) # (x-a)/(b-a)
dist.ppf(q) # a + q(b-a)

# === Python stdlib ===
random.random() # U(0,1)
random.uniform(a, b) # U(a,b)
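
The inverse-CDF method deserves a concrete demonstration: for the exponential, $F^{-1}(u) = -\ln(1-u)/\lambda$, so uniforms transform directly into exponential samples. A minimal sketch:

import numpy as np
rng = np.random.default_rng(3)

lam = 2.0
u = rng.random(100_000)                       # U(0,1)
x = -np.log(1 - u) / lam                      # F⁻¹(U) ~ Exponential(λ)
print(x.mean(), 1 / lam)                      # both ≈ 0.5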

🌉 The Central Limit Theorem Bridge

The CLT transforms discrete counts into continuous distributions.

Binomial$(n, p)$: $S_n = \sum_{i=1}^n X_i$, where $X_i \stackrel{\text{iid}}{\sim} \text{Bernoulli}(p)$, with mean $np$ and variance $np(1-p)$.

As $n \to \infty$, the CLT carries this discrete sum into the continuous world: the standardized sum converges in distribution to Normal$(0, 1)$,

$$Z_n = \frac{S_n - np}{\sqrt{np(1-p)}} \xrightarrow{d} N(0,1)$$
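
A quick simulation makes the convergence visible; a minimal sketch (sample sizes are illustrative, and the KS statistic is small but not zero because the binomial lattice is discrete):

import numpy as np
from scipy import stats
rng = np.random.default_rng(4)

n, p = 10_000, 0.3
s = rng.binomial(n, p, size=100_000)
z = (s - n * p) / np.sqrt(n * p * (1 - p))    # standardized sums
print(stats.kstest(z, "norm").statistic)      # small ⇒ close to N(0,1)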

🔗 The Normal Distribution: Universal Attractor

The normal distribution is the limit of sums—the hub of continuous probability.

Normal Continuous

$$X \sim N(\mu, \sigma^2)$$ $$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

From Bernoulli: CLT limit of standardized binomial sums

MGF: $M(t) = \exp(\mu t + \sigma^2 t^2/2)$

Python
# === Sampling ===
rng.normal(mu, sigma, size=1000)
rng.standard_normal(size=1000) # N(0,1)

# === SciPy distribution object ===
dist = scipy.stats.norm(mu, sigma)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density φ(x)
dist.cdf(x) # Φ(x)
dist.sf(x) # 1 - Φ(x)
dist.ppf(q) # Φ⁻¹(q) quantile
dist.isf(q) # Φ⁻¹(1-q)

# === Python stdlib ===
random.gauss(mu, sigma)
random.normalvariate(mu, sigma)

🔗 Distributions Derived from Normal

Squares, ratios, and sums of normals generate the inferential distributions.

Chi-Squared Continuous

$$X \sim \chi^2_k$$ $$f(x) = \frac{x^{k/2-1}e^{-x/2}}{2^{k/2}\Gamma(k/2)}, \quad x > 0$$

Construction: Sum of $k$ squared standard normals

From Bernoulli: Binomial → Normal → Chi-squared

Python
# === Sampling ===
rng.chisquare(df, size=1000)

# === SciPy distribution object ===
dist = scipy.stats.chi2(df)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density
dist.cdf(x) # P(X ≤ x)
dist.ppf(q) # quantile
dist.mean(), dist.var() # df, 2·df
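
Verifying the construction by summing squared standard normals; a minimal sketch:

import numpy as np
rng = np.random.default_rng(5)

k = 4
q = (rng.standard_normal((100_000, k)) ** 2).sum(axis=1)
print(q.mean(), q.var())                      # ≈ k and 2k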

Student's t Continuous

$$T \sim t_\nu$$ $$f(t) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}$$

Construction: $T = Z / \sqrt{V/\nu}$ where $Z \sim N(0,1)$, $V \sim \chi^2_\nu$, independent

Python
# === Sampling ===
rng.standard_t(df, size=1000) # t(df)

# === SciPy distribution object ===
dist = scipy.stats.t(df, loc=mu, scale=sigma)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density
dist.cdf(x) # P(T ≤ x)
dist.ppf(q) # quantile (t-critical)
dist.interval(0.95) # 95% interval
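
Building $T = Z/\sqrt{V/\nu}$ from its ingredients and testing it against SciPy's t distribution; a minimal sketch:

import numpy as np
from scipy import stats
rng = np.random.default_rng(6)

nu = 5
z = rng.standard_normal(100_000)
v = rng.chisquare(nu, size=100_000)
t = z / np.sqrt(v / nu)                       # Z / sqrt(V/ν)
print(stats.kstest(t, stats.t(nu).cdf).pvalue)  # large p ⇒ consistent with t(ν)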

F Distribution Continuous

$$X \sim F_{d_1, d_2}$$ $$X = \frac{U/d_1}{V/d_2}, \quad U \sim \chi^2_{d_1}, \; V \sim \chi^2_{d_2} \text{ independent}$$

Construction: Ratio of two independent chi-squareds, each scaled by its degrees of freedom

Connection: $T^2 \sim F_{1, \nu}$ where $T \sim t_\nu$

Python
# === Sampling ===
rng.f(dfnum, dfden, size=1000)

# === SciPy distribution object ===
dist = scipy.stats.f(dfnum, dfden)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density
dist.cdf(x) # P(F ≤ x)
dist.ppf(q) # F-critical value
dist.sf(x) # p-value = 1 - CDF

Cauchy Continuous

$$X \sim \text{Cauchy}(x_0, \gamma)$$ $$f(x) = \frac{1}{\pi\gamma\left[1 + \left(\frac{x - x_0}{\gamma}\right)^2\right]}$$

Construction: Student's $t$ with $\nu = 1$ degree of freedom; equivalently, the ratio $Z_1/Z_2$ of two independent standard normals

Key property: No mean or variance (moments do not exist!)

Python
# === Sampling ===
rng.standard_cauchy(size=1000) # Cauchy(0,1)

# === SciPy (loc=x₀, scale=γ) ===
dist = scipy.stats.cauchy(loc=x0, scale=gamma)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density
dist.cdf(x) # P(X ≤ x)
dist.ppf(q) # quantile
dist.median() # = x₀ (mean undefined!)

# === Via t-distribution (ν=1) ===
scipy.stats.t(df=1) # equivalent!
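
The ratio-of-normals construction (listed in the hierarchy table below) is a one-liner to check; note that the sample mean never settles, since the mean does not exist:

import numpy as np
rng = np.random.default_rng(7)

z1, z2 = rng.standard_normal((2, 100_000))
c = z1 / z2                                   # same law as Cauchy(0,1) = t(1)
print(np.median(c))                           # ≈ 0; the mean is meaningless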

🔗 The Waiting Time Path: Geometric → Exponential → Gamma

Continuous waiting times emerge as limits of discrete counts.

🌉 Geometric to Exponential Limit

Geometric$(p)$: $P(Y > k) = (1-p)^k$, a waiting time measured in discrete trials.

Speed up the clock: run $n$ trials per unit of time with success probability $p = \lambda/n$. The rescaled waiting time $T = Y/n$ then satisfies $P(T > t) = (1 - \lambda/n)^{\lceil nt \rceil} \to e^{-\lambda t}$ as $n \to \infty$.

Exponential$(\lambda)$: $P(T > t) = e^{-\lambda t}$, a waiting time in continuous time.
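
Numerically: take $p = \lambda/n$ and rescale the geometric waiting time by $1/n$; a minimal sketch with illustrative parameters:

import numpy as np
rng = np.random.default_rng(8)

lam, n = 2.0, 10_000
t = rng.geometric(lam / n, size=100_000) / n  # rescaled discrete waiting times
print(t.mean(), 1 / lam)                      # both ≈ 0.5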

Gamma Continuous

$$X \sim \text{Gamma}(\alpha, \beta)$$ $$f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}, \quad x > 0$$

Construction: Sum of $\alpha$ iid Exponential$(\beta)$ when $\alpha \in \mathbb{N}$

Special cases: Exp$(\lambda) = $ Gamma$(1, \lambda)$; $\chi^2_k = $ Gamma$(k/2, 1/2)$

Python
# === Exponential (Gamma with α=1) ===
rng.exponential(scale=1/lam, size=1000)
scipy.stats.expon(scale=1/lam) # full dist object
random.expovariate(lam) # stdlib

# === General Gamma (shape=α, scale=1/β) ===
rng.gamma(shape=alpha, scale=1/beta, size=1000)

# === SciPy (a=shape, scale=1/rate) ===
dist = scipy.stats.gamma(alpha, scale=1/beta)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density
dist.cdf(x) # P(X ≤ x)
dist.ppf(q) # quantile
dist.mean(), dist.var() # α/β, α/β²

# === Python stdlib ===
random.gammavariate(alpha, 1/beta)
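
Checking the sum-of-exponentials construction for integer shape; a minimal sketch:

import numpy as np
rng = np.random.default_rng(9)

alpha, beta = 3, 2.0                          # integer shape α, rate β
x = rng.exponential(scale=1/beta, size=(100_000, alpha)).sum(axis=1)
print(x.mean(), alpha / beta)                 # both ≈ 1.5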

🔗 The Beta Distribution: Continuous Analog of Bernoulli

The Bernoulli distribution models a binary outcome $\{0, 1\}$ with probability $p$. The Beta distribution is its continuous analog: it lives on $[0, 1]$ and models the probability parameter $p$ itself. Just as Bernoulli is the atom of discrete probability, Beta is the atom of continuous probability on bounded intervals—and they are intimately connected through Bayesian conjugacy.

Beta Continuous

$$X \sim \text{Beta}(\alpha, \beta)$$ $$f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \quad x \in [0,1]$$

Bernoulli parallel: Bernoulli PMF $p^k(1-p)^{1-k}$ ↔ Beta PDF $\propto x^{\alpha-1}(1-x)^{\beta-1}$

Construction: $X/(X+Y)$ where $X \sim \text{Gamma}(\alpha, 1)$, $Y \sim \text{Gamma}(\beta, 1)$ independent

Special case: Beta$(1,1) = $ Uniform$(0,1)$

Python
# === Sampling ===
rng.beta(alpha, beta, size=1000)

# === SciPy distribution object ===
dist = scipy.stats.beta(alpha, beta)
dist.rvs(size=1000) # sampling
dist.pdf(x) # density
dist.cdf(x) # P(X ≤ x)
dist.ppf(q) # quantile
dist.mean(), dist.var() # α/(α+β), αβ/((α+β)²(α+β+1))
dist.interval(0.95) # credible interval

# === Python stdlib ===
random.betavariate(alpha, beta)
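
The conjugacy is one line of arithmetic: a Beta$(\alpha, \beta)$ prior on $p$ updated with Bernoulli data stays Beta, with the successes added to $\alpha$ and the failures to $\beta$. A minimal sketch (prior and data are illustrative):

import numpy as np
from scipy import stats
rng = np.random.default_rng(10)

data = rng.binomial(1, 0.7, size=50)          # 50 Bernoulli(0.7) observations
alpha, beta = 1, 1                            # Beta(1,1) = Uniform(0,1) prior
post = stats.beta(alpha + data.sum(), beta + len(data) - data.sum())
print(post.mean(), post.interval(0.95))       # posterior mean and 95% interval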

📊 Complete Distribution Hierarchy

Distribution | Type | Connection to Bernoulli | Key Transform
Discrete Uniform$(a,b)$ | Discrete | Binary expansion of Bernoulli$(0.5)$ bits | $\sum_i B_i \cdot 2^{i-1}$
Bernoulli$(p)$ | Discrete | Foundation (binary) | -
Binomial$(n,p)$ | Discrete | Sum of $n$ Bernoullis | $S = \sum_{i=1}^n X_i$
Multinomial$(n, \mathbf{p})$ | Discrete | $k$-category generalization of Binomial | Count each of $k$ outcomes in $n$ trials
Geometric$(p)$ | Discrete | First success in Bernoulli sequence | $Y = \min\{k: X_k = 1\}$
Negative Binomial$(r,p)$ | Discrete | Sum of $r$ Geometrics | $S = \sum_{i=1}^r Y_i$
Poisson$(\lambda)$ | Discrete | Binomial limit: $n\to\infty$, $np\to\lambda$ | Rare-events limit
Hypergeometric$(N,K,n)$ | Discrete | Conditional Binomial; sampling w/o replacement | $P(X=k \mid X+Y=K)$
Uniform$(a,b)$ | Continuous | Foundation for random generation | Inverse CDF: $X = F^{-1}(U)$
Normal$(0,1)$ | Continuous | CLT limit of standardized Binomial | $(S_n - np)/\sqrt{np(1-p)}$
Exponential$(\lambda)$ | Continuous | Geometric limit (continuous time) | $p = \lambda/n$, $n \to \infty$
Gamma$(\alpha, \beta)$ | Continuous | Sum of Exponentials; NegBin limit | $S = \sum_i \text{Exp}_i$
Chi-squared$_k$ | Continuous | Sum of squared Normals | $Q = \sum_i Z_i^2$
Student's $t_\nu$ | Continuous | Normal/Chi-squared ratio | $Z/\sqrt{V/\nu}$
$F_{d_1, d_2}$ | Continuous | Chi-squared ratio | $(U/d_1)/(V/d_2)$
Cauchy$(x_0, \gamma)$ | Continuous | $t_1$ (Student's $t$ with $\nu=1$) | $Z_1/Z_2$ for iid Normals
Beta$(\alpha, \beta)$ | Continuous | Gamma ratio; Bernoulli conjugate | $X/(X+Y)$; continuous analog of Bernoulli

💡 The Grand Unified View

Every distribution in classical probability can be traced back to the Bernoulli trial and the Uniform distribution; Uniform$(0,1)$ is the foundation of all random generation via the inverse CDF method. The key operations connecting distributions:

Summation: Binomial, Negative Binomial, Gamma, Chi-squared
Generalization: Multinomial extends Binomial to $k$ categories
Waiting/counting: Geometric, Exponential
Limits: Poisson, Normal, Exponential
Ratios: t, F, Beta, Cauchy
Conditioning: Hypergeometric

The CLT is the great bridge between the discrete and continuous worlds, making the Normal distribution the universal attractor for sums. The Beta distribution plays a special role as the continuous analog of Bernoulli—living on $[0,1]$ just as Bernoulli probabilities do—while Dirichlet generalizes Beta to probability vectors just as Multinomial generalizes Binomial.

Reproducibility & Parallel Random Generation

Proper seeding ensures reproducible results. For parallel computing, use SeedSequence.spawn() to create independent streams—never use sequential seeds like 1, 2, 3.
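
A minimal sketch of both patterns:

import numpy as np

# Reproducible single stream
rng = np.random.default_rng(12345)

# Independent streams for parallel workers
ss = np.random.SeedSequence(12345)
streams = [np.random.default_rng(s) for s in ss.spawn(4)]
print([r.random() for r in streams])          # distinct, reproducible values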

📚 Library Documentation & Resources

NumPy Random

SciPy Statistics

Python Standard Library