7.4. Understanding Binomial and Poisson Distributions through CLT

Sections 5.6 and 5.7 showed that for certain parameter values, the binomial and Poisson distributions have a pmf with a bell-shaped trend. In this section, we use the CLT to explain why this similarity occurs and characterize the set of parameters for which it happens. We also identify the issue that can arise from approximating a discrete distribution with a continuous normal distribution.

Road Map 🧭

  • Use CLT to explain why certain binomial and Poisson distributions can be approximated using normal distributions.

  • Understand the issue that can arise from the support difference between the true (discrete) and approximated (continuous) distributions. Know that a technique called continuity correction can be used as a remedy.

7.4.1. The Preliminary: An Alternative Statement for CLT

For an iid sample \(X_1, X_2, \cdots, X_n\) from a population with finite mean \(\mu\) and finite standard deviation \(\sigma\), let \(S_n = X_1 + X_2 + \ldots + X_n\). Then,

\[\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0,1) \text{ as } n \rightarrow \infty\]

How is this connected to the original statement of the CLT? 🔎

The fraction at the beginning of the mathematical statement is in fact identical to the one used in Section 7.3:

\[\frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{(S_n - n\mu)/n}{(\sigma\sqrt{n})/n} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.\]

By using this alternative expression, we can also view the sample sum as approximately normally distributed:

\[S_n \stackrel{\text{approx}}{\sim} N(n\mu, \sigma\sqrt{n})\]
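As a quick sanity check (not part of the text), this statement can be verified by simulation. The exponential population and the sample size below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: exponential with mean mu = 2, sd sigma = 2 (illustrative choice).
mu, sigma, n = 2.0, 2.0, 500

# Draw many iid samples of size n and form the sample sums S_n.
sums = rng.exponential(scale=mu, size=(10_000, n)).sum(axis=1)

# Standardize: (S_n - n*mu) / (sigma * sqrt(n)) should be close to N(0, 1).
z = (sums - n * mu) / (sigma * np.sqrt(n))

print(round(z.mean(), 2), round(z.std(), 2))  # both should be near 0 and 1
```

Even though the exponential population is quite skewed, the standardized sums have mean and standard deviation close to 0 and 1, as the CLT predicts.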

7.4.2. Binomial Distribution and the CLT

A binomial random variable \(X \sim B(n,p)\) counts the number of successes in \(n\) independent trials, each with probability of success \(p\). Recall that it can also be expressed as:

\[X = \sum_{i=1}^n X_i,\]

where each \(X_i\) is an independent Bernoulli random variable that equals 1 with probability \(p\) and 0 with probability \((1-p)\). Since a binomial random variable is a sum of independent and identically distributed random variables, the CLT applies as \(n\) increases. For a sufficiently large \(n\), the distribution of \(X\) can be approximated by:

\[X \stackrel{\text{approx}}{\sim} N \left(np, \sqrt{np(1-p)}\right)\]

The two normal parameters are obtained simply by taking \(E[X]\) and \(\sigma_X\) from the true distribution of \(X\).
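As an illustration (the parameters below are arbitrary choices, not from the text), we can compare an exact binomial probability to its normal approximation with scipy:

```python
import numpy as np
from scipy.stats import binom, norm

n, p = 100, 0.5
mu, sd = n * p, np.sqrt(n * p * (1 - p))  # normal parameters from E[X], sigma_X

# Exact binomial probability vs. the CLT-based normal approximation.
exact = binom.cdf(45, n, p)
approx = norm.cdf(45, loc=mu, scale=sd)
print(round(exact, 4), round(approx, 4))  # the two values are close
```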

When does it apply?

Recall that the “large enough” \(n\) for the CLT depends on the skewness of the population distribution. For binomial distributions, the skewness is determined by \(p\) (symmetric for \(p=0.5\), stronger skewness as \(p\) nears \(0\) or \(1\)). Therefore, we usually consider the two parameters \(n\) and \(p\) jointly to identify cases where the binomial distribution is well-approximated by a normal distribution. The rule of thumb is:

  1. Both \(np \geq 10\) and \(n(1-p) \geq 10\).

  2. Alternatively, \(np(1-p) \geq 10\).

The figure below shows some concrete examples:

Fig. 7.4 Cases with different compatibilities with normal approximation

In Fig. 7.4,

(a) is symmetric, but \(n\) is too small. It fails the rule-of-thumb tests. Normal approximation will not work well. ❌

(b) is symmetric with sufficiently large \(n\). It passes the rule-of-thumb tests. Normal approximation will work well. ✔

(c) has the same \(n\) as (b), but the distribution is very skewed because \(p=0.1\). Normal approximation will not work well. ❌

(d) has even larger \(n\), which compensates for the \(p\) far from 0.5. Normal approximation will work well. ✔
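The first version of the rule of thumb is easy to encode. The sketch below uses illustrative parameters resembling, but not necessarily identical to, the four cases in Fig. 7.4:

```python
def normal_approx_ok(n, p, threshold=10):
    """Rule of thumb: both n*p and n*(1-p) must reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(normal_approx_ok(10, 0.5))    # symmetric but n too small -> False
print(normal_approx_ok(100, 0.5))   # symmetric, large n -> True
print(normal_approx_ok(50, 0.1))    # skewed: n*p = 5 -> False
print(normal_approx_ok(500, 0.1))   # larger n compensates -> True
```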

7.4.3. Poisson Distribution and the CLT

A Poisson random variable counts the number of independent events occurring in a fixed interval, where events happen at a constant average rate \(\lambda\).

An interesting property of the Poisson distribution is that the sum of independent Poisson random variables is also Poisson distributed. If \(Y_1 \sim Pois(\lambda_1)\) and \(Y_2 \sim Pois(\lambda_2)\) are independent, then \(Y_1 + Y_2 \sim Pois(\lambda_1 + \lambda_2)\).
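This additivity is easy to check by simulation (the two rates below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
lam1, lam2 = 3.0, 7.0

# Sum two independent Poisson samples.
y1 = rng.poisson(lam1, size=100_000)
y2 = rng.poisson(lam2, size=100_000)
s = y1 + y2

# For a Poisson(lam) variable, the mean and the variance both equal lam,
# so both sample statistics should be near lam1 + lam2 = 10.
print(round(s.mean(), 1), round(s.var(), 1))
```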

By extension, if \(Y_1, Y_2, \cdots, Y_n\) are independent Poisson random variables with the identical parameter \(\lambda\), then:

\[\sum_{i=1}^n Y_i \sim Pois(n\lambda)\]

Let \(X = \sum_{i=1}^n Y_i\) and \(\tilde{\lambda} = n\lambda\). Since \(Y_i\)’s are iid, the CLT applies for a sufficiently large \(n\):

\[X \stackrel{\text{approx}}{\sim} N\left(\tilde{\lambda}, \sqrt{\tilde{\lambda}}\right)\]

Again, the normal parameters come from \(E[X]\) and \(\sigma_X\) of the true distribution.

When does it apply?

In practice, we do not have an explicit \(n\) for a Poisson random variable. If \(X \sim Pois(\lambda)\), it can be expressed as a sum of two Poisson random variables, each with parameter \(\lambda/2\), or of a thousand, each with parameter \(\lambda/1000\).

Thus we focus solely on the size of \(\lambda\). Typically, \(\lambda \geq 10\) is large enough for approximation by a normal distribution. See Fig. 7.5 to verify that the Poisson pmf becomes more bell-shaped as \(\lambda\) grows.

Fig. 7.5 \(\lambda =1,5,10,50\) from top to bottom, respectively.
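One rough way to quantify the trend in Fig. 7.5 (an illustrative probe, not from the text) is to compare \(P(X \leq \lambda)\) under the exact Poisson distribution and its normal approximation. By symmetry the normal approximation always gives 0.5 here, so the gap measures the skewness remaining in the Poisson:

```python
import numpy as np
from scipy.stats import poisson, norm

# Same lambda values as Fig. 7.5; record the approximation error for each.
errors = []
for lam in [1, 5, 10, 50]:
    exact = poisson.cdf(lam, lam)                        # P(X <= lambda), exact
    approx = norm.cdf(lam, loc=lam, scale=np.sqrt(lam))  # = 0.5 by symmetry
    errors.append(abs(exact - approx))
    print(lam, round(exact, 3), round(approx, 3))
```

The error shrinks steadily as \(\lambda\) grows, in line with the \(\lambda \geq 10\) rule of thumb.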

7.4.4. The Practicality of Normal Approximation to Binomial & Poisson Distributions

Suppose a random variable \(X\) follows a \(B(n=100, p=0.5)\) distribution, and we are to compute \(P(X < 50)\). Without access to computational software, we can either

  1. compute \(P(X < 50)\) directly, which requires computation of 50 separate pmf terms: \(P(X=0) + P(X=1) + \cdots + P(X=49)\), or

  2. use the approximate normal distribution to compute a slightly less accurate value with a single lookup in the standard normal table.

As \(n\) gets larger, both the convenience and accuracy of Option 2 increase. This example only touches on the binomial case, but a similar logic can be applied to a Poisson case with a large \(\lambda\).
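The two options can be spelled out numerically (a sketch using scipy, standing in for the hand computation and the normal table):

```python
import numpy as np
from scipy.stats import binom, norm

n, p = 100, 0.5
mu, sd = n * p, np.sqrt(n * p * (1 - p))

# Option 1: sum the 50 separate pmf terms P(X=0) + ... + P(X=49).
exact = binom.pmf(np.arange(50), n, p).sum()

# Option 2: a single evaluation of the approximating normal CDF.
approx = norm.cdf(49, loc=mu, scale=sd)

print(round(exact, 4), round(approx, 4))
```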

This technique was especially relevant when computational software was less accessible. Today, it still plays an important role in illustrating the broad implications of the CLT and in showing how different distributions are connected.

7.4.5. Continuity Correction

When using the normal distribution to approximate discrete distributions like the binomial or Poisson, we need to account for the difference between their supports. Consider \(X \sim B(n=100, p=0.5)\) again. Using the exact distribution,

\[P(X=48) = {100 \choose 48} \left(\frac{1}{2}\right)^{48}\left(\frac{1}{2}\right)^{100-48} = 0.0735.\]

But using its approximated distribution \(N(\mu=50, \sigma = \sqrt{25})\),

\[P(X = 48) \approx 0.\]

A discrete distribution always has a positive probability for a value in its support, while a normal distribution assigns a zero probability to any single value. This difference needs to be addressed by a technique called continuity correction, which is not covered in detail in this course. You are encouraged to read about it independently.
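For a brief illustration of the idea (the details are beyond this course): the correction treats the point probability \(P(X = 48)\) as the normal probability of the surrounding interval \([47.5, 48.5]\):

```python
from scipy.stats import binom, norm

n, p = 100, 0.5
mu, sd = 50, 5

exact = binom.pmf(48, n, p)  # about 0.0735, as computed in the text

# Naive: a continuous distribution assigns zero probability to a single point.
naive = norm.cdf(48, mu, sd) - norm.cdf(48, mu, sd)  # exactly 0

# Continuity correction: spread the point mass over [47.5, 48.5].
corrected = norm.cdf(48.5, mu, sd) - norm.cdf(47.5, mu, sd)

print(round(exact, 4), naive, round(corrected, 4))
```

The corrected value is very close to the exact binomial probability, while the naive point evaluation is useless.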

7.4.6. Bringing It All Together

Key Takeaways 📝

  1. The CLT explains why certain binomial and Poisson distributions, although discrete, can be approximated by normal distributions.

  2. For binomial distributions, the normal approximation works well when \(np \geq 10\) and \(n(1-p) \geq 10\) (or alternatively when \(np(1-p) \geq 10\)).

  3. For Poisson distributions, the normal approximation works well when \(\lambda \geq 10\).

  4. Continuity corrections may be needed when using a continuous distribution to approximate a discrete one.