Chapter 5
Limit Theorems
Limit Theorems
• 𝑋1, ⋯ , 𝑋𝑛 i.i.d.
𝑀𝑛 = 𝑋1 + ⋯ + 𝑋𝑛 𝑛
What happens as 𝑛 → ∞ ?
• A tool: Chebyshev’s inequality
• Convergence “in probability”
• Convergence of 𝑀𝑛 (weak law of large numbers)
• The Central Limit Theorem
2
5.1. Markov Inequality
• If a random variable 𝑋 can only take nonnegative values, then 𝑃 𝑋 ≥ 𝑎 ≤ 𝐸 𝑋𝑎 , for all 𝑎 > 0
• 𝐸 𝑋 = 𝑥 𝑓0𝑎 𝑋 𝑥 𝑑𝑥 + 𝑥 𝑓𝑎∞ 𝑋 𝑥 𝑑𝑥 ≥ 𝑥 𝑓𝑎∞ 𝑋 𝑥 𝑑𝑥
≥ 𝑎 𝑓𝑎∞ 𝑋 𝑥 𝑑𝑥 = 𝑎 𝑃 𝑋 ≥ 𝑎
Chebyshev’s Inequality
• If 𝑋 is a random variable with mean 𝜇 and variance 𝜎2, then 𝑃 |𝑋 − 𝜇| ≥ 𝑐 ≤ 𝜎𝑐22 , for all 𝑐 > 0.
• 𝜎2 = 𝑥 − 𝜇 2𝑓𝑋 𝑥 𝑑𝑥
≥ ;∞𝜇;𝑐 𝑥 − 𝜇 2𝑓𝑋 𝑥 𝑑𝑥 + 𝜇:𝑐∞ 𝑥 − 𝜇 2𝑓𝑋 𝑥 𝑑𝑥 ≥ 𝑐2 ∙ 𝑃 𝑋 − 𝜇 ≥ 𝑐
𝑃 𝑋 − 𝜇 ≥ 𝑐 ≤ 𝜎2 𝑐2 𝑃 𝑋 − 𝜇 ≥ 𝑘𝜎 ≤ 1
𝑘2
4
5.2. Convergence of The Sample Mean (The Weak Law of Large Numbers)
• 𝑋1, 𝑋2, ⋯ i.i.d.
finite mean 𝜇 and variance 𝜎2
𝑀𝑛 = 𝑋1 + ⋯ + 𝑋𝑛 𝑛
• 𝐸 𝑀𝑛 = var 𝑀𝑛 =
• For every 𝜖 > 0,
𝑃 𝑀𝑛 − 𝜇 ≥ 𝜖 ≤ var 𝑀𝑛
𝜖2 = 𝜎2 𝑛𝜖2
• The Weak Law of Large Numbers:
For every 𝜖 > 0,
𝑋 :⋯:𝑋
5.3. Convergence of a Deterministic Sequence
• Let 𝑎1, 𝑎2, … be a sequence of real numbers, and let 𝑎 be another real number.
• We say that the sequence 𝑎𝑛 converges to 𝑎, or lim
𝑛→∞ 𝑎𝑛 = 𝑎, if for every 𝜖 > 0 there exists some 𝑛0 such that
𝑎𝑛 − 𝑎 ≤ 𝜖, for all 𝑛 ≥ 𝑛0.
“𝑎𝑛 eventually gets and stays (arbitrarily) close to 𝑎”
• The Weak Law of Large Numbers: "𝑀𝑛 converges to 𝜇 in probability. "
6
Convergence “in Probability”
• Let 𝑌1, 𝑌2, … be a sequence of random variables not necessarily independent), and let
𝑎
be a real number.• We say that the sequence 𝑌𝑛 converges to 𝑎 in probability, if for every 𝜖 > 0, we have
𝑛→∞lim 𝑃 𝑌𝑛 − 𝑎 ≥ 𝜖 = 0.
• The Weak Law of Large Numbers states that the sample mean converges in probability to the true mean 𝜇.
"𝑀𝑛 converges to 𝜇 in probability. "
• “(almost all) of the PMF/PDF of 𝑌𝑛 , eventually gets concentrated (arbitrarily) close to 𝑎”
The Pollster’s Problem
• 𝑝: fraction of population that “...”
• 𝑖th (randomly selected) person polled:
• 𝑋
𝑖= 1, if yes, 0, if no.
• 𝑀
𝑛=
𝑋1:⋯:𝑋𝑛𝑛
fraction of “yes” in our sample
• Goal: 95% confidence of ≤ 1% error
𝑃 𝑀
𝑛− 𝑝 ≥ .01 ≤ .05
8
The Pollster’s Problem
• Goal: 95% confidence of ≤ 1% error
𝑃 𝑀
𝑛− 𝑝 ≥ .01 ≤ .05
• Use Chebyshev’s inequality:
𝑃 𝑀
𝑛− 𝑝 ≥ .01 ≤
𝜎𝑀𝑛20.01 2
=
𝑛 0.01𝜎𝑥2 2≤
4𝑛 0.011 2• If 𝑛 = 50,000, then 𝑃 𝑀𝑛 − 𝑝 ≥ .01 ≤ .05
(conservative)
5.4. Different Scalings of 𝑴
𝒏• 𝑋1, ⋯ , 𝑋𝑛 i.i.d. finite variance 𝜎2
• Look at three variants of their sum:
– 𝑆𝑛 = 𝑋1 + ⋯ + 𝑋𝑛 variance 𝑛𝜎2 – 𝑀𝑛 = 𝑆𝑛𝑛 variance 𝜎2
converges “in probability” to 𝐸 𝑋 (WLLN) 𝑛
– 𝑆𝑛𝑛 constant variance 𝜎2 asymptotic shape?
10
The Central Limit Theorem
• 𝑋1, ⋯ , 𝑋𝑛 i.i.d., finite variance 𝜎2
• “Standardized” 𝑆𝑛 = 𝑋1 + ⋯ + 𝑋𝑛: 𝑍𝑛 = 𝑆𝑛 − 𝐸 𝑆𝑛
𝜎𝑆𝑛 = 𝑆𝑛 − 𝑛𝐸 𝑋 𝑛𝜎
– 𝐸 𝑍𝑛 = 0, var 𝑍𝑛 = 1
• Let 𝑍 be a standard normal r.v. (zero mean, unit variance)
• Theorem: For every 𝑐:
𝑃 𝑍𝑛 ≤ 𝑐 → 𝑃 𝑍 ≤ 𝑐
• 𝑃 𝑍 ≤ 𝑐 is the standard normal CDF, 𝜙 𝑐 , available from the normal tables
The Central Limit Theorem
• Let 𝑋1, 𝑋𝑛,... be a sequence of independent identically distributed random variables with common mean 𝜇 and variance 𝜎2, and define
𝑍𝑛 = 𝑋1 + ⋯ + 𝑋𝑛 − 𝑛𝜇 𝜎 𝑛
• Then, the CDF of 𝑍𝑛 converges to the standard normal CDF Φ 𝑧 = 1
2𝜋 𝑒;𝑥2/2
𝑧
;∞
𝑑𝑥 in the sense that
𝑛→∞lim 𝑃 𝑍𝑛 ≤ 𝑧 = Φ 𝑧 , for every 𝑧.
12
The Central Limit Theorem
Usefulness
• universal; only means, variances matter
• accurate computational shortcut
• justification of normal models
What exactly does it say?
• CDF of 𝑍𝑛 converges to normal CDF
– not a statement about convergence of PDFs or PMFs
The Central Limit Theorem
Normal approximation
• Treat 𝑍𝑛 as if normal
– also treat 𝑆𝑛 as if normal
Can we use it when 𝒏 is “moderate”?
• Yes, but no nice theorems to this effect
• Symmetry helps a lot
14
The Central Limit Theorem
The Central Limit Theorem
16
Normal Approximation Based on the Central Limit Theorem
• Let 𝑆𝑛 = 𝑋1 + ⋯ + 𝑋𝑛, where the 𝑋𝑖 are independent identically distributed random variables with mean 𝜇 and variance 𝜎2. If 𝑛 is large, the probability 𝑃(𝑆𝑛 ≤ 𝑐) can be approximated by
treating 𝑆𝑛 as if it were normal, according to the following procedure.
1. Calculate the mean 𝑛𝜇 and the variance 𝑛𝜎2 of 𝑆𝑛. 2. Calculate the normalized value 𝑧 = (𝑐 − 𝑛𝜇)/σ 𝑛 3. Use the approximation
𝑃(𝑆𝑛 ≤ 𝑐) ≈ Φ 𝑧
where Φ 𝑧 is available from standard normal CDF tables.
The Pollster’s Problem Using the CLT
• 𝑝: fraction of population that “...”
• 𝑖th (randomly selected) person polled:
𝑋𝑖 = 1, if yes, 0, if no.
• 𝑀𝑛 = 𝑋1 + ⋯ + 𝑋𝑛 /𝑛
• Suppose we want:
𝑃 𝑀𝑛 − 𝑝 ≥ .01 ≤ .05
18
The Pollster’s Problem Using the CLT
• Suppose we want:
𝑃 𝑀𝑛 − 𝑝 ≥ .01 ≤ .05
• Event of interest: 𝑀𝑛 − 𝑝 ≥ .01
𝑋1 + ⋯ + 𝑋𝑛 − 𝑛𝑝
𝑛 ≥ .01
𝑋1 + ⋯ + 𝑋𝑛 − 𝑛𝑝
𝑛𝜎 ≥ .01 𝑛
𝜎
𝑃 𝑀𝑛 − 𝑝 ≥ .01 ≈ 𝑃 𝑍 ≥ .01 𝑛/𝜎 ≤ 𝑃 𝑍 ≥ .02 𝑛
Apply to Binomial
• Fix 𝑝, where 0 < 𝑝 < 1
• 𝑋𝑖: Bernoulli(𝑝)
• 𝑆𝑛 = 𝑋1 + ⋯ + 𝑋𝑛: Binomial (𝑛,𝑝)
– mean 𝑛𝑝, variance 𝑛𝑝(1 − 𝑝)
• CDF of 𝑆𝑛;𝑛𝑝
𝑛𝑝 1;𝑝 → standard normal Example
• 𝑛 = 36, 𝑝 = 0.5; find 𝑃(𝑆𝑛 ≤ 21)
• Exact answer:
36 𝑘
1 2
21 36 𝑘<0
= 0.8785
20
The ½ Correction for Binomial Approximation
• 𝑃 𝑆𝑛 ≤ 21 = 𝑃 𝑆𝑛 < 22 because 𝑆𝑛 is integer
• Compromise: consider 𝑃 𝑆𝑛 ≤ 21.5
De Moivre–Laplace CLT (for Binomial)
• When the 1/2 correction is used, CLT can also approximate the binomial pmf. (not just the binomial CDF)
𝑃 𝑆𝑛 = 19 = 𝑃 18.5 ≤ 𝑆𝑛 ≤ 19.5
18.5 ≤ 𝑆𝑛 ≤ 19.5 ⇔
18.5;18
3 ≤ 𝑆𝑛;183 ≤ 19.5;183 ⇔ 0.17 ≤ 𝑍𝑛 ≤ 0.5
22
De Moivre–Laplace CLT (for Binomial)
𝑃 𝑆𝑛 = 19 ≈ 𝑃 0.17 ≤ 𝑍 ≤ 0.5= 𝑃 𝑍 ≤ 0.5 − 𝑃 𝑍 ≤ 0.17 = 0.6915 − 0.5675
= 0.124
• Exact answer:
36 𝑘
1 2
36
= 0.1251
De Moivre-Laplace Approximation to the Binomial
• If 𝑆𝑛 is a binomial random variable with parameters 𝑛 and 𝑝, 𝑛 is large, and 𝑘, 𝑙 are nonnegative integers, then
P 𝑘 ≤ 𝑆𝑛 ≤ 𝑙 ≈ Φ 𝑙 + 12 − 𝑛𝑝
𝑛𝑝 1 − 𝑝 − Φ 𝑘 − 12 − 𝑛𝑝
𝑛𝑝 1 − 𝑝 .
24
Poisson vs. Normal Approximations of the Binomial
• Poisson arrivals during unit interval equals:
sum of 𝑛 (independent) Poisson arrivals during 𝑛 intervals of length 1/𝑛
– Let 𝑛 → ∞, apply CLT (??) – Poisson = normal (????)
• Binomial(𝑛, 𝑝)
– 𝑝 fixed, 𝑛 → ∞: normal
– 𝑛𝑝 fixed, 𝑛 → ∞, 𝑝 → 0: Poisson
• 𝑝 = 1/100, 𝑛 = 100: Poisson
• 𝑝 = 1/10, 𝑛 = 500: normal