Limit Theorems


Chapter 5 Limit Theorems


Limit Theorems

• 𝑋₁, ⋯ , 𝑋_𝑛 i.i.d.

$M_n = \dfrac{X_1 + \cdots + X_n}{n}$

What happens as 𝑛 → ∞?

• A tool: Chebyshev’s inequality

• Convergence “in probability”

• Convergence of 𝑀_𝑛 (weak law of large numbers)

• The Central Limit Theorem


5.1. Markov Inequality

• If a random variable $X$ can only take nonnegative values, then $P(X \ge a) \le \dfrac{E[X]}{a}$ for all $a > 0$.

• $E[X] = \int_0^a x f_X(x)\,dx + \int_a^\infty x f_X(x)\,dx \ge \int_a^\infty x f_X(x)\,dx$

$\ge a \int_a^\infty f_X(x)\,dx = a\,P(X \ge a)$
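A quick numerical check of the bound, offered as a sketch: it assumes, purely for illustration, that $X$ is exponential with rate 1, so that $E[X] = 1$ and $P(X \ge a) = e^{-a}$. The helper names `markov_bound` and `exp_tail` are hypothetical.

```python
import math


def markov_bound(mean: float, a: float) -> float:
    """Markov bound E[X]/a on P(X >= a), valid for nonnegative X and a > 0."""
    if a <= 0:
        raise ValueError("a must be positive")
    return mean / a


def exp_tail(a: float, rate: float = 1.0) -> float:
    """Exact tail P(X >= a) of an exponential random variable with the given rate."""
    return math.exp(-rate * a)


# Illustrative assumption: X ~ Exponential(1), so E[X] = 1.
for a in (0.5, 1.0, 2.0, 5.0):
    print(f"a={a}: exact={exp_tail(a):.4f}, Markov bound={markov_bound(1.0, a):.4f}")
```

For $a \le 1$ the bound is at least 1 and says nothing; for larger $a$ it holds but is loose (at $a = 5$ it gives 0.2, while the exact tail is about 0.0067).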


Chebyshev’s Inequality

• If $X$ is a random variable with mean $\mu$ and variance $\sigma^2$, then $P(|X - \mu| \ge c) \le \dfrac{\sigma^2}{c^2}$ for all $c > 0$.

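• $\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\,dx$

$\ge \int_{-\infty}^{\mu - c} (x - \mu)^2 f_X(x)\,dx + \int_{\mu + c}^{\infty} (x - \mu)^2 f_X(x)\,dx \ge c^2 \cdot P(|X - \mu| \ge c)$

$P(|X - \mu| \ge c) \le \dfrac{\sigma^2}{c^2}$, and with $c = k\sigma$: $P(|X - \mu| \ge k\sigma) \le \dfrac{1}{k^2}$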
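To see how conservative the $1/k^2$ form is, the sketch below compares it with the exact two-sided tail of a standard normal, $P(|Z| \ge k) = \operatorname{erfc}(k/\sqrt{2})$. The normal is only an example distribution chosen here; the function names are hypothetical.

```python
import math


def chebyshev_bound(k: float) -> float:
    """Chebyshev: P(|X - mu| >= k*sigma) <= 1/k**2 for any X with finite variance."""
    return 1.0 / k**2


def normal_two_sided_tail(k: float) -> float:
    """Exact P(|Z| >= k) for a standard normal Z, via the complementary error function."""
    return math.erfc(k / math.sqrt(2.0))


for k in (1.5, 2.0, 3.0):
    print(f"k={k}: Chebyshev bound={chebyshev_bound(k):.4f}, "
          f"normal tail={normal_two_sided_tail(k):.4f}")
```

At $k = 2$ the bound is 0.25, while a normal $X$ has $P(|X - \mu| \ge 2\sigma) \approx 0.0455$; Chebyshev gives up tightness in exchange for assuming only a finite variance.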


5.2. Convergence of The Sample Mean (The Weak Law of Large Numbers)

• $X_1, X_2, \dots$ i.i.d., with finite mean $\mu$ and variance $\sigma^2$

$M_n = \dfrac{X_1 + \cdots + X_n}{n}$

• $E[M_n] = \mu$, $\operatorname{var}(M_n) = \dfrac{\sigma^2}{n}$

• For every $\epsilon > 0$,

$P(|M_n - \mu| \ge \epsilon) \le \dfrac{\operatorname{var}(M_n)}{\epsilon^2} = \dfrac{\sigma^2}{n\epsilon^2}$

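• The Weak Law of Large Numbers:

For every $\epsilon > 0$,

$P\left(\left|\dfrac{X_1 + \cdots + X_n}{n} - \mu\right| \ge \epsilon\right) = P(|M_n - \mu| \ge \epsilon) \to 0$ as $n \to \infty$.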
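For Bernoulli summands the probability being bounded can be computed exactly, because $nM_n$ is binomial. A sketch, assuming Bernoulli(1/2) data and $\epsilon = 1/10$ (illustrative choices), that compares the exact tail with the Chebyshev bound $\sigma^2/(n\epsilon^2)$ as $n$ grows; `fractions.Fraction` keeps the boundary test $|k/n - p| \ge \epsilon$ exact. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def bernoulli_mean_tail(n: int, p: Fraction, eps: Fraction) -> Fraction:
    """Exact P(|M_n - p| >= eps) when M_n is the mean of n i.i.d. Bernoulli(p) r.v.s.

    n*M_n = S_n is Binomial(n, p), so sum the binomial PMF over every k
    with |k/n - p| >= eps.
    """
    total = Fraction(0)
    for k in range(n + 1):
        if abs(Fraction(k, n) - p) >= eps:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


def chebyshev_mean_bound(n: int, p: Fraction, eps: Fraction) -> Fraction:
    """Chebyshev bound var(M_n)/eps^2 = sigma^2/(n eps^2), with sigma^2 = p(1 - p)."""
    return p * (1 - p) / (n * eps**2)


p, eps = Fraction(1, 2), Fraction(1, 10)
for n in (10, 100, 1000):
    tail = bernoulli_mean_tail(n, p, eps)
    bound = chebyshev_mean_bound(n, p, eps)
    print(f"n={n}: exact tail={float(tail):.3g}, Chebyshev bound={float(bound):.3g}")
```

The exact tail shrinks toward 0 as the WLLN predicts, and it always sits below the Chebyshev bound (which exceeds 1, and so is vacuous, at $n = 10$).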


5.3. Convergence of a Deterministic Sequence

• Let 𝑎₁, 𝑎₂, … be a sequence of real numbers, and let 𝑎 be another real number.

• We say that the sequence $a_n$ converges to $a$, or $\lim_{n\to\infty} a_n = a$, if for every $\epsilon > 0$ there exists some $n_0$ such that

$|a_n - a| \le \epsilon$ for all $n \ge n_0$.

“𝑎_𝑛 eventually gets and stays (arbitrarily) close to 𝑎”

• The Weak Law of Large Numbers: “𝑀_𝑛 converges to 𝜇 in probability.”


Convergence “in Probability”

• Let $Y_1, Y_2, \dots$ be a sequence of random variables (not necessarily independent), and let $a$ be a real number.

• We say that the sequence $Y_n$ converges to $a$ in probability if, for every $\epsilon > 0$, we have

$\lim_{n\to\infty} P(|Y_n - a| \ge \epsilon) = 0.$

• The Weak Law of Large Numbers states that the sample mean converges in probability to the true mean 𝜇.

“𝑀_𝑛 converges to 𝜇 in probability.”

• “(Almost all) of the PMF/PDF of 𝑌_𝑛 eventually gets concentrated (arbitrarily) close to 𝑎.”


The Pollster’s Problem

• 𝑝: fraction of population that “...”

• $i$th (randomly selected) person polled:

$X_i = 1$ if yes, $X_i = 0$ if no.

• $M_n = \dfrac{X_1 + \cdots + X_n}{n}$: fraction of “yes” in our sample

• Goal: 95% confidence of ≤ 1% error

$P(|M_n - p| \ge .01) \le .05$


The Pollster’s Problem

• Goal: 95% confidence of ≤ 1% error

$P(|M_n - p| \ge .01) \le .05$

• Use Chebyshev’s inequality:

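$P(|M_n - p| \ge .01) \le \dfrac{\sigma_{M_n}^2}{(0.01)^2} = \dfrac{\sigma_X^2}{n(0.01)^2} \le \dfrac{1}{4n(0.01)^2}$

where the last step uses $\sigma_X^2 = p(1 - p) \le 1/4$.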

• If $n = 50{,}000$, then $P(|M_n - p| \ge .01) \le .05$ (conservative).
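The sample size follows from solving $\frac{1}{4n\epsilon^2} \le \delta$ for $n$. A minimal sketch, with the hypothetical helper `chebyshev_sample_size`; exact fractions are used because floating-point $1/(4 \cdot 0.05 \cdot 0.01^2)$ can land a hair above or below 50,000 and change the ceiling.

```python
import math
from fractions import Fraction


def chebyshev_sample_size(eps: Fraction, delta: Fraction) -> int:
    """Smallest n with 1/(4 n eps^2) <= delta.

    Relies on var(X_i) = p(1 - p) <= 1/4 for a yes/no response, so the
    guarantee holds whatever the unknown p is.
    """
    return math.ceil(1 / (4 * delta * eps**2))


n = chebyshev_sample_size(Fraction(1, 100), Fraction(5, 100))
print(n)  # 50000
```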


5.4. Different Scalings of $M_n$

• 𝑋₁, ⋯ , 𝑋_𝑛 i.i.d. finite variance 𝜎²

• Look at three variants of their sum:

– $S_n = X_1 + \cdots + X_n$: variance $n\sigma^2$

– $M_n = \dfrac{S_n}{n}$: variance $\dfrac{\sigma^2}{n}$; converges “in probability” to $E[X]$ (WLLN)

– $\dfrac{S_n}{\sqrt{n}}$: constant variance $\sigma^2$; asymptotic shape?
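The three scalings can be compared empirically. A sketch, assuming Uniform(0, 1) summands (so $\sigma^2 = 1/12$), 4,000 replications, and a fixed seed, all illustrative choices; `scaled_sum_variances` is a hypothetical helper.

```python
import random
import statistics


def scaled_sum_variances(n: int, reps: int, rng: random.Random):
    """Sample variances of S_n, M_n = S_n/n and S_n/sqrt(n) for Uniform(0, 1) terms.

    Each replication draws n fresh uniforms and records S_n; the other two
    scalings follow from var(cY) = c^2 var(Y).
    """
    sums = [sum(rng.random() for _ in range(n)) for _ in range(reps)]
    var_s = statistics.variance(sums)
    return var_s, var_s / n**2, var_s / n


rng = random.Random(0)
for n in (10, 40):
    v_s, v_m, v_root = scaled_sum_variances(n, 4000, rng)
    # Theory with sigma^2 = 1/12: n/12, 1/(12 n) and 1/12.
    print(f"n={n}: var(S_n)={v_s:.4f}  var(M_n)={v_m:.5f}  var(S_n/sqrt(n))={v_root:.4f}")
```

Quadrupling $n$ roughly quadruples the variance of $S_n$, divides that of $M_n$ by four, and leaves that of $S_n/\sqrt{n}$ near $1/12$.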


The Central Limit Theorem

• 𝑋₁, ⋯ , 𝑋_𝑛 i.i.d., finite variance 𝜎²

• “Standardized” $S_n = X_1 + \cdots + X_n$: $Z_n = \dfrac{S_n - E[S_n]}{\sigma_{S_n}} = \dfrac{S_n - nE[X]}{\sqrt{n}\,\sigma}$

– $E[Z_n] = 0$, $\operatorname{var}(Z_n) = 1$

• Let 𝑍 be a standard normal r.v. (zero mean, unit variance)

• Theorem: For every 𝑐:

$P(Z_n \le c) \to P(Z \le c)$

• $P(Z \le c)$ is the standard normal CDF, $\Phi(c)$, available from the normal tables.
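In place of a printed table, $\Phi$ can be evaluated through the error function, $\Phi(c) = \tfrac{1}{2}\left(1 + \operatorname{erf}(c/\sqrt{2})\right)$. A minimal sketch; `std_normal_cdf` is a hypothetical name, and Python's `statistics.NormalDist().cdf` is shown alongside as a cross-check.

```python
import math
from statistics import NormalDist


def std_normal_cdf(c: float) -> float:
    """Phi(c) = P(Z <= c) for a standard normal Z, written with the error function."""
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))


for c in (0.0, 0.5, 1.0, 1.96):
    print(c, round(std_normal_cdf(c), 4), round(NormalDist().cdf(c), 4))
```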


The Central Limit Theorem

• Let $X_1, X_2, \dots$ be a sequence of independent identically distributed random variables with common mean $\mu$ and variance $\sigma^2$, and define

$Z_n = \dfrac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}$

• Then, the CDF of $Z_n$ converges to the standard normal CDF

$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-x^2/2}\,dx$$

in the sense that

$$\lim_{n\to\infty} P(Z_n \le z) = \Phi(z) \quad \text{for every } z.$$
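The theorem can be watched at work in a short simulation. A sketch, assuming Uniform(0, 1) summands (mean 1/2, variance 1/12); the distribution, $n = 30$, the number of replications, and the seed are illustrative choices, and `empirical_cdf_zn` is a hypothetical helper. It compares the simulated $P(Z_n \le z)$ with $\Phi(z)$.

```python
import math
import random
from statistics import NormalDist

MU, SIGMA = 0.5, math.sqrt(1.0 / 12.0)  # mean and std of Uniform(0, 1)


def empirical_cdf_zn(n: int, z: float, reps: int, rng: random.Random) -> float:
    """Monte Carlo estimate of P(Z_n <= z), Z_n = (S_n - n*MU) / (SIGMA*sqrt(n))."""
    hits = 0
    for _ in range(reps):
        s = sum(rng.random() for _ in range(n))
        if (s - n * MU) / (SIGMA * math.sqrt(n)) <= z:
            hits += 1
    return hits / reps


rng = random.Random(0)
for z in (-1.0, 0.0, 1.0):
    est = empirical_cdf_zn(30, z, 20000, rng)
    print(f"z={z:+.1f}: simulated {est:.3f}, Phi(z) = {NormalDist().cdf(z):.3f}")
```

The agreement concerns CDF values only, matching what the theorem asserts.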


The Central Limit Theorem

Usefulness

• universal; only means, variances matter

• accurate computational shortcut

• justification of normal models

What exactly does it say?

• CDF of 𝑍_𝑛 converges to normal CDF

– not a statement about convergence of PDFs or PMFs


The Central Limit Theorem

Normal approximation

• Treat 𝑍_𝑛 as if normal

– also treat 𝑆_𝑛 as if normal

Can we use it when 𝒏 is “moderate”?

• Yes, but no nice theorems to this effect

• Symmetry helps a lot


Normal Approximation Based on the Central Limit Theorem

• Let 𝑆_𝑛 = 𝑋₁ + ⋯ + 𝑋_𝑛, where the 𝑋_𝑖 are independent identically distributed random variables with mean 𝜇 and variance 𝜎². If 𝑛 is large, the probability 𝑃(𝑆_𝑛 ≤ 𝑐) can be approximated by treating 𝑆_𝑛 as if it were normal, according to the following procedure.

1. Calculate the mean $n\mu$ and the variance $n\sigma^2$ of $S_n$.

2. Calculate the normalized value $z = (c - n\mu)/(\sigma\sqrt{n})$.

3. Use the approximation

$P(S_n \le c) \approx \Phi(z)$,

where $\Phi(z)$ is available from standard normal CDF tables.
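The three steps can be sketched as a single function. `normal_approx_cdf` is a hypothetical name, and $\Phi$ is taken from Python's `statistics.NormalDist`. In the usage line, 36 fair coin tosses serve as an illustrative input.

```python
import math
from statistics import NormalDist


def normal_approx_cdf(c: float, n: int, mu: float, sigma: float) -> float:
    """Approximate P(S_n <= c) for a sum of n i.i.d. terms with mean mu and std sigma."""
    mean = n * mu                  # step 1: mean n*mu of S_n
    std = sigma * math.sqrt(n)     # step 1: square root of the variance n*sigma^2
    z = (c - mean) / std           # step 2: normalized value
    return NormalDist().cdf(z)     # step 3: Phi(z)


# 36 fair coin tosses (mu = 0.5, sigma = 0.5): P(S_n <= 21) ~ Phi(1)
print(round(normal_approx_cdf(21, 36, 0.5, 0.5), 4))
```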


The Pollster’s Problem Using the CLT

• 𝑝: fraction of population that “...”

• 𝑖th (randomly selected) person polled:

𝑋_𝑖 = 1, if yes, 0, if no.

• $M_n = (X_1 + \cdots + X_n)/n$

• Suppose we want:

$P(|M_n - p| \ge .01) \le .05$


The Pollster’s Problem Using the CLT

• Suppose we want:

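$P(|M_n - p| \ge .01) \le .05$

• Event of interest: $|M_n - p| \ge .01$

$\dfrac{|X_1 + \cdots + X_n - np|}{n} \ge .01$

$\dfrac{|X_1 + \cdots + X_n - np|}{\sqrt{n}\,\sigma} \ge \dfrac{.01\sqrt{n}}{\sigma}$

$P(|M_n - p| \ge .01) \approx P(|Z| \ge .01\sqrt{n}/\sigma) \le P(|Z| \ge .02\sqrt{n})$, since $\sigma = \sqrt{p(1 - p)} \le 1/2$.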
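To finish the calculation, require $P(|Z| \ge .02\sqrt{n}) \le .05$, that is, $.02\sqrt{n} \ge z$ where $\Phi(z) = 0.975$ ($z \approx 1.96$). A sketch with the hypothetical helper `clt_sample_size`, which keeps the worst case $\sigma = 1/2$. The result is an approximation resting on the CLT, not a guarantee like the Chebyshev figure.

```python
import math
from statistics import NormalDist


def clt_sample_size(eps: float, delta: float) -> int:
    """Smallest n with 2*eps*sqrt(n) >= z, where P(|Z| >= z) = delta.

    The factor 2*eps comes from eps/sigma with the worst case sigma = 1/2.
    """
    z = NormalDist().inv_cdf(1 - delta / 2)  # about 1.96 for delta = 0.05
    return math.ceil((z / (2 * eps)) ** 2)


print(clt_sample_size(0.01, 0.05))  # 9604
```

Under the normal approximation, about 9,604 respondents suffice for the same 1% / 95% target, far fewer than the conservative Chebyshev figure of 50,000.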


Apply to Binomial

• Fix 𝑝, where 0 < 𝑝 < 1

• 𝑋_𝑖: Bernoulli(𝑝)

• 𝑆_𝑛 = 𝑋₁ + ⋯ + 𝑋_𝑛: Binomial (𝑛,𝑝)

– mean 𝑛𝑝, variance 𝑛𝑝(1 − 𝑝)

• CDF of $\dfrac{S_n - np}{\sqrt{np(1 - p)}}$ → standard normal

Example

• 𝑛 = 36, 𝑝 = 0.5; find 𝑃(𝑆_𝑛 ≤ 21)

• Exact answer:

$\displaystyle\sum_{k=0}^{21} \binom{36}{k} \left(\frac{1}{2}\right)^{36} = 0.8785$


The ½ Correction for Binomial Approximation

• $P(S_n \le 21) = P(S_n < 22)$ because $S_n$ is integer-valued

• Compromise: consider $P(S_n \le 21.5)$
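The effect of the choice of cutoff can be checked directly for $n = 36$, $p = 0.5$ (mean 18, standard deviation 3). A sketch comparing the exact value with the normal approximation evaluated at 21, 22 and 21.5; it relies on $p = 1/2$ so that every outcome has probability $2^{-36}$.

```python
import math
from statistics import NormalDist

n, p = 36, 0.5
mean = n * p                         # 18
std = math.sqrt(n * p * (1 - p))     # 3
Phi = NormalDist().cdf

# Exact P(S_n <= 21); valid only for p = 1/2, where each outcome has probability 2**-n.
exact = sum(math.comb(n, k) for k in range(22)) / 2**n

approx_21 = Phi((21 - mean) / std)     # treats the event as S_n <= 21
approx_22 = Phi((22 - mean) / std)     # treats the event as S_n < 22
approx_215 = Phi((21.5 - mean) / std)  # 1/2 correction: S_n <= 21.5

for label, value in [("exact", exact), ("c = 21", approx_21),
                     ("c = 22", approx_22), ("c = 21.5", approx_215)]:
    print(f"{label:>8}: {value:.4f}")
```

The two uncorrected cutoffs land on opposite sides of the exact value (roughly 0.84 and 0.91), while the corrected cutoff lands within about 0.001 of it.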


De Moivre–Laplace CLT (for Binomial)

• When the 1/2 correction is used, the CLT can also approximate the binomial PMF (not just the binomial CDF).

$P(S_n = 19) = P(18.5 \le S_n \le 19.5)$

$18.5 \le S_n \le 19.5 \iff \dfrac{18.5 - 18}{3} \le \dfrac{S_n - 18}{3} \le \dfrac{19.5 - 18}{3} \iff 0.17 \le Z_n \le 0.5$


De Moivre–Laplace CLT (for Binomial)

$P(S_n = 19) \approx P(0.17 \le Z \le 0.5)$

$= P(Z \le 0.5) - P(Z \le 0.17) = 0.6915 - 0.5675$

$= 0.124$

• Exact answer:

$\dbinom{36}{19} \left(\dfrac{1}{2}\right)^{36} = 0.1251$


De Moivre–Laplace Approximation to the Binomial

• If 𝑆_𝑛 is a binomial random variable with parameters 𝑛 and 𝑝, 𝑛 is large, and 𝑘, 𝑙 are nonnegative integers, then

$P(k \le S_n \le l) \approx \Phi\left(\dfrac{l + \frac{1}{2} - np}{\sqrt{np(1 - p)}}\right) - \Phi\left(\dfrac{k - \frac{1}{2} - np}{\sqrt{np(1 - p)}}\right).$
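The formula translates directly into code. A sketch with hypothetical names `de_moivre_laplace` and `binomial_range_exact`, using Python's `statistics.NormalDist` for $\Phi$; the usage line applies it to $P(S_n = 19)$ for $n = 36$, $p = 0.5$.

```python
import math
from statistics import NormalDist


def de_moivre_laplace(k: int, l: int, n: int, p: float) -> float:
    """Approximate P(k <= S_n <= l), S_n ~ Binomial(n, p), using the 1/2 correction."""
    Phi = NormalDist().cdf
    mean = n * p
    std = math.sqrt(n * p * (1 - p))
    return Phi((l + 0.5 - mean) / std) - Phi((k - 0.5 - mean) / std)


def binomial_range_exact(k: int, l: int, n: int, p: float) -> float:
    """Exact P(k <= S_n <= l) by summing the binomial PMF."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, l + 1))


print(de_moivre_laplace(19, 19, 36, 0.5), binomial_range_exact(19, 19, 36, 0.5))
```

Without rounding the lower endpoint $(18.5 - 18)/3$ to 0.17, the approximation comes out at about 0.125, within 0.001 of the exact value.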


Poisson vs. Normal Approximations of the Binomial

• The number of Poisson arrivals during a unit interval equals the sum of the numbers of (independent) Poisson arrivals during $n$ intervals of length $1/n$

– Let $n \to \infty$, apply CLT (??)

– Poisson = normal (????)

• Binomial(𝑛, 𝑝)

– 𝑝 fixed, 𝑛 → ∞: normal

– 𝑛𝑝 fixed, 𝑛 → ∞, 𝑝 → 0: Poisson

• 𝑝 = 1/100, 𝑛 = 100: Poisson

• 𝑝 = 1/10, 𝑛 = 500: normal
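The two rules of thumb can be checked by measuring the largest PMF error of each approximation. A sketch with hypothetical helper names; the normal PMF uses the 1/2 correction, and log-gamma keeps the binomial and Poisson PMFs numerically stable for $n = 500$.

```python
import math
from statistics import NormalDist


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial(n, p) PMF at k, computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson(lam) PMF at k."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def normal_pmf_approx(k: int, n: int, p: float) -> float:
    """Normal approximation of P(S_n = k) with the 1/2 correction."""
    Phi = NormalDist().cdf
    mean, std = n * p, math.sqrt(n * p * (1 - p))
    return Phi((k + 0.5 - mean) / std) - Phi((k - 0.5 - mean) / std)


def max_errors(n: int, p: float):
    """Largest absolute PMF error over k = 0..n: (Poisson(np) error, normal error)."""
    ks = range(n + 1)
    poisson_err = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, n * p)) for k in ks)
    normal_err = max(abs(binom_pmf(k, n, p) - normal_pmf_approx(k, n, p)) for k in ks)
    return poisson_err, normal_err


for n, p in [(100, 0.01), (500, 0.1)]:
    pe, ne = max_errors(n, p)
    print(f"n={n}, p={p}: Poisson max error={pe:.4f}, normal max error={ne:.4f}")
```

For $p = 1/100$, $n = 100$ the Poisson error is far smaller (the normal curve cannot follow the pile-up at 0 and 1); for $p = 1/10$, $n = 500$ the ordering reverses.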