
Probability and Random Process, Junhee Seok, Korea University Lecture Note 6 – Sum of Random Variables


LECTURE NOTE 6 – SUM OF RANDOM VARIABLES

For random variables $X_1, X_2, \ldots, X_n$, we will discuss the properties of their sum $\sum_{i=1}^{n} X_i$.

PDF OF THE SUM OF TWO RANDOM VARIABLES

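When $W = X + Y$,

$$F_W(w) = \Pr[X + Y \le w] = \int_{x=-\infty}^{\infty} \left( \int_{y=-\infty}^{w-x} f_{X,Y}(x, y)\,dy \right) dx$$

$$f_W(w) = \frac{dF_W(w)}{dw} = \int_{x=-\infty}^{\infty} \left( \frac{d}{dw} \int_{y=-\infty}^{w-x} f_{X,Y}(x, y)\,dy \right) dx$$

$$= \int_{x=-\infty}^{\infty} \left( \frac{du}{dw}\,\frac{d}{du} \int_{y=-\infty}^{u} f_{X,Y}(x, y)\,dy \right) dx \qquad \leftarrow u = w - x$$

$$= \int_{-\infty}^{\infty} f_{X,Y}(x, u)\,dx = \int_{-\infty}^{\infty} f_{X,Y}(x, w - x)\,dx,$$

where the last line uses $du/dw = 1$ and the fundamental theorem of calculus, $\frac{d}{du}\int_{-\infty}^{u} f_{X,Y}(x, y)\,dy = f_{X,Y}(x, u)$.

Finally, $f_W(w) = \int_{-\infty}^{\infty} f_{X,Y}(x, w - x)\,dx$.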
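As a numerical sanity check of the final formula, the minimal sketch below integrates $f_{X,Y}(x, w - x)$ over $x$ with a midpoint rule. The joint pdf $f_{X,Y}(x, y) = x + y$ on $[0, 1]^2$ (under which X and Y are not independent) and the helper names joint_pdf, pdf_of_sum and exact_pdf_of_sum are hypothetical choices for illustration. Because $x + (w - x) = w$ wherever the pdf is nonzero, the exact answer is $f_W(w) = w^2$ on $[0, 1]$ and $w(2 - w)$ on $[1, 2]$.

```python
# Minimal sketch: evaluate f_W(w) = ∫ f_{X,Y}(x, w - x) dx numerically (midpoint rule).
# The joint pdf is a hypothetical example in which X and Y are NOT independent.


def joint_pdf(x: float, y: float) -> float:
    """Hypothetical joint pdf f_{X,Y}(x, y) = x + y on the unit square, 0 elsewhere."""
    if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
        return x + y
    return 0.0


def pdf_of_sum(w: float, f_xy, lo: float = -1.0, hi: float = 2.0, steps: int = 30_000) -> float:
    """Approximate f_W(w) by integrating f_xy(x, w - x) over x in [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx  # midpoint of the k-th subinterval
        total += f_xy(x, w - x)
    return total * dx


def exact_pdf_of_sum(w: float) -> float:
    """Closed form for this joint pdf: w^2 on [0, 1] and w(2 - w) on (1, 2]."""
    if 0.0 <= w <= 1.0:
        return w * w
    if 1.0 < w <= 2.0:
        return w * (2.0 - w)
    return 0.0


for w in (0.25, 0.5, 1.0, 1.5):
    print(f"w={w}: numeric={pdf_of_sum(w, joint_pdf):.4f}, exact={exact_pdf_of_sum(w):.4f}")
```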

Example

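In particular, when X and Y are independent, $f_{X,Y}(x, y) = f_X(x) f_Y(y)$, so

$$f_{X+Y}(w) = \int_{-\infty}^{\infty} f_{X,Y}(x, w - x)\,dx = \int_{-\infty}^{\infty} f_X(x) f_Y(w - x)\,dx = (f_X * f_Y)(w),$$

which is the convolution of $f_X$ and $f_Y$.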

Convolution

For two functions $f(\cdot)$ and $g(\cdot)$, their convolution is defined as $(f * g)(x) = \int_{-\infty}^{\infty} f(u)\, g(x - u)\,du$.
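One way to see the definition in action is to approximate the integral numerically. This minimal sketch, with hypothetical helper names uniform_pdf, convolve and triangular_pdf, convolves two Uniform(0, 1) pdfs and compares the result with the triangular pdf that the sum of two independent Uniform(0, 1) variables is known to have.

```python
# Minimal sketch: approximate (f * g)(x) = ∫ f(u) g(x - u) du with a midpoint rule.


def uniform_pdf(x: float) -> float:
    """pdf of Uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def convolve(f, g, x: float, lo: float = -1.0, hi: float = 2.0, steps: int = 30_000) -> float:
    """Approximate (f * g)(x) by summing f(u) g(x - u) du over midpoints u in [lo, hi]."""
    du = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        u = lo + (k + 0.5) * du
        total += f(u) * g(x - u)
    return total * du


def triangular_pdf(w: float) -> float:
    """Known pdf of U1 + U2 for independent U1, U2 ~ Uniform(0, 1)."""
    if 0.0 <= w <= 1.0:
        return w
    if 1.0 < w <= 2.0:
        return 2.0 - w
    return 0.0


for w in (0.3, 1.0, 1.7):
    print(f"w={w}: convolution={convolve(uniform_pdf, uniform_pdf, w):.4f}, "
          f"triangle={triangular_pdf(w):.4f}")
```

The integration range [lo, hi] only needs to cover the support of f, which is [0, 1] here.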

Example

In general, the pdf of $W_n = \sum_{i=1}^{n} X_i$ can be derived by recursively applying the result for the sum of two random variables, using $W_n = X_n + W_{n-1}$. Although they are not easy to show, the following special cases hold:

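(1) If $X_i \sim \text{Poi}(\lambda_i)$ and the $X_i$’s are independent, then $\sum_{i=1}^{n} X_i \sim \text{Poi}\left(\sum_{i=1}^{n} \lambda_i\right)$.

(2) If $X_i \sim N(\mu_i, \sigma_i^2)$ and the $X_i$’s are independent, then $\sum_{i=1}^{n} X_i \sim N\left(\sum_{i=1}^{n} \mu_i, \sum_{i=1}^{n} \sigma_i^2\right)$.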

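Note that $E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]$ and, because the $X_i$’s are independent, $\operatorname{Var}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \operatorname{Var}[X_i]$. We will see why in the next section.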
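For discrete random variables the convolution integral becomes a sum over pmfs, $\Pr[X + Y = k] = \sum_{j} \Pr[X = j]\,\Pr[Y = k - j]$ for independent X and Y. The sketch below uses this discrete convolution to check special case (1) numerically; the rates 2.0 and 3.5 and the helper names poisson_pmf and pmf_of_sum are arbitrary illustrative choices.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Pr[X = k] for X ~ Poi(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def pmf_of_sum(k: int, lam1: float, lam2: float) -> float:
    """Discrete convolution: Pr[X + Y = k] for independent X ~ Poi(lam1), Y ~ Poi(lam2)."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2) for j in range(k + 1))


lam1, lam2 = 2.0, 3.5  # arbitrary example rates
for k in range(8):
    # The two columns should agree: the convolution equals the Poi(lam1 + lam2) pmf.
    print(k, round(pmf_of_sum(k, lam1, lam2), 6), round(poisson_pmf(k, lam1 + lam2), 6))
```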

Reference on convolution: https://en.wikipedia.org/wiki/Convolution


EXPECTATION OF THE SUM OF RANDOM VARIABLES

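Let $W_n = X_1 + X_2 + \cdots + X_n = \sum_{i=1}^{n} X_i$, and write $\mu_i = E[X_i]$. Then,

$$E[W_n] = E[X_1] + E[X_2] + \cdots + E[X_n] = \sum_{i=1}^{n} E[X_i]$$

$$\operatorname{Var}[W_n] = E\left[\left(\sum_{i=1}^{n} X_i - \sum_{i=1}^{n} \mu_i\right)^2\right] = E\left[\left(\sum_{i=1}^{n} (X_i - \mu_i)\right)^2\right]$$

$$= E\left[\sum_{i=1}^{n} \sum_{j=1}^{n} (X_i - \mu_i)(X_j - \mu_j)\right]$$

$$= \sum_{i=1}^{n} \sum_{j=1}^{n} \operatorname{Cov}(X_i, X_j) = \sum_{i=j} \operatorname{Cov}(X_i, X_j) + \sum_{i \ne j} \operatorname{Cov}(X_i, X_j)$$

$$= \sum_{i=1}^{n} \operatorname{Var}[X_i] + 2 \sum_{i=1}^{n} \sum_{j=i+1}^{n} \operatorname{Cov}(X_i, X_j)$$

The last step uses $\operatorname{Cov}(X_i, X_i) = \operatorname{Var}[X_i]$ and $\operatorname{Cov}(X_i, X_j) = \operatorname{Cov}(X_j, X_i)$.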

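If the $X_i$’s are independent, $\operatorname{Cov}(X_i, X_j) = 0$ for $i \ne j$, so $\operatorname{Var}[W_n] = \sum_{i=1}^{n} \operatorname{Var}[X_i]$. If the $X_i$’s are iid with $X_i \sim X$, then $\operatorname{Var}[W_n] = n\operatorname{Var}[X]$.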
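The covariance formula can be checked on simulated data. In this minimal sketch, x2 and x3 are hypothetical variables built to be correlated with x1. When sample variances and covariances are all computed with the same 1/N normalization, the identity holds exactly up to rounding, while dropping the covariance terms gives a noticeably wrong answer. With these choices the theoretical values are $\operatorname{Var}[W_3] = 11.25$ and $\sum_i \operatorname{Var}[X_i] = 7.25$.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance with a 1/N normalization (cov(xs, xs) is the sample variance)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


random.seed(0)
N = 10_000
# Hypothetical correlated variables: x2 and x3 both depend on x1.
x1 = [random.gauss(0, 1) for _ in range(N)]
x2 = [a + random.gauss(0, 1) for a in x1]
x3 = [0.5 * a + random.gauss(0, 2) for a in x1]
cols = [x1, x2, x3]
w = [a + b + c for a, b, c in zip(x1, x2, x3)]  # samples of W_3 = X1 + X2 + X3

direct = cov(w, w)                        # Var[W_3] computed directly
var_terms = sum(cov(c, c) for c in cols)  # sum of Var[X_i]
cov_terms = 2 * sum(cov(cols[i], cols[j]) for i in range(3) for j in range(i + 1, 3))
print(direct, var_terms + cov_terms, var_terms)
```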

Example

Expectation and Variance of the Mean of iid Random Variables

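Let $W_n = \sum_{i=1}^{n} \frac{X_i}{n}$, where the $X_i$’s are iid random variables such that $X_i \sim X$. Then,

$$E[W_n] = \sum_{i=1}^{n} E\left[\frac{X_i}{n}\right] = E[X]$$

$$\operatorname{Var}[W_n] = \sum_{i=1}^{n} \operatorname{Var}\left[\frac{X_i}{n}\right] = \sum_{i=1}^{n} \frac{1}{n^2} \operatorname{Var}[X_i] = \frac{1}{n} \operatorname{Var}[X]$$

where the first variance equality uses independence and the second uses $\operatorname{Var}[aX] = a^2 \operatorname{Var}[X]$.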

The variance of $W_n$ approaches zero as $n$ grows, which means that averaging many samples lets us estimate $E[X]$ very accurately.
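A quick simulation illustrates $\operatorname{Var}[W_n] = \operatorname{Var}[X]/n$. This sketch assumes $X \sim \text{Uniform}(0, 1)$, so $\operatorname{Var}[X] = 1/12$; the helper sample_mean and the 5,000 repetitions per $n$ are illustrative choices.

```python
import random
import statistics

random.seed(1)
VAR_X = 1 / 12  # variance of Uniform(0, 1)


def sample_mean(n: int) -> float:
    """W_n: the average of n iid Uniform(0, 1) draws."""
    return sum(random.random() for _ in range(n)) / n


results = {}
for n in (1, 10, 100):
    means = [sample_mean(n) for _ in range(5_000)]  # 5,000 independent copies of W_n
    results[n] = statistics.pvariance(means)
    print(f"n={n:>3}: simulated Var[W_n]={results[n]:.5f}, Var[X]/n={VAR_X / n:.5f}")
```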

Law of large numbers: for iid $X_i$’s, $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i = E[X]$ (with probability 1).
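The law of large numbers can be watched in action through a running average. This minimal sketch assumes fair six-sided die rolls, so $E[X] = 3.5$; the seed and the checkpoints are arbitrary.

```python
import random

random.seed(2)
checkpoints = (10, 1_000, 100_000)
running_means = {}
total = 0
for n in range(1, checkpoints[-1] + 1):
    total += random.randint(1, 6)  # one roll of a fair die
    if n in checkpoints:
        running_means[n] = total / n  # (1/n) * sum of the first n rolls

for n, m in running_means.items():
    print(f"n={n:>6}: sample mean={m:.4f} (E[X] = 3.5)")
```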

Example

(1) We have an unfair coin whose probability of heads is p, but we do not know the value of p. We want to estimate p by flipping the coin n times. Are the flips iid? How much more accurately can we estimate p with 20 flips than with 10 flips?

(2) Two survey companies report the support rate for the president. Company A surveys 100 people, and Company B surveys 500 people. Which company will report the more accurate rate, and by how much?
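Both examples can be answered with $\operatorname{Var}[W_n] = \operatorname{Var}[X]/n$. If each flip (or each surveyed person’s answer) is modeled as an iid Bern(p) variable, the estimate $\hat{p} = W_n$ has standard deviation $\sqrt{p(1-p)/n}$. Treating survey responses as iid is an assumption of this sketch, and the value of p below is a placeholder that cancels out of the ratios.

```python
import math


def std_of_estimate(p: float, n: int) -> float:
    """Standard deviation of the sample proportion: sqrt(Var[X] / n), Var[X] = p(1 - p)."""
    return math.sqrt(p * (1 - p) / n)


p = 0.3  # placeholder value; it cancels in the ratios below

coin_ratio = std_of_estimate(p, 20) / std_of_estimate(p, 10)      # 20 flips vs 10 flips
survey_ratio = std_of_estimate(p, 500) / std_of_estimate(p, 100)  # Company B vs Company A
print(f"coin: {coin_ratio:.3f}, survey: {survey_ratio:.3f}")
```

The ratios are $1/\sqrt{2} \approx 0.707$ and $1/\sqrt{5} \approx 0.447$: doubling the number of flips shrinks the standard deviation by a factor of $\sqrt{2}$, and Company B’s estimate has a standard deviation about $\sqrt{5} \approx 2.24$ times smaller than Company A’s.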


CENTRAL LIMIT THEOREM

The central limit theorem (CLT) makes the Gaussian distribution universal: it is the king of random variables.

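Suppose the $X_i$’s are iid with $E[X] = \mu$ and $\operatorname{Var}[X] = \sigma^2$. Then $E\left[\sum_{i=1}^{n} X_i\right] = n\mu$ and $\operatorname{Var}\left[\sum_{i=1}^{n} X_i\right] = n\sigma^2$. Hence, for

$$Z_n = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n\sigma^2}},$$

$E[Z_n] = 0$ and $\operatorname{Var}[Z_n] = 1$.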

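As $n \to \infty$,

$$\lim_{n \to \infty} F_{Z_n}(z) = \Phi(z), \quad \text{or informally,} \quad \lim_{n \to \infty} Z_n \sim N(0, 1),$$

where $\Phi$ is the CDF of $N(0, 1)$. This is the Central Limit Theorem.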

The proof is difficult and beyond the scope of this course. Actually, every textbook makes this comment, lol.
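Although the proof is out of scope, the theorem is easy to check by simulation. This minimal sketch assumes $X \sim \text{Exponential}(1)$, a skewed distribution with $\mu = 1$ and $\sigma^2 = 1$, standardizes sums of $n = 100$ draws into $Z_n$, and compares the empirical $\Pr[Z_n \le z]$ with $\Phi(z)$ computed from math.erf. The sample sizes and helper names (phi, z_n) are illustrative.

```python
import math
import random

random.seed(3)


def phi(z: float) -> float:
    """Standard normal CDF, Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_n(n: int, mu: float = 1.0, sigma2: float = 1.0) -> float:
    """Standardized sum (sum X_i - n*mu) / sqrt(n*sigma2) for iid Exponential(1) draws."""
    s = sum(random.expovariate(1.0) for _ in range(n))
    return (s - n * mu) / math.sqrt(n * sigma2)


n, trials = 100, 10_000
zs = [z_n(n) for _ in range(trials)]
empirical = {z: sum(v <= z for v in zs) / trials for z in (-1.0, 0.0, 1.0)}
for z, f in empirical.items():
    print(f"z={z:+.1f}: empirical F_Zn={f:.3f}, Phi={phi(z):.3f}")
```

With a skewed X the agreement is close but not perfect at moderate n; it tightens as n grows.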

Approximation using the CLT

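Suppose the $X_i$’s are iid with $E[X] = \mu$ and $\operatorname{Var}[X] = \sigma^2$. For $W_n = \sum_{i=1}^{n} X_i = \sqrt{n\sigma^2}\,Z_n + n\mu$,

$$F_{W_n}(w) = \Pr\left[\sqrt{n\sigma^2}\,Z_n + n\mu \le w\right] = F_{Z_n}\left(\frac{w - n\mu}{\sqrt{n\sigma^2}}\right) \to \Phi\left(\frac{w - n\mu}{\sqrt{n\sigma^2}}\right) \text{ as } n \to \infty.$$

When $n$ is large, we can therefore approximate $F_{W_n}(w)$ by $\Phi\left(\frac{w - n\mu}{\sqrt{n\sigma^2}}\right)$.

Equivalently, we can approximate $W_n$ by a Gaussian random variable $N(n\mu, n\sigma^2)$.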
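The approximation translates directly into a small helper. In this sketch, approx_cdf_of_sum is a hypothetical name, $\Phi$ is evaluated with math.erf, and the example assumes $n = 48$ iid Uniform(0, 1) terms, for which $n\mu = 24$ and $n\sigma^2 = 48/12 = 4$.

```python
import math


def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def approx_cdf_of_sum(w: float, n: int, mu: float, sigma2: float) -> float:
    """CLT approximation F_{W_n}(w) ~ Phi((w - n*mu) / sqrt(n*sigma2))."""
    return phi((w - n * mu) / math.sqrt(n * sigma2))


# Example: W_48 = sum of 48 iid Uniform(0, 1) variables (mu = 1/2, sigma^2 = 1/12).
p = approx_cdf_of_sum(26.0, n=48, mu=0.5, sigma2=1 / 12)
print(f"Pr[W_48 <= 26] ~ {p:.4f}")
```

Here $(26 - 24)/\sqrt{4} = 1$, so the approximation is $\Phi(1) \approx 0.8413$.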

Example

(1) Let $W_n = \sum_{i=1}^{n} X_i$, where the $X_i$’s are iid with $X_i \sim \text{Bern}(p)$. Then $W_n \sim B(n, p)$. When $n$ is large enough, how can we approximate $B(n, p)$?

(2) One million people each vote for either candidate A or candidate B, at random and with equal probability. What is the probability that A wins by more than 2,000 votes?
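Example (2) can be worked with the Gaussian approximation of $B(n, p)$, namely $N(np, np(1-p))$. Let W be the number of votes for A, assuming each of the $n = 10^6$ votes is an iid Bern(0.5) variable. A wins by more than 2,000 votes when $W - (10^6 - W) > 2000$, that is, $W > 501{,}000$. The sketch computes the approximation with and without the ±0.5 adjustment from the tips below; the helper name phi is illustrative.

```python
import math


def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n, p = 1_000_000, 0.5
mu = n * p                          # 500,000
sigma = math.sqrt(n * p * (1 - p))  # 500

threshold = 501_000  # A wins by more than 2,000 votes  <=>  W > 501,000
plain = 1 - phi((threshold - mu) / sigma)            # Pr[W > 501,000] ~ 1 - Phi(2)
corrected = 1 - phi((threshold + 0.5 - mu) / sigma)  # Pr[W >= 501,001] with the 0.5 correction
print(f"approx: {plain:.5f}, with 0.5 correction: {corrected:.5f}")
```

Since $(501{,}000 - 500{,}000)/500 = 2$, both results are about $1 - \Phi(2) \approx 0.023$; with n this large the 0.5 adjustment barely matters.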

Tips for Approximating a Discrete Random Variable

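For an integer-valued random variable $X$ with mean $\mu$ and variance $\sigma^2$, use

$$\Pr[k_1 \le X \le k_2] = \Pr[k_1 - 0.5 \le X < k_2 + 0.5] \approx \Phi\left(\frac{k_2 + 0.5 - \mu}{\sigma}\right) - \Phi\left(\frac{k_1 - 0.5 - \mu}{\sigma}\right)$$

rather than

$$\Pr[k_1 \le X \le k_2] \approx \Phi\left(\frac{k_2 - \mu}{\sigma}\right) - \Phi\left(\frac{k_1 - \mu}{\sigma}\right).$$

This adjustment is known as the continuity correction.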

Example

When $X \sim B(20, 0.4)$, $\Pr[X = 8] = \Pr[8 \le X \le 8] = ?$
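The example can be checked by comparing the exact binomial probability with both approximations. In this sketch $\mu = np = 8$ and $\sigma = \sqrt{np(1-p)} = \sqrt{4.8}$; math.comb supplies the binomial coefficient (Python 3.8 or later), and phi is an illustrative helper for the standard normal CDF.

```python
import math


def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n, p, k = 20, 0.4, 8
mu = n * p                          # 8.0
sigma = math.sqrt(n * p * (1 - p))  # sqrt(4.8)

exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)
naive = phi((k - mu) / sigma) - phi((k - mu) / sigma)                 # no correction
corrected = phi((k + 0.5 - mu) / sigma) - phi((k - 0.5 - mu) / sigma)  # +/- 0.5 correction
print(f"exact={exact:.4f}, naive={naive:.4f}, corrected={corrected:.4f}")
```

Without the correction both $\Phi$ terms are evaluated at the same point, so the approximation of $\Pr[X = 8]$ is exactly 0; with the correction it is about 0.181, close to the exact value of about 0.180.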


SUMMARY

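When $W = X + Y$, $f_W(w) = \int_{-\infty}^{\infty} f_{X,Y}(x, w - x)\,dx$.

When X and Y are independent, $f_{X+Y}(w) = (f_X * f_Y)(w) = \int_{-\infty}^{\infty} f_X(x) f_Y(w - x)\,dx$.

When $W_n = X_1 + X_2 + \cdots + X_n = \sum_{i=1}^{n} X_i$,

$$E[W_n] = \sum_{i=1}^{n} E[X_i] \quad \text{and} \quad \operatorname{Var}[W_n] = \sum_{i=1}^{n} \operatorname{Var}[X_i] + 2 \sum_{i=1}^{n} \sum_{j=i+1}^{n} \operatorname{Cov}(X_i, X_j).$$

When the $X_i$’s are iid, $E[W_n] = nE[X]$ and $\operatorname{Var}[W_n] = n\operatorname{Var}[X]$.

Law of large numbers: for iid $X_i$’s, $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i = E[X]$.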

Central Limit Theorem

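When the $X_i$’s are iid with $E[X] = \mu$ and $\operatorname{Var}[X] = \sigma^2$,

$$\lim_{n \to \infty} \frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n\sigma^2}} \sim N(0, 1).$$

When the $X_i$’s are iid and $n$ is large, we can approximate $W_n = \sum_{i=1}^{n} X_i$ by $N(n\mu, n\sigma^2)$.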