Probability and Random Process, Junhee Seok, Korea University Lecture Note 6 – Sum of Random Variables
LECTURE NOTE 6 – SUM OF RANDOM VARIABLES
For random variables $X_1, X_2, \ldots, X_n$, we discuss the properties of their sum $\sum_{i=1}^{n} X_i$.
PDF OF THE SUM OF TWO RANDOM VARIABLES
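When $W = X + Y$,
$$F_W(w) = \Pr[X + Y \le w] = \int_{x=-\infty}^{\infty} \left( \int_{y=-\infty}^{w-x} f_{X,Y}(x, y)\,dy \right) dx$$
$$f_W(w) = \frac{dF_W(w)}{dw} = \int_{x=-\infty}^{\infty} \left( \frac{d}{dw} \int_{y=-\infty}^{w-x} f_{X,Y}(x, y)\,dy \right) dx$$
$$= \int_{x=-\infty}^{\infty} \left( \frac{du}{dw}\,\frac{d}{du} \int_{y=-\infty}^{u} f_{X,Y}(x, y)\,dy \right) dx \quad \leftarrow\ u = w - x$$
$$= \int_{-\infty}^{\infty} f_{X,Y}(x, u)\,dx = \int_{-\infty}^{\infty} f_{X,Y}(x, w - x)\,dx$$
Finally, $f_W(w) = \int_{-\infty}^{\infty} f_{X,Y}(x, w - x)\,dx$.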
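The final formula can be checked numerically. The sketch below is only an illustration: the joint pdf f(x, y) = x + y on the unit square is an assumption chosen because X and Y are then not independent and the integral can also be done by hand (w^2 for 0 ≤ w ≤ 1 and w(2 − w) for 1 < w ≤ 2). The function names are hypothetical.

```python
def joint_pdf(x, y):
    """Assumed joint pdf for illustration: f(x, y) = x + y on the unit square."""
    if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
        return x + y
    return 0.0


def pdf_of_sum(w, steps=2000):
    """Approximate f_W(w) = integral of f_{X,Y}(x, w - x) dx with a midpoint rule.

    X is supported on [0, 1], so integrating x over [0, 1] is enough here.
    """
    dx = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * dx
        total += joint_pdf(x, w - x)
    return total * dx


def closed_form(w):
    """Hand-derived result for this joint pdf."""
    if 0.0 <= w <= 1.0:
        return w * w
    if 1.0 < w <= 2.0:
        return w * (2.0 - w)
    return 0.0


for w in (0.5, 1.0, 1.5):
    print(w, pdf_of_sum(w), closed_form(w))
```

The numerical and hand-derived values should agree up to the discretization error of the midpoint rule.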
Example
In particular, when $X$ and $Y$ are independent,
$$f_{X+Y}(w) = \int_{-\infty}^{\infty} f_{X,Y}(x, w - x)\,dx = \int_{-\infty}^{\infty} f_X(x)\,f_Y(w - x)\,dx = (f_X * f_Y)(w),$$
which is the convolution of $f_X$ and $f_Y$.
Convolution
For two functions $f(\cdot)$ and $g(\cdot)$, their convolution is defined as $(f * g)(x) = \int_{-\infty}^{\infty} f(u)\,g(x - u)\,du$.
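A minimal numerical sketch of the convolution integral follows. The choice of two Uniform(0, 1) pdfs is an assumption made for illustration; their convolution is the well-known triangular pdf on [0, 2], which gives something to compare against. The helper names and the integration grid are hypothetical.

```python
def uniform_pdf(x):
    """pdf of Uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def convolve(f, g, x, lo=-1.0, hi=2.0, steps=6000):
    """Approximate (f * g)(x) = integral of f(u) g(x - u) du over [lo, hi].

    Uses a midpoint rule; [lo, hi] must cover the support of f.
    """
    du = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        u = lo + (k + 0.5) * du
        total += f(u) * g(x - u)
    return total * du


def triangular_pdf(w):
    """pdf of the sum of two independent Uniform(0, 1) random variables."""
    if 0.0 <= w <= 1.0:
        return w
    if 1.0 < w <= 2.0:
        return 2.0 - w
    return 0.0


for w in (0.5, 1.0, 1.5):
    print(w, convolve(uniform_pdf, uniform_pdf, w), triangular_pdf(w))
```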
Example
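In general, we can derive the pdf of $W_n = \sum_{i=1}^{n} X_i$ by recursively applying the pdf calculation for the sum of two random variables, that is, $W_n = X_n + W_{n-1}$. While it is not easy to show, we have the following special cases:
(1) If $X_i \sim \mathrm{Poi}(\lambda_i)$ and the $X_i$'s are independent, then $\sum_{i=1}^{n} X_i \sim \mathrm{Poi}\left(\sum_{i=1}^{n} \lambda_i\right)$.
(2) If $X_i \sim N(\mu_i, \sigma_i^2)$ and the $X_i$'s are independent, then $\sum_{i=1}^{n} X_i \sim N\left(\sum_{i=1}^{n} \mu_i, \sum_{i=1}^{n} \sigma_i^2\right)$.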
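Note that $\mathrm{E}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathrm{E}[X_i]$ and, for independent $X_i$'s, $\mathrm{Var}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathrm{Var}[X_i]$. We will see this in the next section.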
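The Poisson case can be checked for two variables with the discrete analogue of the convolution, where the integral becomes a sum over the pmf. This is a sketch under assumed example rates (1.5 and 2.5 are arbitrary), not a proof.

```python
import math


def poisson_pmf(k, lam):
    """Pr[X = k] for X ~ Poi(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def pmf_of_sum(k, lam1, lam2):
    """Pr[X1 + X2 = k] for independent Poisson variables, via discrete convolution."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2) for j in range(k + 1))


lam1, lam2 = 1.5, 2.5  # example rates, chosen arbitrarily
for k in range(6):
    # The two printed columns should match: convolution vs. Poi(lam1 + lam2).
    print(k, pmf_of_sum(k, lam1, lam2), poisson_pmf(k, lam1 + lam2))
```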
https://en.wikipedia.org/wiki/Convolution
EXPECTATION OF THE SUM OF RANDOM VARIABLES
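Let $W_n = X_1 + X_2 + \cdots + X_n = \sum_{i=1}^{n} X_i$ and $\mu_i = \mathrm{E}[X_i]$. Then,
$$\mathrm{E}[W_n] = \mathrm{E}[X_1] + \mathrm{E}[X_2] + \cdots + \mathrm{E}[X_n] = \sum_{i=1}^{n} \mathrm{E}[X_i]$$
$$\mathrm{Var}[W_n] = \mathrm{E}\left[\left(\sum_{i=1}^{n} X_i - \sum_{i=1}^{n} \mu_i\right)^2\right] = \mathrm{E}\left[\left(\sum_{i=1}^{n} (X_i - \mu_i)\right)^2\right]$$
$$= \mathrm{E}\left[\sum_{i=1}^{n} \sum_{j=1}^{n} (X_i - \mu_i)(X_j - \mu_j)\right]$$
$$= \sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{Cov}[X_i, X_j] = \sum_{i=j} \mathrm{Cov}[X_i, X_j] + \sum_{i \ne j} \mathrm{Cov}[X_i, X_j]$$
$$= \sum_{i=1}^{n} \mathrm{Var}[X_i] + 2 \sum_{i=1}^{n} \sum_{j=i+1}^{n} \mathrm{Cov}[X_i, X_j]$$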
If the $X_i$'s are independent, $\mathrm{Var}[W_n] = \sum_{i=1}^{n} \mathrm{Var}[X_i]$. If the $X_i$'s are iid, $\mathrm{Var}[W_n] = n\,\mathrm{Var}[X]$.
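The variance decomposition with covariance terms holds exactly for sample moments as well, which makes it easy to check on data. The sketch below builds three deliberately correlated variables (an arbitrary construction assumed for this illustration) and compares the variance of the sum with the sum of variances plus twice the pairwise covariances.

```python
import random


def cov(a, b):
    """Population covariance of two equally long samples; cov(a, a) is the variance."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


random.seed(0)
n_samples = 10_000
x1 = [random.gauss(0, 1) for _ in range(n_samples)]
x2 = [a + random.gauss(0, 1) for a in x1]               # depends on x1, so Cov > 0
x3 = [random.uniform(0, 1) for _ in range(n_samples)]   # independent of the others
xs = [x1, x2, x3]

w = [sum(vals) for vals in zip(*xs)]                     # W = X1 + X2 + X3 per sample
lhs = cov(w, w)                                          # Var[W]
variances = sum(cov(x, x) for x in xs)                   # sum of Var[X_i]
cross = sum(cov(xs[i], xs[j]) for i in range(3) for j in range(i + 1, 3))
rhs = variances + 2 * cross

print(lhs, rhs, variances)
```

Here `lhs` and `rhs` agree up to rounding, while `variances` alone falls short because X1 and X2 are positively correlated.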
Example
Expectation and Variance of the Mean of iid Random Variables
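Let $W_n = \sum_{i=1}^{n} \frac{X_i}{n}$, where the $X_i$'s are iid random variables such that $X_i \sim X$. Then,
$$\mathrm{E}[W_n] = \sum_{i=1}^{n} \mathrm{E}\left[\frac{X_i}{n}\right] = \mathrm{E}[X]$$
$$\mathrm{Var}[W_n] = \sum_{i=1}^{n} \mathrm{Var}\left[\frac{X_i}{n}\right] = \sum_{i=1}^{n} \frac{1}{n^2} \mathrm{Var}[X_i] = \frac{1}{n} \mathrm{Var}[X]$$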
The variance of $W_n$ approaches zero as $n$ grows, which means we can estimate $\mathrm{E}[X]$ very accurately by averaging.
Law of large numbers: for iid $X_i$'s, $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i = \mathrm{E}[X]$.
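A small simulation can illustrate how the variance of the sample mean shrinks like Var[X]/n. The fair die, the seed, and the number of trials below are assumptions for this sketch; for a fair die, E[X] = 3.5 and Var[X] = 35/12.

```python
import random


def sample_mean(n, rng):
    """Average of n iid fair-die rolls."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n


def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


rng = random.Random(1)
var_x = 35 / 12   # Var[X] for one roll of a fair die
trials = 5000     # number of repeated experiments per n
results = {}
for n in (10, 100):
    means = [sample_mean(n, rng) for _ in range(trials)]
    results[n] = variance(means)
    # Empirical variance of the mean next to the predicted Var[X] / n.
    print(n, results[n], var_x / n)
```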
Example
(1) We have an unfair coin whose probability of heads is p, but we do not know the value of p. We want to estimate p by flipping the coin n times. Are the flips iid? How much more accurately can we estimate p with 20 flips than with 10 flips?
(2) Two survey companies report the support rate for the president. Company A surveys 100 people, and Company B surveys 500 people. Which company will report the more accurate rate, and how much more accurate is it?
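One way to quantify "accuracy" in both questions is the standard deviation of the sample proportion, sqrt(p(1 − p)/n), which follows from Var[W_n] = Var[X]/n with Var[X] = p(1 − p) for a Bern(p) trial. The sketch below assumes iid responses and uses a hypothetical p = 0.5; the ratios it prints do not depend on that choice.

```python
import math


def std_of_estimate(p, n):
    """Standard deviation of the sample proportion from n iid Bern(p) trials."""
    return math.sqrt(p * (1 - p) / n)


p = 0.5  # hypothetical true value; it cancels in the ratios below
coin_ratio = std_of_estimate(p, 10) / std_of_estimate(p, 20)       # sqrt(20 / 10)
survey_ratio = std_of_estimate(p, 100) / std_of_estimate(p, 500)   # sqrt(500 / 100)
print(coin_ratio, survey_ratio)
```

Under these assumptions, doubling the number of flips shrinks the standard deviation by a factor of sqrt(2), and Company B's estimate has a standard deviation sqrt(5) times smaller than Company A's.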
CENTRAL LIMIT THEOREM
The CLT makes the Gaussian distribution universal: it is the king of random variables.
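Suppose the $X_i$'s are iid with $\mathrm{E}[X] = \mu$ and $\mathrm{Var}[X] = \sigma^2$. Then $\mathrm{E}\left[\sum_{i=1}^{n} X_i\right] = n\mu$ and $\mathrm{Var}\left[\sum_{i=1}^{n} X_i\right] = n\sigma^2$. For $Z_n = \dfrac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n\sigma^2}}$, $\mathrm{E}[Z_n] = 0$ and $\mathrm{Var}[Z_n] = 1$.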
As $n$ goes to infinity,
$$\lim_{n \to \infty} F_{Z_n}(z) = \Phi(z) \quad \text{or} \quad \lim_{n \to \infty} Z_n \sim N(0, 1) \quad \text{: Central Limit Theorem}$$
The proof is very hard and beyond the scope of this course. Actually, every textbook says this, lol.
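Even without a proof, the statement can be checked empirically. The sketch below standardizes the sum of n iid Uniform(0, 1) samples (for which μ = 1/2 and σ^2 = 1/12) and compares the empirical CDF of Z_n with Φ(z). The choice of distribution, n = 30, the seed, and the number of trials are all assumptions for illustration.

```python
import math
import random


def phi(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def standardized_sum(n, rng):
    """Z_n = (sum of X_i - n*mu) / sqrt(n*sigma^2) for X_i ~ Uniform(0, 1)."""
    total = sum(rng.random() for _ in range(n))
    return (total - n * 0.5) / math.sqrt(n / 12.0)


rng = random.Random(2)
n, trials = 30, 20_000
zs = [standardized_sum(n, rng) for _ in range(trials)]

empirical = {}
for z in (-1.0, 0.0, 1.0):
    empirical[z] = sum(v <= z for v in zs) / trials   # empirical F_{Z_n}(z)
    print(z, empirical[z], phi(z))
```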
Approximation using the CLT
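Suppose the $X_i$'s are iid with $\mathrm{E}[X] = \mu$ and $\mathrm{Var}[X] = \sigma^2$. For $W_n = \sum_{i=1}^{n} X_i = \sqrt{n\sigma^2}\,Z_n + n\mu$,
$$F_{W_n}(w) = \Pr\left[\sqrt{n\sigma^2}\,Z_n + n\mu \le w\right] = F_{Z_n}\left(\frac{w - n\mu}{\sqrt{n\sigma^2}}\right) \to \Phi\left(\frac{w - n\mu}{\sqrt{n\sigma^2}}\right) \quad \text{as } n \to \infty.$$
When $n$ is large, we can approximate $F_{W_n}(w)$ with $\Phi\left(\frac{w - n\mu}{\sqrt{n\sigma^2}}\right)$.
Or we can approximate $W_n$ with a Gaussian random variable, $N(n\mu, n\sigma^2)$.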
Example
(1) Let $W_n = \sum_{i=1}^{n} X_i$ and $X_i \sim \mathrm{Bern}(p)$. Then, $W_n \sim B(n, p)$. When n is large enough, how can we approximate $B(n, p)$?
(2) One million people each vote for either candidate A or B, independently and with equal probability. What is the probability that A wins by more than 2,000 votes?
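A possible way to work through these two questions with the CLT: since E[X] = p and Var[X] = p(1 − p) for Bern(p), B(n, p) is approximated by N(np, np(1 − p)). The sketch below applies this to the voting question, assuming each vote is an independent Bern(0.5) trial and ignoring the continuity correction, which is negligible at this scale.

```python
import math


def phi(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n, p = 1_000_000, 0.5
mu = n * p                            # E[W_n] = np, votes for A
sigma = math.sqrt(n * p * (1 - p))    # sqrt(Var[W_n]) = sqrt(np(1 - p)) = 500

# A wins by more than 2,000 votes: W - (n - W) > 2000, i.e. W > 501,000.
threshold = (n + 2000) / 2
prob = 1.0 - phi((threshold - mu) / sigma)   # 1 - Phi(2)
print(prob)
```

Under these assumptions the answer is 1 − Φ(2), roughly 2.3%.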
Tips for the Approximation for a Discrete Random Variable
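For a discrete (integer-valued) random variable $X$ with mean $\mu$ and variance $\sigma^2$,
$$\Pr[k_1 \le X \le k_2] = \Pr[k_1 - 0.5 \le X < k_2 + 0.5] \approx \Phi\left(\frac{k_2 + 0.5 - \mu}{\sigma}\right) - \Phi\left(\frac{k_1 - 0.5 - \mu}{\sigma}\right)$$
rather than
$$\Pr[k_1 \le X \le k_2] \approx \Phi\left(\frac{k_2 - \mu}{\sigma}\right) - \Phi\left(\frac{k_1 - \mu}{\sigma}\right)$$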
Example
When X ~ B(20,0.4), Pr[X=8] = Pr[8≤X≤8] = ?
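This case shows why the ±0.5 adjustment matters: without it, the Gaussian approximation of a single-point probability is exactly zero. The sketch below compares the exact binomial value with both approximations, using μ = np and σ^2 = np(1 − p); it is an illustrative calculation, not necessarily the solution given in the lecture.

```python
import math


def phi(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n, p, k = 20, 0.4, 8
mu = n * p                           # 8
sigma = math.sqrt(n * p * (1 - p))   # sqrt(4.8)

exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)
corrected = phi((k + 0.5 - mu) / sigma) - phi((k - 0.5 - mu) / sigma)
naive = phi((k - mu) / sigma) - phi((k - mu) / sigma)   # zero for any single point
print(exact, corrected, naive)
```

The exact value is about 0.18, and the approximation with the 0.5 adjustment lands close to it.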
SUMMARY
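When $W = X + Y$, $f_W(w) = \int_{-\infty}^{\infty} f_{X,Y}(x, w - x)\,dx$.
When $X$ and $Y$ are independent, $f_{X+Y}(w) = (f_X * f_Y)(w) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(w - x)\,dx$.
When $W_n = X_1 + X_2 + \cdots + X_n = \sum_{i=1}^{n} X_i$,
$\mathrm{E}[W_n] = \sum_{i=1}^{n} \mathrm{E}[X_i]$ and $\mathrm{Var}[W_n] = \sum_{i=1}^{n} \mathrm{Var}[X_i] + 2 \sum_{i=1}^{n} \sum_{j=i+1}^{n} \mathrm{Cov}[X_i, X_j]$. When the $X_i$'s are iid, $\mathrm{E}[W_n] = n\,\mathrm{E}[X]$ and $\mathrm{Var}[W_n] = n\,\mathrm{Var}[X]$.
Law of large numbers: for iid $X_i$'s, $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i = \mathrm{E}[X]$.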
Central Limit Theorem
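When the $X_i$'s are iid with $\mathrm{E}[X] = \mu$ and $\mathrm{Var}[X] = \sigma^2$,
$$\lim_{n \to \infty} \frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n\sigma^2}} \sim N(0, 1)$$
When the $X_i$'s are iid and n is large, we can approximate $W_n = \sum_{i=1}^{n} X_i$ by $N(n\mu, n\sigma^2)$.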