Bayesian Network

Academic year: 2022

(1)

Bayesian Network

Jin Young Choi

Seoul National University


(2)

Outline

• Bayesian Networks
• Applications:
  • Traffic Pattern Analysis
  • Topic Model (Document Analysis)
• Directed Acyclic Graph
• Conditional Independence
• D-separation
• Bayesian Parameters
• Parameterized Conditional Distributions
• Multinomial, Dirichlet Distribution, Conjugate Prior
• Markov Blanket

(3)


Application: Traffic Pattern Analysis

• Surveillance in crowded scenes

(4)

Graphical Inference Model

(5)

Bayesian Networks

• Directed Acyclic Graph (DAG)

(6)

Bayesian Networks

General Factorization: $p(x_1, \dots, x_K) = \prod_{k=1}^{K} p(x_k \mid \mathrm{pa}_k)$

(7)

LDA Model (Topic Modelling)

(8)

LDA Model

Likelihood: $p(w \mid z, \theta, \phi, \alpha, \beta)$   Posterior: $p(z, \theta, \phi \mid w, \alpha, \beta)$

Dir(K, $\alpha$): $p(\mu_1, \dots, \mu_K \mid \alpha_1, \dots, \alpha_K) = \dfrac{\Gamma\!\left(\sum_{i=1}^{K} \alpha_i\right)}{\prod_{i=1}^{K} \Gamma(\alpha_i)} \, \mu_1^{\alpha_1 - 1} \cdots \mu_K^{\alpha_K - 1}$

Mul(K, $\mu$): $p(x_1, \dots, x_K \mid \mu_1, \dots, \mu_K) = \dfrac{n!}{x_1! \cdots x_K!} \, \mu_1^{x_1} \cdots \mu_K^{x_K}$
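As a sanity check on the densities above, the following sketch (plain Python; the helper `multinomial_pmf` is our own, not from the slides) evaluates the Mul(K, μ) formula at every count vector with a fixed n and confirms the probabilities sum to 1:

```python
from math import factorial
from itertools import product

def multinomial_pmf(x, mu):
    """Mul(K, mu) evaluated at count vector x, with n = sum(x)."""
    n = sum(x)
    coef = factorial(n)
    for xi in x:
        coef //= factorial(xi)   # multinomial coefficient n! / (x1! ... xK!)
    p = float(coef)
    for xi, mui in zip(x, mu):
        p *= mui ** xi
    return p

mu = [0.2, 0.3, 0.5]   # a point on the 3-simplex (arbitrary choice)
n = 4
total = sum(multinomial_pmf(x, mu)
            for x in product(range(n + 1), repeat=3) if sum(x) == n)
print(round(total, 10))  # the pmf over all count vectors with n = 4 sums to 1
```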

(9)

Inference of LDA Model

• Maximum A Posteriori (MAP) estimation given observations $w, \alpha, \beta$
• Bayesian Inference (Learning)

Not convex; a closed-form solution is not available.

Likelihood: $p(w \mid z, \theta, \phi, \alpha, \beta)$   Posterior: $p(z, \theta, \phi \mid w, \alpha, \beta)$

Likelihoods

(10)

Conditional Independence

• $a$ is independent of $b$ given $c$: $p(a, b \mid c) = p(a \mid c)\, p(b \mid c)$
• Equivalently: $p(a \mid b, c) = p(a \mid c)$
• Notation: $a \perp\!\!\!\perp b \mid c$

(11)

Conditional Independence: Example 1

$U$, $V$, and $c$ are independent; $a = U + c$, $b = V + c$. Are $a$ and $b$ independent?

(12)

Conditional Independence: Example 1

$U$, $V$, and $c$ are independent; $a = U + c$, $b = V + c$, with $c = 1$ observed. Are $a$ and $b$ independent?
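Example 1 can be checked numerically. This sketch (our own toy setup: $U$, $V$, $c$ drawn as standard normals) shows that $a$ and $b$ are correlated marginally, through the common cause $c$, but uncorrelated once $c$ is held fixed at 1:

```python
import random

random.seed(0)
N = 20000

def cov(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

U = [random.gauss(0, 1) for _ in range(N)]
V = [random.gauss(0, 1) for _ in range(N)]
c = [random.gauss(0, 1) for _ in range(N)]

# Marginal case: a = U + c, b = V + c share the common cause c -> Cov(a, b) = Var(c) = 1
a = [u + ci for u, ci in zip(U, c)]
b = [v + ci for v, ci in zip(V, c)]
marginal_cov = cov(a, b)          # close to 1: dependent

# Conditional case: c fixed at 1 -> a = U + 1, b = V + 1 are independent
a1 = [u + 1 for u in U]
b1 = [v + 1 for v in V]
conditional_cov = cov(a1, b1)     # close to 0: independent given c

print(marginal_cov, conditional_cov)
```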

(13)

Conditional Independence: Example 2

$p(b, c \mid a) = p(c \mid a)\, p(b \mid a, c) = p(c \mid a)\, p(b \mid c)$

(14)

Conditional Independence: Example 2

$p(a \mid c) = \dfrac{p(c \mid a)\, p(a)}{p(c)}$

(15)

Conditional Independence: Example 3

Note: this is the opposite of Example 1, with 𝑐 unobserved.

𝑎 and 𝑏 are independent Bernoulli rvs. 𝑐 = 𝑎 + 𝑏

(16)

Conditional Independence: Example 3

Note: this is the opposite of Example 1, with c observed.

𝑎 and 𝑏 are independent Bernoulli rvs. 𝑐 = 𝑎 + 𝑏
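The effect of observing the collider $c$ can be computed exactly. In this sketch (fair Bernoulli variables assumed for illustration), $a$ and $b$ are independent a priori, but conditioned on $c = 1$ they become perfectly anti-correlated:

```python
from itertools import product

# a, b independent Bernoulli(0.5); c = a + b is deterministic given a, b
joint = {}
for a, b in product([0, 1], repeat=2):
    joint[(a, b, a + b)] = 0.25

# Condition on c = 1
pc1 = sum(p for (a, b, c), p in joint.items() if c == 1)           # 0.5
cond = {(a, b): p / pc1 for (a, b, c), p in joint.items() if c == 1}

p_a1 = sum(p for (a, b), p in cond.items() if a == 1)              # 0.5
p_b1 = sum(p for (a, b), p in cond.items() if b == 1)              # 0.5
p_a1_b1 = cond.get((1, 1), 0.0)                                    # 0.0: a = 1 forces b = 0

# p(a=1, b=1 | c=1) != p(a=1 | c=1) p(b=1 | c=1): a and b are dependent given c
print(p_a1_b1, p_a1 * p_b1)
```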

(17)

“Am I out of fuel?”

B = Battery (0 = flat, 1 = fully charged), F = Fuel Tank (0 = empty, 1 = full),

G = Fuel Gauge reading (0 = empty, 1 = full), and hence

• F is independent of B
• G is dependent on B and F

(18)

“Am I out of fuel?”

The probability of an empty tank is increased by observing $G = 0$.

and hence

(19)

“Am I out of fuel?”

The probability of an empty tank is reduced by observing $B = 0$.

This is referred to as “explaining away”: $F$ is dependent on $B$ given $G$, and hence
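The “explaining away” effect can be reproduced numerically. The probability tables below are not given on the slide; they are the illustrative numbers used in Bishop's PRML version of this example:

```python
# Priors and gauge model (illustrative numbers from Bishop's PRML treatment;
# the slide itself does not fix these values)
pB = {1: 0.9, 0: 0.1}                 # battery charged / flat
pF = {1: 0.9, 0: 0.1}                 # tank full / empty
pG1 = {(1, 1): 0.8, (1, 0): 0.2,      # p(G=1 | B, F)
       (0, 1): 0.2, (0, 0): 0.1}

def p_joint(b, f, g):
    pg = pG1[(b, f)] if g == 1 else 1 - pG1[(b, f)]
    return pB[b] * pF[f] * pg

# Gauge reads empty: p(F=0 | G=0) rises above the prior p(F=0) = 0.1
num = sum(p_joint(b, 0, 0) for b in (0, 1))
den = sum(p_joint(b, f, 0) for b in (0, 1) for f in (0, 1))
p_F0_given_G0 = num / den                       # ~0.257

# Additionally observe a flat battery: p(F=0 | G=0, B=0) drops back toward the prior
num2 = p_joint(0, 0, 0)
den2 = sum(p_joint(0, f, 0) for f in (0, 1))
p_F0_given_G0_B0 = num2 / den2                  # ~0.111 — "explaining away"

print(round(p_F0_given_G0, 3), round(p_F0_given_G0_B0, 3))
```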

(20)

Exercise

Answer the following questions for the Bayesian network on the right.

1) When no random variables are observed, show that $a$ and $b$ are independent of each other.

2) When $d$ is observed, show that $a$ and $b$ are dependent on each other.

1) Nothing observed:

$p(a, b, c, d) = p(a)\, p(b)\, p(c \mid a, b)\, p(d \mid c)$

$p(a, b) = \sum_c \sum_d p(a, b, c, d) = p(a)\, p(b) \sum_c p(c \mid a, b) \sum_d p(d \mid c) = p(a)\, p(b)$

2) $d$ is observed:

$p(a, b \mid d) = \dfrac{p(a, b, d)}{p(d)} = \dfrac{\sum_c p(a, b, c, d)}{p(d)} = \dfrac{p(a)\, p(b)}{p(d)} \sum_c p(c \mid a, b)\, p(d \mid c) = \dfrac{p(a)\, p(b)\, p(d \mid a, b)}{p(d)} \ne p(a \mid d)\, p(b \mid d)$
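The derivation can be verified by brute-force enumeration on a toy instance of the network (all CPT numbers below are our own, chosen arbitrarily):

```python
from itertools import product

# Toy binary CPTs for the network a -> c <- b, c -> d (numbers are ours, for illustration)
pa = {0: 0.6, 1: 0.4}                # p(a)
pb = {0: 0.7, 1: 0.3}                # p(b)
pc0 = {(0, 0): 0.9, (0, 1): 0.5,
       (1, 0): 0.4, (1, 1): 0.1}     # p(c=0 | a, b)
pd0 = {0: 0.8, 1: 0.3}               # p(d=0 | c)

def joint(a, b, c, d):
    p_c = pc0[(a, b)] if c == 0 else 1 - pc0[(a, b)]
    p_d = pd0[c] if d == 0 else 1 - pd0[c]
    return pa[a] * pb[b] * p_c * p_d

def marg(**fix):
    """Marginal probability of the partial assignment in `fix`."""
    return sum(joint(a, b, c, d)
               for a, b, c, d in product([0, 1], repeat=4)
               if all({"a": a, "b": b, "c": c, "d": d}[k] == v
                      for k, v in fix.items()))

# 1) Nothing observed: p(a, b) = p(a) p(b)
assert abs(marg(a=1, b=1) - marg(a=1) * marg(b=1)) < 1e-12

# 2) d observed: p(a, b | d) != p(a | d) p(b | d)
p_d = marg(d=1)
lhs = marg(a=1, b=1, d=1) / p_d
rhs = (marg(a=1, d=1) / p_d) * (marg(b=1, d=1) / p_d)
print(abs(lhs - rhs) > 0.01)   # True: a and b are dependent given d
```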


(22)

D-separation: Example

[Figure: three example graphs over nodes $A$, $B$, $C$ illustrating tail-to-tail, head-to-tail, and head-to-head connections.]

(23)

D-separation

• $A$, $B$, and $C$ are non-intersecting subsets of nodes in a directed graph.
• A path from $A$ to $B$ is blocked if it contains a node such that either
  • the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set $C$, or
  • the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, are in the set $C$.
• If all paths from $A$ to $B$ are blocked, $A$ is said to be d-separated from $B$ by $C$.
• If $A$ is d-separated from $B$ by $C$, the joint distribution over all variables in the graph satisfies $A \perp\!\!\!\perp B \mid C$.
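The blocking rules can be turned into a small path checker. This is our own sketch, not code from the lecture: edges are directed pairs, a path is a node sequence, and each interior node is classified as head-to-head (collider) or not:

```python
def descendants(node, edges):
    """All nodes reachable from `node` along directed edges."""
    out, stack = set(), [node]
    while stack:
        n = stack.pop()
        for (u, v) in edges:
            if u == n and v not in out:
                out.add(v)
                stack.append(v)
    return out

def path_blocked(path, edges, observed):
    """True if the path is blocked given the observed set, per the two rules above."""
    for i in range(1, len(path) - 1):
        prev, node, nxt = path[i - 1], path[i], path[i + 1]
        head_in = (prev, node) in edges    # arrow points into node from prev
        head_out = (nxt, node) in edges    # arrow points into node from next
        if head_in and head_out:           # head-to-head (collider)
            if node not in observed and not (descendants(node, edges) & observed):
                return True
        else:                              # head-to-tail or tail-to-tail
            if node in observed:
                return True
    return False

# Hypothetical diamond graph: A -> B, A -> C, B -> D, C -> D
edges = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")}
assert path_blocked(["B", "A", "C"], edges, {"A"})       # tail-to-tail at observed A
assert path_blocked(["B", "D", "C"], edges, set())       # collider D unobserved
assert not path_blocked(["B", "D", "C"], edges, {"D"})   # observing D opens the collider
```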

(24)

Exercise

When $B$ is observed in the following Bayesian network, decide whether every path from $D$ to $E$ is blocked (d-separated), and determine whether $D$ and $E$ are dependent.

(25)

Exercise

a. Path 1 (D ← A → B → E) or (D → B → E): the connection at B is head-to-tail and B is observed, so path 1 is [ blocked ].

b. Path 2 (D → C ← E): the connection at C is head-to-head and C is not observed, so path 2 is [ blocked ].

c. Path 3 (D ← A → E): the connection at A is tail-to-tail and A is not observed, so path 3 is [ not blocked ].

d. Path 4 (D → B ← A → E): the connection at B is head-to-head and B is observed, so the segment D → B ← A is [ not blocked ] at B; the connection at A is tail-to-tail and A is not observed, so the segment B ← A → E is [ not blocked ]. Hence path 4 is [ not blocked ].

e. Among the four paths above, there [ exists ] at least one non-blocked path, and thus D and E are [ dependent ] on each other.


(27)

D-separation: I.I.D. Data

(28)

Discrete Variables, Multinomial

Mul(K, $\mu$): $p(x_1, \dots, x_K \mid \mu_1, \dots, \mu_K) = \dfrac{n!}{x_1! \cdots x_K!} \, \mu_1^{x_1} \cdots \mu_K^{x_K}$

$p(x_{11}, \dots, x_{1K}, x_{21}, \dots, x_{2K} \mid \mu_{11}, \dots, \mu_{KK}) = \dfrac{n!}{x_{11}! \cdots x_{1K}!} \, \dfrac{n!}{x_{21}! \cdots x_{2K}!} \, \mu_{11}^{x_{11} x_{21}} \cdots \mu_{KK}^{x_{1K} x_{2K}}$

[Figure: histograms over states 1–5 illustrating the two distributions.]

(29)

Discrete Variables (1), Multinomial

• General joint distribution: $K^2 - 1$ parameters
• Independent joint distribution: $2(K - 1)$ parameters

$p(x_1, x_2) = p(x_1 \mid x_2)\, p(x_2)$: $(K - 1) + K(K - 1)$ parameters

$p(x_1, x_2) = p(x_1)\, p(x_2)$: $(K - 1) + (K - 1)$ parameters

[Figure: histograms over states 1–5.]

(30)

Discrete Variables, Dirichlet

• The posterior distributions are in the same family as the prior probability distribution:

$p(\mu \mid x) \propto p(x \mid \mu)\, p(\mu)$

• The prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood function.
• The Dirichlet distribution is a conjugate prior for the multinomial distribution.
• A Gaussian is a conjugate prior of a Gaussian.

[Figure: histogram over states 1–5.]

(31)

Discrete Variables, Dirichlet

• Posterior: $p(\mu \mid x, \alpha) \propto p(x \mid \mu)\, p(\mu \mid \alpha)$

• Mul(K, $\mu$): $p(x_1, \dots, x_K \mid \mu_1, \dots, \mu_K) = \dfrac{n!}{x_1! \cdots x_K!} \, \mu_1^{x_1} \cdots \mu_K^{x_K}$

• Dir(K, $\alpha$): $p(\mu_1, \dots, \mu_K \mid \alpha_1, \dots, \alpha_K) = \dfrac{\Gamma\!\left(\sum_{i=1}^{K} \alpha_i\right)}{\prod_{i=1}^{K} \Gamma(\alpha_i)} \, \mu_1^{\alpha_1 - 1} \cdots \mu_K^{\alpha_K - 1}$

• Parameters: $\alpha_1, \dots, \alpha_K > 0$ (hyperparameters)

• Support: $\mu_1, \dots, \mu_K \in (0, 1)$ with $\sum_{i=1}^{K} \mu_i = 1$

• Posterior is Dir(K, $c + \alpha$): $p(\mu \mid x, \alpha) \propto p(x \mid \mu)\, p(\mu \mid \alpha)$, where $c = (c_1, \dots, c_K)$ is the vector of observed occurrence counts

• $E[\mu_k] = \dfrac{c_k + \alpha_k}{\sum_{i=1}^{K} (c_i + \alpha_i)}$

[Figure: histogram over states 1–5.]
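The conjugate update amounts to adding the observed counts to the hyperparameters. A minimal sketch with made-up numbers (K = 3, a uniform prior):

```python
# Dirichlet-multinomial conjugacy: prior Dir(alpha) + counts c -> posterior Dir(alpha + c)
alpha = [1.0, 1.0, 1.0]          # uniform prior over the 3-simplex (assumed)
counts = [2, 3, 5]               # observed occurrences of each of K = 3 states

posterior = [a + c for a, c in zip(alpha, counts)]
total = sum(posterior)
posterior_mean = [p / total for p in posterior]   # E[mu_k] = (c_k + a_k) / sum_i (c_i + a_i)

print(posterior_mean)   # [3/13, 4/13, 6/13]
```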

(32)

Discrete Variables (2)

• General joint distribution over $M$ variables: $K^M - 1$ parameters
• $M$-node Markov chain: $(K - 1) + (M - 1)K(K - 1)$ parameters

$p(x_1, \dots, x_M) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_2) \cdots p(x_M \mid x_{M-1})$
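The parameter counts above are easy to compare directly ($K$ and $M$ below are arbitrary illustrative values):

```python
def full_joint_params(K, M):
    """General joint distribution over M K-state variables."""
    return K ** M - 1

def markov_chain_params(K, M):
    """p(x1) plus M-1 conditional tables p(x_t | x_{t-1})."""
    return (K - 1) + (M - 1) * K * (K - 1)

K, M = 2, 10
print(full_joint_params(K, M), markov_chain_params(K, M))  # 1023 vs 19
```

The chain grows linearly in $M$ rather than exponentially, which is the point of the factorization.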

(33)

Discrete Variables: Bayesian Parameters (1)

(34)

Discrete Variables: Bayesian Parameters (2)

Shared prior

(35)

Parameterized Conditional Distributions

If $x_1, \dots, x_M$ are discrete $K$-state variables, $p(y \mid x_1, \dots, x_M)$ in general has $O(K^M)$ parameters, because a separate value is needed for each of the $K^M$ configurations of the parents.

The parameterized form $p(y = 1 \mid x_1, \dots, x_M) = \sigma\!\left(w_0 + \sum_{i=1}^{M} w_i x_i\right)$ requires only $M + 1$ parameters (at the cost of generality: this restricted form cannot represent every conditional distribution).
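Since the slide's formula is elided, the sketch below assumes the standard logistic-sigmoid parameterization, which matches the stated $M + 1$ parameter count:

```python
from math import exp

def sigmoid(t):
    return 1.0 / (1.0 + exp(-t))

def p_y1(x, w0, w):
    """p(y=1 | x_1..x_M) = sigma(w0 + sum_i w_i x_i): M + 1 parameters in total."""
    return sigmoid(w0 + sum(wi * xi for wi, xi in zip(w, x)))

w0, w = 0.0, [1.0, -2.0, 0.5]     # M = 3 -> 4 parameters instead of O(2**3)
print(p_y1([0, 0, 0], w0, w))     # sigma(0) = 0.5
```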

(36)

The Markov Blanket

Any factor $p(x_k \mid \mathrm{pa}_k)$ that does not have any functional dependence on $x_i$ can be taken outside the integral over $x_i$, and will therefore cancel between numerator and denominator, leaving

$p(x_i \mid x_{\{j \ne i\}}) \propto \prod_{k \in \mathrm{MB}(x_i)} p(x_k \mid \mathrm{pa}_k)$

where $\mathrm{MB}(x_i)$ indexes the factors involving the Markov blanket of $x_i$.

(37)

LDA Model (Topic Modelling)

(38)


Application: Traffic Pattern Analysis

• Surveillance in crowded scenes

(39)

Hidden Markov Model

$p(x_1, \dots, x_T) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_2) \cdots p(x_T \mid x_{T-1})$
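The chain factorization can be evaluated directly for a discrete chain. The initial distribution and transition matrix below are made up for illustration:

```python
# p(x_1..x_T) = p(x_1) * prod_t p(x_t | x_{t-1}) for a 2-state chain
init = [0.5, 0.5]                       # p(x_1)
trans = [[0.9, 0.1],                    # p(x_t = j | x_{t-1} = i)
         [0.2, 0.8]]

def chain_prob(states):
    p = init[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[prev][cur]
    return p

print(chain_prob([0, 0, 1]))   # 0.5 * 0.9 * 0.1 = 0.045
```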

(40)


Variational Auto-encoder (VAE)

Encoder – Decoder: $x \to z \to \hat{x}$

$\mathrm{Loss} = -\log p_\theta(x \mid z) + D_{KL}\!\left(q_\phi(z \mid x) \,\|\, p_\theta(z)\right)$

The first term is the reconstruction loss; the second is the variational-inference (KL) term.

$p_\theta(x \mid z)$: a multivariate Gaussian (real-valued data) or a Bernoulli (binary-valued data).
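For a diagonal-Gaussian encoder $q_\phi(z \mid x) = \mathcal{N}(\mu, \mathrm{diag}(\sigma^2))$ and a standard-normal prior $p_\theta(z)$, the KL term of the loss has a well-known closed form, sketched here in plain Python:

```python
from math import log

def kl_diag_gauss_vs_std_normal(mu, sigma2):
    """D_KL( N(mu, diag(sigma2)) || N(0, I) ) = -1/2 * sum(1 + log s2 - mu^2 - s2)."""
    return -0.5 * sum(1.0 + log(s2) - m * m - s2
                      for m, s2 in zip(mu, sigma2))

print(kl_diag_gauss_vs_std_normal([0.0, 0.0], [1.0, 1.0]))  # 0.0: q equals the prior
```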

(41)

Interim Summary

• Bayesian Networks
• Directed Acyclic Graph
• Conditional Independence
• D-separation
• Bayesian Parameters
• Parameterized Conditional Distributions
• Multinomial, Dirichlet Distribution, Conjugate Prior
• Markov Blanket
• Applications:
  • Topic Model (Document Analysis, Traffic Pattern Analysis)
  • Hidden Markov Model
  • Generative Models
