Artificial Intelligence
Chapter 19
Reasoning with Uncertain Information
Biointelligence Lab
School of Computer Sci. & Eng.
Seoul National University
Outline
- Review of Probability Theory
- Probabilistic Inference
- Bayes Networks
- Patterns of Inference in Bayes Networks
- Uncertain Evidence
- D-Separation
- Probabilistic Inference in Polytrees
19.1 Review of Probability Theory (1/4)
- Random variables
- Joint probability
  - Ex. (B (BAT_OK), M (MOVES), L (LIFTABLE), G (GAUGE))

  (B, M, L, G)                 Joint probability
  (True, True, True, True)     0.5686
  (True, True, True, False)    0.0299
  (True, True, False, True)    0.0135
  (True, True, False, False)   0.0007
  …                            …
19.1 Review of Probability Theory (2/4)
- Marginal probability
  - Obtained by summing the joint probability over all the other variables.
- Conditional probability
  - p(Vi | Vj) = p(Vi, Vj) / p(Vj)
  - Ex. The probability that the battery is charged given …
19.1 Review of Probability Theory (3/4)
Figure 19.1 A Venn Diagram
19.1 Review of Probability Theory (4/4)
- Chain rule
  - p(V1, ..., Vk) = p(Vk | Vk-1, ..., V1) ··· p(V2 | V1) p(V1)
- Bayes’ rule
  - p(Vi | Vj) = p(Vj | Vi) p(Vi) / p(Vj)
  - Abbreviation for …
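The chain rule and Bayes’ rule can be checked numerically on a small joint distribution. The sketch below uses two hypothetical binary variables A and B; the probability values are illustrative, not from the chapter.

```python
# Hypothetical numbers for a two-variable example (not from the chapter).
p_a = 0.6                                # p(A)
p_b_given_a = {True: 0.9, False: 0.2}    # p(B | A)

# Chain rule: p(A, B) = p(B | A) p(A)
joint = {}
for a in (True, False):
    pa = p_a if a else 1 - p_a
    for b in (True, False):
        pb = p_b_given_a[a] if b else 1 - p_b_given_a[a]
        joint[(a, b)] = pb * pa

# Marginal p(B): sum the joint over A
p_b = joint[(True, True)] + joint[(False, True)]

# Bayes' rule: p(A | B) = p(B | A) p(A) / p(B)
p_a_given_b = p_b_given_a[True] * p_a / p_b

# The same quantity read directly off the joint table agrees
assert abs(p_a_given_b - joint[(True, True)] / p_b) < 1e-12
print(p_a_given_b)
```

Either route (Bayes’ rule over the CPTs, or conditioning directly on the joint built by the chain rule) must give the same answer; the assertion checks exactly that.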
19.2 Probabilistic Inference
- The probability that some variable Vi has value vi given the evidence e.
- Ex.

  p(P, Q, R)     0.3        p(¬P, Q, R)     0.05
  p(P, Q, ¬R)    0.2        p(¬P, Q, ¬R)    0.1
  p(P, ¬Q, R)    0.2        p(¬P, ¬Q, R)    0.05
  p(P, ¬Q, ¬R)   0.1        p(¬P, ¬Q, ¬R)   0.0

  p(Q, ¬R) = p(P, Q, ¬R) + p(¬P, Q, ¬R) = 0.2 + 0.1 = 0.3
  p(¬Q, ¬R) = p(P, ¬Q, ¬R) + p(¬P, ¬Q, ¬R) = 0.1 + 0.0 = 0.1
  p(¬R) = p(Q, ¬R) + p(¬Q, ¬R) = 0.3 + 0.1 = 0.4
  p(Q | ¬R) = p(Q, ¬R) / p(¬R) = 0.3 / 0.4 = 0.75
  (and p(Q | ¬R) + p(¬Q | ¬R) = 0.75 + 0.25 = 1)
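The table lookup and marginalization above are mechanical, so they translate directly into code. This sketch computes p(Q | ¬R) from the slide’s eight-entry joint table by summing out P:

```python
# The slide's joint table; keys are (P, Q, R) truth values.
joint = {
    (True,  True,  True):  0.3,
    (True,  True,  False): 0.2,
    (True,  False, True):  0.2,
    (True,  False, False): 0.1,
    (False, True,  True):  0.05,
    (False, True,  False): 0.1,
    (False, False, True):  0.05,
    (False, False, False): 0.0,
}

def marginal(pred):
    """Sum the joint over all worlds (p, q, r) satisfying pred."""
    return sum(v for k, v in joint.items() if pred(*k))

p_q_not_r = marginal(lambda p, q, r: q and not r)   # p(Q, ¬R) = 0.3
p_not_r   = marginal(lambda p, q, r: not r)         # p(¬R)   = 0.4
print(p_q_not_r / p_not_r)                          # p(Q | ¬R) ≈ 0.75
```

Any conditional probability over these three variables reduces to two such marginal sums, which is why inference straight from the joint table is simple but scales exponentially in the number of variables.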
Statistical Independence
- Conditional independence
- Mutually conditional independence
- Unconditional independence
19.3 Bayes Networks (1/2)
- Directed, acyclic graph (DAG) whose nodes are labeled by random variables.
- Characteristics of Bayesian networks
  - Node Vi is conditionally independent of any subset of nodes that are not descendants of Vi, given its parents.
- Prior probability
- Conditional probability table (CPT)

  p(V1, V2, ..., Vk) = ∏_{i=1}^{k} p(Vi | Pa(Vi))
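The product formula can be checked against the joint table in 19.1. The CPT values below (p(B) = 0.95, p(L) = 0.7, p(M | B, L), p(G | B)) are assumed from the chapter’s battery/arm example; multiplying each node’s CPT entry reproduces the table’s first rows.

```python
# Assumed CPTs for the battery network (B, L are roots; M depends on B, L;
# G depends on B). Values chosen to reproduce the joint table in 19.1.
p_B = 0.95
p_L = 0.7
p_M = {(True, True): 0.9, (True, False): 0.05,   # p(M | B, L)
       (False, True): 0.0, (False, False): 0.0}
p_G = {True: 0.95, False: 0.1}                   # p(G | B)

def joint(b, m, l, g):
    """p(B, M, L, G) as a product of per-node CPT entries."""
    pb = p_B if b else 1 - p_B
    pl = p_L if l else 1 - p_L
    pm = p_M[(b, l)] if m else 1 - p_M[(b, l)]
    pg = p_G[b] if g else 1 - p_G[b]
    return pb * pl * pm * pg

print(joint(True, True, True, True))    # ≈ 0.5686, the table's first entry
print(joint(True, True, True, False))   # ≈ 0.0299, the second entry
```

Storing four CPTs instead of the 16-entry joint table is exactly the economy the network buys: the conditional-independence structure lets the joint be rebuilt on demand as a product.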
19.3 Bayes Networks (2/2)
19.4 Patterns of Inference in Bayes Networks (1/3)
- Causal or top-down inference
  - Ex. The probability that the arm moves given that the block is liftable

  p(M | L) = p(M, B | L) + p(M, ¬B | L)
           = p(M | B, L) p(B | L) + p(M | ¬B, L) p(¬B | L)
           = p(M | B, L) p(B) + p(M | ¬B, L) p(¬B)
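The last line of the derivation is a one-line sum in code. This sketch evaluates it with the CPT values assumed from the chapter’s battery example (p(B) = 0.95, p(M | B, L) = 0.9, p(M | ¬B, L) = 0):

```python
# Causal (top-down) inference: p(M | L) = sum over B of p(M | B, L) p(B).
# CPT values assumed from the chapter's battery/arm example.
p_B = 0.95
p_M_given = {(True, True): 0.9, (True, False): 0.05,
             (False, True): 0.0, (False, False): 0.0}   # p(M | B, L)

p_M_given_L = sum(
    p_M_given[(b, True)] * (p_B if b else 1 - p_B)
    for b in (True, False)
)
print(p_M_given_L)   # 0.9 * 0.95 + 0.0 * 0.05
```

Note that the final step of the derivation replaces p(B | L) with p(B); that substitution is licensed by the network structure, since B and L have no connecting path except through their common child M.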
19.4 Patterns of Inference in Bayes Networks (2/3)
- Diagnostic or bottom-up inference
  - Using an effect (or symptom) to infer a cause
  - Ex. The probability that the block is not liftable given that the arm does not move
    - p(¬M | ¬L) = 0.9525 (using causal reasoning)

  p(¬L | ¬M) = p(¬M | ¬L) p(¬L) / p(¬M)        (Bayes’ rule)
  p(¬M | ¬L) p(¬L) = 0.9525 × 0.3 = 0.28575
  p(¬M | L) p(L) = 0.03665
  p(¬L | ¬M) = 0.28575 / (0.28575 + 0.03665) = 0.88632
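The causal quantity p(¬M | ¬L) = 0.9525 that feeds Bayes’ rule above can be reproduced with the same assumed CPTs as before (p(B) = 0.95, p(M | B, ¬L) = 0.05, p(M | ¬B, ¬L) = 0):

```python
# Causal reasoning used inside a diagnostic query:
# p(¬M | ¬L) = 1 - sum over B of p(M | B, ¬L) p(B).
# CPT values assumed from the chapter's battery/arm example.
p_B = 0.95
p_M_given = {(True, True): 0.9, (True, False): 0.05,
             (False, True): 0.0, (False, False): 0.0}   # p(M | B, L)

p_M_given_notL = sum(
    p_M_given[(b, False)] * (p_B if b else 1 - p_B)
    for b in (True, False)
)
p_notM_given_notL = 1 - p_M_given_notL
print(p_notM_given_notL)   # 1 - 0.05 * 0.95
```

This is the general pattern of diagnostic inference: run the easy causal computation in the forward direction, then let Bayes’ rule turn it around.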
19.4 Patterns of Inference in Bayes Networks (3/3)
- Explaining away
  - ¬B explains ¬M, making ¬L less certain

  p(¬L | ¬B, ¬M) = p(¬B, ¬M | ¬L) p(¬L) / p(¬B, ¬M)              (Bayes’ rule)
                 = p(¬M | ¬B, ¬L) p(¬B | ¬L) p(¬L) / p(¬B, ¬M)   (def. of conditional prob.)
                 = p(¬M | ¬B, ¬L) p(¬B) p(¬L) / p(¬B, ¬M)        (structure of the Bayes network)
                 = 0.030 << 0.88632 = p(¬L | ¬M)
19.5 Uncertain Evidence
- For evidence nodes, we must be certain about the truth or falsity of the propositions they represent.
  - Each uncertain evidence node should have a child node, about which we can be certain.
  - Ex. Suppose the robot is not certain that its arm did not move.
    - Introduce M′: “The arm sensor says that the arm moved.”
      - We can be certain that this proposition is either true or false.
    - Use p(¬L | ¬B, ¬M′) instead of p(¬L | ¬B, ¬M).
  - Ex. Suppose we are uncertain about whether or not the battery is charged.
    - Introduce G: “battery gauge.”
19.6 D-Separation (1/3)
- e d-separates Vi and Vj if, for every undirected path in the Bayes network between Vi and Vj, there is some node Vb on the path having one of the following three properties:
  - Vb is in e, and both arcs on the path lead out of Vb.
  - Vb is in e, and one arc on the path leads in to Vb and one arc leads out.
  - Neither Vb nor any descendant of Vb is in e, and both arcs on the path lead in to Vb.
- Vb blocks the path given e when any one of these conditions holds for a path.
- Two nodes Vi and Vj are conditionally independent given a set of nodes e if e d-separates them.
19.6 D-Separation (2/3)
19.6 D-Separation (3/3)
- Ex.
  - I(G, L | B) by rules 1 and 3
    - By rule 1, B blocks the (only) path between G and L, given B.
    - By rule 3, M also blocks this path given B.
  - I(G, L)
    - By rule 3, M blocks the path between G and L.
  - I(B, L)
    - By rule 3, M blocks the path between B and L.
- Even using d-separation, probabilistic inference in Bayes networks is, in general, NP-hard.
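The independence claims read off the graph can be confirmed numerically: build the full joint from the CPTs (values again assumed from the chapter’s battery example) and check that the relevant products factor. This sketch verifies I(G, L) and I(G, L | B):

```python
from itertools import product

# Assumed CPTs for the battery network (same as the earlier slides' example).
p_B, p_L = 0.95, 0.7
p_M = {(True, True): 0.9, (True, False): 0.05,
       (False, True): 0.0, (False, False): 0.0}   # p(M | B, L)
p_G = {True: 0.95, False: 0.1}                    # p(G | B)

def joint(b, m, l, g):
    pb = p_B if b else 1 - p_B
    pl = p_L if l else 1 - p_L
    pm = p_M[(b, l)] if m else 1 - p_M[(b, l)]
    pg = p_G[b] if g else 1 - p_G[b]
    return pb * pl * pm * pg

def prob(pred):
    """Probability of the event defined by pred over (B, M, L, G)."""
    return sum(joint(*w) for w in product([True, False], repeat=4)
               if pred(*w))

# I(G, L): p(G, L) should equal p(G) p(L)
pGL = prob(lambda b, m, l, g: g and l)
pG  = prob(lambda b, m, l, g: g)
pL  = prob(lambda b, m, l, g: l)
assert abs(pGL - pG * pL) < 1e-12

# I(G, L | B): p(G, L | B) should equal p(G | B) p(L | B)
pB = prob(lambda b, m, l, g: b)
assert abs(prob(lambda b, m, l, g: b and g and l) / pB
           - (prob(lambda b, m, l, g: b and g) / pB)
             * (prob(lambda b, m, l, g: b and l) / pB)) < 1e-12
print("independence checks passed")
```

Brute-force checks like this only work because the network is tiny; d-separation gives the same answers by inspecting paths, without ever enumerating the joint.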
19.7 Probabilistic Inference in Polytrees (1/2)
- Polytree
  - A DAG for which there is just one path, along arcs in either direction, between any two nodes in the DAG.

19.7 Probabilistic Inference in Polytrees (2/2)
- A node is above Q
  - The node is connected to Q only through Q’s parents.
- A node is below Q
  - The node is connected to Q only through Q’s immediate successors.
- Three types of evidence
  - All evidence nodes are above Q.
  - All evidence nodes are below Q.
  - There are evidence nodes both above and below Q.
Evidence Above (1/2)
- Bottom-up recursive algorithm
- Ex. p(Q | P5, P4)

  p(Q | P5, P4) = Σ_{P6,P7} p(Q, P6, P7 | P5, P4)
                = Σ_{P6,P7} p(Q | P6, P7, P5, P4) p(P6, P7 | P5, P4)
                = Σ_{P6,P7} p(Q | P6, P7) p(P6, P7 | P5, P4)              (structure of the Bayes network)
                = Σ_{P6,P7} p(Q | P6, P7) p(P6 | P5, P4) p(P7 | P5, P4)   (d-separation)
                = Σ_{P6,P7} p(Q | P6, P7) p(P6 | P5) p(P7 | P4)           (d-separation)
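The final sum above is easy to evaluate once the two conditional factors are known. The chapter’s figure gives no numbers for this network, so the CPT values in this sketch are purely hypothetical, chosen only to illustrate the mechanics:

```python
# Evidence-above sum: p(Q | P5, P4) = sum over P6, P7 of
#   p(Q | P6, P7) p(P6 | P5) p(P7 | P4).
# All numbers below are hypothetical illustrations, not from the chapter.
p_Q_given = {(True, True): 0.9, (True, False): 0.6,
             (False, True): 0.4, (False, False): 0.1}   # p(Q | P6, P7)
p_P6_given_P5 = 0.8   # hypothetical p(P6=True | P5=True)
p_P7_given_P4 = 0.3   # hypothetical p(P7=True | P4=True)

p_Q = sum(
    p_Q_given[(p6, p7)]
    * (p_P6_given_P5 if p6 else 1 - p_P6_given_P5)
    * (p_P7_given_P4 if p7 else 1 - p_P7_given_P4)
    for p6 in (True, False)
    for p7 in (True, False)
)
print(p_Q)   # p(Q = True | P5, P4)
```

In the full algorithm, p(P6 | P5) and p(P7 | P4) are not given directly but are themselves produced by the same bottom-up recursion, as the next slide shows.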
Evidence Above (2/2)
- Calculating p(P7 | P4) and p(P6 | P5)

  p(P7 | P4) = Σ_{P3} p(P7 | P3, P4) p(P3 | P4) = Σ_{P3} p(P7 | P3, P4) p(P3)
  p(P6 | P5) = Σ_{P1,P2} p(P6 | P1, P2) p(P1 | P5) p(P2)

- Calculating p(P1 | P5)
  - Evidence is “below”
  - Here, we use Bayes’ rule:

  p(P1 | P5) = p(P5 | P1) p(P1) / p(P5)
Evidence Below (1/2)
- Top-down recursive algorithm

  p(Q | P12, P13, P14, P11)
    = p(P12, P13, P14, P11 | Q) p(Q) / p(P12, P13, P14, P11)
    = k p(P12, P13, P14, P11 | Q) p(Q)
    = k p(P12, P13 | Q) p(P14, P11 | Q) p(Q)

  p(P12, P13 | Q) = Σ_{P9} p(P12, P13 | P9, Q) p(P9 | Q)
                  = Σ_{P9} p(P12, P13 | P9) p(P9 | Q)
Evidence Below (2/2)

  p(P14, P11 | Q) = Σ_{P10} p(P14, P11 | P10) p(P10 | Q)
                  = Σ_{P10} p(P14 | P10) p(P11 | P10) p(P10 | Q)

  p(P11 | P10) = Σ_{P15} p(P11 | P15, P10) p(P15 | P10)
               = Σ_{P15} p(P11 | P15, P10) p(P15)
Evidence Above and Below

  p(Q | e) = p(Q | e+, e-)
           = p(e- | Q, e+) p(Q | e+) / p(e- | e+)
           = k2 p(e- | Q) p(Q | e+)

  Ex. p(Q | {P5, P4}, {P12, P13, P14, P11}),
  with e+ = {P5, P4} (evidence above) and e- = {P12, P13, P14, P11} (evidence below).
A Numerical Example (1/2)

  p(Q | U) = k p(U | Q) p(Q)

  p(P | Q) = Σ_R p(P | R, Q) p(R | Q)
           = Σ_R p(P | R, Q) p(R)
           = p(P | R, Q) p(R) + p(P | ¬R, Q) p(¬R)
           = 0.95 × 0.01 + 0.8 × 0.99 = 0.80

  p(P | ¬Q) = Σ_R p(P | R, ¬Q) p(R)
            = p(P | R, ¬Q) p(R) + p(P | ¬R, ¬Q) p(¬R)
            = 0.90 × 0.01 + 0.01 × 0.99 = 0.019
A Numerical Example (2/2)

  p(U | Q) = p(U | P) × 0.8 + p(U | ¬P) × 0.2
           = 0.7 × 0.8 + 0.2 × 0.2 = 0.60

  p(U | ¬Q) = p(U | P) × 0.019 + p(U | ¬P) × 0.98
            = 0.7 × 0.019 + 0.2 × 0.98 = 0.21

  p(Q | U) = k × 0.60 × 0.05 = 0.03k
  p(¬Q | U) = k × 0.21 × 0.95 = 0.20k
  k = 1 / (0.03 + 0.20) = 4.35
  ∴ p(Q | U) = 4.35 × 0.03 = 0.13

- Other techniques
Additional Readings (1/5)
- [Feller 1968]
  - Probability theory
- [Goldszmidt, Morris & Pearl 1990]
  - Non-monotonic inference through probabilistic methods
- [Pearl 1982a, Kim & Pearl 1983]
  - Message-passing algorithm
- [Russell & Norvig 1995, pp. 447ff]
  - Polytree methods
Additional Readings (2/5)
- [Shachter & Kenley 1989]
  - Bayesian networks for continuous random variables
- [Wellman 1990]
  - Qualitative networks
- [Neapolitan 1990]
  - Probabilistic methods in expert systems
- [Henrion 1990]
  - Probabilistic inference in Bayesian networks
Additional Readings (3/5)
- [Jensen 1996]
  - Bayesian networks: the HUGIN system
- [Neal 1991]
  - Relationships between Bayesian networks and neural networks
- [Heckerman 1991, Heckerman & Nathwani 1992]
  - PATHFINDER
- [Pradhan, et al. 1994]
  - CPCS-BN
Additional Readings (4/5)
- [Shortliffe 1976, Buchanan & Shortliffe 1984]
  - MYCIN: uses certainty factors
- [Duda, Hart & Nilsson 1987]
  - PROSPECTOR: uses sufficiency and necessity indices
- [Zadeh 1975, Zadeh 1978, Elkan 1993]
  - Fuzzy logic and possibility theory
- [Dempster 1968, Shafer 1979]
  - Dempster-Shafer combination rules
Additional Readings (5/5)
- [Nilsson 1986]
  - Probabilistic logic
- [Tversky & Kahneman 1982]
  - Humans generally lose consistency when facing uncertainty
- [Shafer & Pearl 1990]
  - Papers on uncertain inference
- Proceedings & Journals
  - Uncertainty in Artificial Intelligence (UAI)
  - International Journal of Approximate Reasoning