Artificial Intelligence
Chapter 19
Reasoning with Uncertain Information
Biointelligence Lab
School of Computer Sci. & Eng.
Seoul National University
Outline
- Review of Probability Theory
- Probabilistic Inference
- Bayes Networks
- Patterns of Inference in Bayes Networks
- Uncertain Evidence
- D-Separation
- Probabilistic Inference in Polytrees
19.1 Review of Probability Theory (1/4)
- Random variables
- Joint probability
  - Ex. (B (BAT_OK), M (MOVES), L (LIFTABLE), G (GAUGE))

  (B, M, L, G)                 Joint probability
  (True, True, True, True)     0.5686
  (True, True, True, False)    0.0299
  (True, True, False, True)    0.0135
  (True, True, False, False)   0.0007
  …                            …
19.1 Review of Probability Theory (2/4)
- Marginal probability
  - Obtained by summing the joint probability over all the other variables.
- Conditional probability
  - p(Vi | Vj) = p(Vi, Vj) / p(Vj)
  - Ex. The probability that the battery is charged given …
19.1 Review of Probability Theory (3/4)
Figure 19.1 A Venn Diagram
19.1 Review of Probability Theory (4/4)
- Chain rule
  - p(V1, ..., Vk) = p(Vk | Vk-1, ..., V1) ··· p(V2 | V1) p(V1)
- Bayes’ rule
  - p(Vi | Vj) = p(Vj | Vi) p(Vi) / p(Vj)
  - Abbreviation for …
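The chain rule and Bayes’ rule can be checked numerically on a small joint distribution. The sketch below uses two hypothetical binary variables A and B; the probability values are illustrative, not from the chapter.

```python
# Hypothetical numbers for a two-variable example (not from the chapter).
p_a = 0.6                                # p(A)
p_b_given_a = {True: 0.9, False: 0.2}    # p(B | A)

# Chain rule: p(A, B) = p(B | A) p(A)
joint = {}
for a in (True, False):
    pa = p_a if a else 1 - p_a
    for b in (True, False):
        pb = p_b_given_a[a] if b else 1 - p_b_given_a[a]
        joint[(a, b)] = pb * pa

# Marginal p(B): sum the joint over A
p_b = joint[(True, True)] + joint[(False, True)]

# Bayes' rule: p(A | B) = p(B | A) p(A) / p(B)
p_a_given_b = p_b_given_a[True] * p_a / p_b

# The same quantity read directly off the joint table agrees
assert abs(p_a_given_b - joint[(True, True)] / p_b) < 1e-12
print(p_a_given_b)
```

Either route (Bayes’ rule over the CPTs, or conditioning directly on the joint built by the chain rule) must give the same answer; the assertion checks exactly that.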
19.2 Probabilistic Inference
- The probability that some variable Vi has value vi given the evidence e.
- Ex.

  p(P, Q, R)     0.3        p(¬P, Q, R)     0.05
  p(P, Q, ¬R)    0.2        p(¬P, Q, ¬R)    0.1
  p(P, ¬Q, R)    0.2        p(¬P, ¬Q, R)    0.05
  p(P, ¬Q, ¬R)   0.1        p(¬P, ¬Q, ¬R)   0.0

  p(Q, ¬R) = p(P, Q, ¬R) + p(¬P, Q, ¬R) = 0.2 + 0.1 = 0.3
  p(¬Q, ¬R) = p(P, ¬Q, ¬R) + p(¬P, ¬Q, ¬R) = 0.1 + 0.0 = 0.1
  p(¬R) = p(Q, ¬R) + p(¬Q, ¬R) = 0.3 + 0.1 = 0.4
  p(Q | ¬R) = p(Q, ¬R) / p(¬R) = 0.3 / 0.4 = 0.75
  (and p(Q | ¬R) + p(¬Q | ¬R) = 0.75 + 0.25 = 1)
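The table lookup and marginalization above are mechanical, so they translate directly into code. This sketch computes p(Q | ¬R) from the slide’s eight-entry joint table by summing out P:

```python
# The slide's joint table; keys are (P, Q, R) truth values.
joint = {
    (True,  True,  True):  0.3,
    (True,  True,  False): 0.2,
    (True,  False, True):  0.2,
    (True,  False, False): 0.1,
    (False, True,  True):  0.05,
    (False, True,  False): 0.1,
    (False, False, True):  0.05,
    (False, False, False): 0.0,
}

def marginal(pred):
    """Sum the joint over all worlds (p, q, r) satisfying pred."""
    return sum(v for k, v in joint.items() if pred(*k))

p_q_not_r = marginal(lambda p, q, r: q and not r)   # p(Q, ¬R) = 0.3
p_not_r   = marginal(lambda p, q, r: not r)         # p(¬R)   = 0.4
print(p_q_not_r / p_not_r)                          # p(Q | ¬R) ≈ 0.75
```

Any conditional probability over these three variables reduces to two such marginal sums, which is why inference straight from the joint table is simple but scales exponentially in the number of variables.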
Statistical Independence
- Conditional independence
- Mutually conditional independence
- Unconditional independence
19.3 Bayes Networks (1/2)
- Directed, acyclic graph (DAG) whose nodes are labeled by random variables.
- Characteristics of Bayesian networks
  - Node Vi is conditionally independent of any subset of nodes that are not descendants of Vi, given its parents.
- Prior probability
- Conditional probability table (CPT)

  p(V1, V2, ..., Vk) = ∏_{i=1}^{k} p(Vi | Pa(Vi))
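The product formula can be checked against the joint table in 19.1. The CPT values below (p(B) = 0.95, p(L) = 0.7, p(M | B, L), p(G | B)) are assumed from the chapter’s battery/arm example; multiplying each node’s CPT entry reproduces the table’s first rows.

```python
# Assumed CPTs for the battery network (B, L are roots; M depends on B, L;
# G depends on B). Values chosen to reproduce the joint table in 19.1.
p_B = 0.95
p_L = 0.7
p_M = {(True, True): 0.9, (True, False): 0.05,   # p(M | B, L)
       (False, True): 0.0, (False, False): 0.0}
p_G = {True: 0.95, False: 0.1}                   # p(G | B)

def joint(b, m, l, g):
    """p(B, M, L, G) as a product of per-node CPT entries."""
    pb = p_B if b else 1 - p_B
    pl = p_L if l else 1 - p_L
    pm = p_M[(b, l)] if m else 1 - p_M[(b, l)]
    pg = p_G[b] if g else 1 - p_G[b]
    return pb * pl * pm * pg

print(joint(True, True, True, True))    # ≈ 0.5686, the table's first entry
print(joint(True, True, True, False))   # ≈ 0.0299, the second entry
```

Storing four CPTs instead of the 16-entry joint table is exactly the economy the network buys: the conditional-independence structure lets the joint be rebuilt on demand as a product.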
19.3 Bayes Networks (2/2)
19.4 Patterns of Inference in Bayes Networks (1/3)
- Causal or top-down inference
  - Ex. The probability that the arm moves given that the block is liftable

  p(M | L) = p(M, B | L) + p(M, ¬B | L)
           = p(M | B, L) p(B | L) + p(M | ¬B, L) p(¬B | L)
           = p(M | B, L) p(B) + p(M | ¬B, L) p(¬B)
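The last line of the derivation is a one-line sum in code. This sketch evaluates it with the CPT values assumed from the chapter’s battery example (p(B) = 0.95, p(M | B, L) = 0.9, p(M | ¬B, L) = 0):

```python
# Causal (top-down) inference: p(M | L) = sum over B of p(M | B, L) p(B).
# CPT values assumed from the chapter's battery/arm example.
p_B = 0.95
p_M_given = {(True, True): 0.9, (True, False): 0.05,
             (False, True): 0.0, (False, False): 0.0}   # p(M | B, L)

p_M_given_L = sum(
    p_M_given[(b, True)] * (p_B if b else 1 - p_B)
    for b in (True, False)
)
print(p_M_given_L)   # 0.9 * 0.95 + 0.0 * 0.05
```

Note that the final step of the derivation replaces p(B | L) with p(B); that substitution is licensed by the network structure, since B and L have no connecting path except through their common child M.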
19.4 Patterns of Inference in Bayes Networks (2/3)
- Diagnostic or bottom-up inference
  - Using an effect (or symptom) to infer a cause
  - Ex. The probability that the block is not liftable given that the arm does not move
    - p(¬M | ¬L) = 0.9525 (using causal reasoning)

  p(¬L | ¬M) = p(¬M | ¬L) p(¬L) / p(¬M)        (Bayes’ rule)
  p(¬M | ¬L) p(¬L) = 0.9525 × 0.3 = 0.28575
  p(¬M | L) p(L) = 0.03665
  p(¬L | ¬M) = 0.28575 / (0.28575 + 0.03665) = 0.88632
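The causal quantity p(¬M | ¬L) = 0.9525 that feeds Bayes’ rule above can be reproduced with the same assumed CPTs as before (p(B) = 0.95, p(M | B, ¬L) = 0.05, p(M | ¬B, ¬L) = 0):

```python
# Causal reasoning used inside a diagnostic query:
# p(¬M | ¬L) = 1 - sum over B of p(M | B, ¬L) p(B).
# CPT values assumed from the chapter's battery/arm example.
p_B = 0.95
p_M_given = {(True, True): 0.9, (True, False): 0.05,
             (False, True): 0.0, (False, False): 0.0}   # p(M | B, L)

p_M_given_notL = sum(
    p_M_given[(b, False)] * (p_B if b else 1 - p_B)
    for b in (True, False)
)
p_notM_given_notL = 1 - p_M_given_notL
print(p_notM_given_notL)   # 1 - 0.05 * 0.95
```

This is the general pattern of diagnostic inference: run the easy causal computation in the forward direction, then let Bayes’ rule turn it around.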
19.4 Patterns of Inference in Bayes Networks (3/3)
- Explaining away
  - ¬B explains ¬M, making ¬L less certain

  p(¬L | ¬B, ¬M) = p(¬B, ¬M | ¬L) p(¬L) / p(¬B, ¬M)              (Bayes’ rule)
                 = p(¬M | ¬B, ¬L) p(¬B | ¬L) p(¬L) / p(¬B, ¬M)   (def. of conditional prob.)
                 = p(¬M | ¬B, ¬L) p(¬B) p(¬L) / p(¬B, ¬M)        (structure of the Bayes network)
                 = 0.030 << 0.88632 = p(¬L | ¬M)
19.5 Uncertain Evidence
- For evidence nodes, we must be certain about the truth or falsity of the propositions they represent.
  - Each uncertain evidence node should have a child node, about which we can be certain.
  - Ex. Suppose the robot is not certain that its arm did not move.
    - Introduce M′: “The arm sensor says that the arm moved.”
      - We can be certain that this proposition is either true or false.
    - Use p(¬L | ¬B, ¬M′) instead of p(¬L | ¬B, ¬M).
  - Ex. Suppose we are uncertain about whether or not the battery is charged.
    - Introduce G: “battery gauge.”
19.6 D-Separation (1/3)
- e d-separates Vi and Vj if, for every undirected path in the Bayes network between Vi and Vj, there is some node Vb on the path having one of the following three properties:
  - Vb is in e, and both arcs on the path lead out of Vb.
  - Vb is in e, and one arc on the path leads in to Vb and one arc leads out.
  - Neither Vb nor any descendant of Vb is in e, and both arcs on the path lead in to Vb.
- Vb blocks the path given e when any one of these conditions holds for a path.
- Two nodes Vi and Vj are conditionally independent given a set of nodes e if e d-separates them.
19.6 D-Separation (2/3)
19.6 D-Separation (3/3)
- Ex.
  - I(G, L | B) by rules 1 and 3
    - By rule 1, B blocks the (only) path between G and L, given B.
    - By rule 3, M also blocks this path given B.
  - I(G, L)
    - By rule 3, M blocks the path between G and L.
  - I(B, L)
    - By rule 3, M blocks the path between B and L.
- Even using d-separation, probabilistic inference in Bayes networks is, in general, NP-hard.
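The independence claims read off the graph can be confirmed numerically: build the full joint from the CPTs (values again assumed from the chapter’s battery example) and check that the relevant products factor. This sketch verifies I(G, L) and I(G, L | B):

```python
from itertools import product

# Assumed CPTs for the battery network (same as the earlier slides' example).
p_B, p_L = 0.95, 0.7
p_M = {(True, True): 0.9, (True, False): 0.05,
       (False, True): 0.0, (False, False): 0.0}   # p(M | B, L)
p_G = {True: 0.95, False: 0.1}                    # p(G | B)

def joint(b, m, l, g):
    pb = p_B if b else 1 - p_B
    pl = p_L if l else 1 - p_L
    pm = p_M[(b, l)] if m else 1 - p_M[(b, l)]
    pg = p_G[b] if g else 1 - p_G[b]
    return pb * pl * pm * pg

def prob(pred):
    """Probability of the event defined by pred over (B, M, L, G)."""
    return sum(joint(*w) for w in product([True, False], repeat=4)
               if pred(*w))

# I(G, L): p(G, L) should equal p(G) p(L)
pGL = prob(lambda b, m, l, g: g and l)
pG  = prob(lambda b, m, l, g: g)
pL  = prob(lambda b, m, l, g: l)
assert abs(pGL - pG * pL) < 1e-12

# I(G, L | B): p(G, L | B) should equal p(G | B) p(L | B)
pB = prob(lambda b, m, l, g: b)
assert abs(prob(lambda b, m, l, g: b and g and l) / pB
           - (prob(lambda b, m, l, g: b and g) / pB)
             * (prob(lambda b, m, l, g: b and l) / pB)) < 1e-12
print("independence checks passed")
```

Brute-force checks like this only work because the network is tiny; d-separation gives the same answers by inspecting paths, without ever enumerating the joint.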
19.7 Probabilistic Inference in Polytrees (1/2)
- Polytree
  - A DAG for which there is just one path, along arcs in either direction, between any two nodes in the DAG.

19.7 Probabilistic Inference in Polytrees (2/2)
- A node is above Q
  - The node is connected to Q only through Q’s parents.
- A node is below Q
  - The node is connected to Q only through Q’s immediate successors.
- Three types of evidence
  - All evidence nodes are above Q.
  - All evidence nodes are below Q.
  - There are evidence nodes both above and below Q.
Evidence Above (1/2)
- Bottom-up recursive algorithm
- Ex. p(Q | P5, P4)

  p(Q | P5, P4) = Σ_{P6,P7} p(Q, P6, P7 | P5, P4)
                = Σ_{P6,P7} p(Q | P6, P7, P5, P4) p(P6, P7 | P5, P4)
                = Σ_{P6,P7} p(Q | P6, P7) p(P6, P7 | P5, P4)              (structure of the Bayes network)
                = Σ_{P6,P7} p(Q | P6, P7) p(P6 | P5, P4) p(P7 | P5, P4)   (d-separation)
                = Σ_{P6,P7} p(Q | P6, P7) p(P6 | P5) p(P7 | P4)           (d-separation)
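The final sum above is easy to evaluate once the two conditional factors are known. The chapter’s figure gives no numbers for this network, so the CPT values in this sketch are purely hypothetical, chosen only to illustrate the mechanics:

```python
# Evidence-above sum: p(Q | P5, P4) = sum over P6, P7 of
#   p(Q | P6, P7) p(P6 | P5) p(P7 | P4).
# All numbers below are hypothetical illustrations, not from the chapter.
p_Q_given = {(True, True): 0.9, (True, False): 0.6,
             (False, True): 0.4, (False, False): 0.1}   # p(Q | P6, P7)
p_P6_given_P5 = 0.8   # hypothetical p(P6=True | P5=True)
p_P7_given_P4 = 0.3   # hypothetical p(P7=True | P4=True)

p_Q = sum(
    p_Q_given[(p6, p7)]
    * (p_P6_given_P5 if p6 else 1 - p_P6_given_P5)
    * (p_P7_given_P4 if p7 else 1 - p_P7_given_P4)
    for p6 in (True, False)
    for p7 in (True, False)
)
print(p_Q)   # p(Q = True | P5, P4)
```

In the full algorithm, p(P6 | P5) and p(P7 | P4) are not given directly but are themselves produced by the same bottom-up recursion, as the next slide shows.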
Evidence Above (2/2)
- Calculating p(P7 | P4) and p(P6 | P5)

  p(P7 | P4) = Σ_{P3} p(P7 | P3, P4) p(P3 | P4) = Σ_{P3} p(P7 | P3, P4) p(P3)
  p(P6 | P5) = Σ_{P1,P2} p(P6 | P1, P2) p(P1 | P5) p(P2)

- Calculating p(P1 | P5)
  - Evidence is “below”
  - Here, we use Bayes’ rule:

  p(P1 | P5) = p(P5 | P1) p(P1) / p(P5)
Evidence Below (1/2)
- Top-down recursive algorithm

  p(Q | P12, P13, P14, P11)
    = p(P12, P13, P14, P11 | Q) p(Q) / p(P12, P13, P14, P11)
    = k p(P12, P13, P14, P11 | Q) p(Q)
    = k p(P12, P13 | Q) p(P14, P11 | Q) p(Q)

  p(P12, P13 | Q) = Σ_{P9} p(P12, P13 | P9, Q) p(P9 | Q)
                  = Σ_{P9} p(P12, P13 | P9) p(P9 | Q)
Evidence Below (2/2)

  p(P14, P11 | Q) = Σ_{P10} p(P14, P11 | P10) p(P10 | Q)
                  = Σ_{P10} p(P14 | P10) p(P11 | P10) p(P10 | Q)

  p(P11 | P10) = Σ_{P15} p(P11 | P15, P10) p(P15 | P10)
               = Σ_{P15} p(P11 | P15, P10) p(P15)
Evidence Above and Below

  p(Q | e) = p(Q | e+, e-)
           = p(e- | Q, e+) p(Q | e+) / p(e- | e+)
           = k2 p(e- | Q) p(Q | e+)

  Ex. p(Q | {P5, P4}, {P12, P13, P14, P11}),
  with e+ = {P5, P4} (evidence above) and e- = {P12, P13, P14, P11} (evidence below).
A Numerical Example (1/2)

  p(Q | U) = k p(U | Q) p(Q)

  p(P | Q) = Σ_R p(P | R, Q) p(R | Q)
           = Σ_R p(P | R, Q) p(R)
           = p(P | R, Q) p(R) + p(P | ¬R, Q) p(¬R)
           = 0.95 × 0.01 + 0.8 × 0.99 = 0.80

  p(P | ¬Q) = Σ_R p(P | R, ¬Q) p(R)
            = p(P | R, ¬Q) p(R) + p(P | ¬R, ¬Q) p(¬R)
            = 0.90 × 0.01 + 0.01 × 0.99 = 0.019
A Numerical Example (2/2)

  p(U | Q) = p(U | P) × 0.8 + p(U | ¬P) × 0.2
           = 0.7 × 0.8 + 0.2 × 0.2 = 0.60

  p(U | ¬Q) = p(U | P) × 0.019 + p(U | ¬P) × 0.98
            = 0.7 × 0.019 + 0.2 × 0.98 = 0.21

  p(Q | U) = k × 0.60 × 0.05 = 0.03k
  p(¬Q | U) = k × 0.21 × 0.95 = 0.20k
  k = 1 / (0.03 + 0.20) = 4.35
  ∴ p(Q | U) = 4.35 × 0.03 = 0.13

- Other techniques
Additional Readings (1/5)
- [Feller 1968]
  - Probability theory
- [Goldszmidt, Morris & Pearl 1990]
  - Non-monotonic inference through probabilistic methods
- [Pearl 1982a, Kim & Pearl 1983]
  - Message-passing algorithm
- [Russell & Norvig 1995, pp. 447ff]
  - Polytree methods
Additional Readings (2/5)
- [Shachter & Kenley 1989]
  - Bayesian networks for continuous random variables
- [Wellman 1990]
  - Qualitative networks
- [Neapolitan 1990]
  - Probabilistic methods in expert systems
- [Henrion 1990]
  - Probabilistic inference in Bayesian networks
Additional Readings (3/5)
- [Jensen 1996]
  - Bayesian networks: the HUGIN system
- [Neal 1991]
  - Relationships between Bayesian networks and neural networks
- [Heckerman 1991, Heckerman & Nathwani 1992]
  - PATHFINDER
- [Pradhan, et al. 1994]
  - CPCS-BN
Additional Readings (4/5)
- [Shortliffe 1976, Buchanan & Shortliffe 1984]
  - MYCIN: uses certainty factors
- [Duda, Hart & Nilsson 1987]
  - PROSPECTOR: uses sufficiency and necessity indices
- [Zadeh 1975, Zadeh 1978, Elkan 1993]
  - Fuzzy logic and possibility theory
- [Dempster 1968, Shafer 1979]
  - Dempster-Shafer combination rules
Additional Readings (5/5)
- [Nilsson 1986]
  - Probabilistic logic
- [Tversky & Kahneman 1982]
  - Humans generally lose consistency when facing uncertainty
- [Shafer & Pearl 1990]
  - Papers on uncertain inference
- Proceedings & Journals
  - Uncertainty in Artificial Intelligence (UAI)
  - International Journal of Approximate Reasoning