• 검색 결과가 없습니다.

Adversarial Search

N/A
N/A
Protected

Academic year: 2021

Share "Adversarial Search"

Copied!
26
0
0

로드 중.... (전체 텍스트 보기)

전체 글

(1)

Adversarial Search

George Konidaris [email protected]

Spring 2017

(2)

Games

“Chess is the Drosophila of Artificial Intelligence”

Kronrod, c. 1966

TuroChamp, 1948

(3)

Why Study Games?

Of interest:

Many human activities (especially intellectual ones) can be modeled as games.

Prestige.

Convenient:

Perfect information.

Concise, precise rules.

Well defined “score”.

(4)

“Solved” Games

A game is solved if an optimal strategy is known.

Strong solved: all positions.

Weakly solved: some (start) positions.

(5)

Typical Game Setting

Games are usually:

2 player

Alternating

Zero-sum

Gain for one loss for another.

Perfect information Very much like search:

Start state

Successor function

Terminal states (many)

Objective function but alternating control.

(6)

Game Trees

o o

… o

o

x

o

x

… o

x

player 1 moves

player 1 moves player 2 moves

o

x

o

x

o

x

o o

o

(7)

Key Differences vs. Search

p1

p2 p2 p2

p1 p1 p1

only get score here

they select to min score

you select to

max score

(8)

Minimax Algorithm

Max player: select action to maximize return.

Min player: select action to minimize return.

This is optimal for both players (if zero sum).

Assumes perfect play, worst case.

Can run as depth first:

Time O(bd)

Space O(bd)

Require the agent to evaluate the whole tree.

(9)

Minimax

p1

p2 p2 p2

p1 p1 p1 p1 p1 p1

10 5 -3 20 -5 2

max

min

5 -3 -5

5

(10)

In Practice

Depth is too deep.

10s to 100s of moves.

Breadth is too broad.

Chess: 35, Go: 361.

Full search never terminates for non-trivial games.

(11)

What Is To Be Done?

Must terminate the search early. p1 p2

p1 p1 p1

p2 p2

(12)

In Practice

Solution: substitute evaluation function.

Like a heuristic - estimate value.

In this case, probability of win.

Common strategy:

Run to fixed depth then estimate.

Careful lookahead to depth d, then guess.

p1 p2 p1

!

(13)

Search Control

Horizon Effects

What if something interesting at horizon + 1?

How do you know?

More sophisticated strategies:

When to generate more nodes?

How to selectively expand the frontier?

How to allocate fixed move time?

(14)

Alpha-Beta

p1

p2 p2 p2

p1 p1 p1 p1 p1 p1

10 5

max

min 5

-3 -5

(15)

Alpha Beta Pruning

Single most useful search control method:

Throw away whole branches.

Use the min-max behavior.

Cutoff search at min nodes where max can force a better outcome.

Cutoff search at max nodes when min can force a worse outcome.

Resulting algorithm: alpha-beta pruning.

(16)

Alpha-Beta

Empirically: square roots branching factor.

Effectively doubles the search horizon.

Alpha-beta makes the difference between novice and expert computer game players. Most successful players use alpha-beta.

(17)

Deep Blue (1997)

480 Special Purpose Chips 200 million positions/sec

Search depth 6-8 moves (up to 20)

(18)

Games of Chance

What if there is a chance element?

(19)

Stochasticity

An outcome is called stochastic when it is determined at random.

3 2 1

6 5 4

p=1/6 p=1/6 p=1/6 p=1/6 p=1/6 p=1/6

sums to 1

(20)

Stochasticity

How to factor in stochasticity?

Agent does not get to choose.

Selecting the max outcome is optimistic.

Selecting the min outcome is pessimistic.

Must be probability-aware.

(21)

Expectation

What is the average die value?

This factors in both probabilities and the value of event.

In general, given random event x and function f(x):

(1 + 2 + 3 + 4 + 5 + 6)

6 = 3.5

E[f (x)] = X

x

P (x)f (x)

Insert expectation layer to accommodate stochastic events.

(22)

ExpectiMax

dice

they select to min score

stochastic (expectation)

p1 p1 p1

you select (max)

dice dice dice

p2

stochastic (expectation)

p2 p2

(23)

Games Today

World champion level:

Backgammon

Chess

Checkers (solved)

Othello

Some poker types:

“Heads-up Limit Hold’em Poker is Solved”, Bowling et al., Science, January 2015.

Perform well:

Bridge

Other poker types Far off: Go

(24)
(25)

Go

(26)

Very Recently

Lee Sedol AlphaGo

(Google Deepmind)

1 - 4

참조

관련 문서

Through empirical analysis using data of job sat- isfaction of workplace reserve force commanders, we show that the grid search method helps us generate an optimal decision tree