(1)

LEARNING AUTOMATA:

An Introduction (2)

By Kumpati S. Narendra & Mandayam A. L. Thathachar (Ch. 1.3 & Ch. 2)

October 7, 2016
Derek Hommel

Master's Program in Linguistics, Computational Linguistics Lab, SNU

http://knlp.snu.ac.kr

(2)

Outline

• what is an automaton?

• an analogy

• example: a basic LA

• LA and random environments

• LA and stochastic hill climbing

• LA and inductive inference

• LA and dual control

• other topics in brief

• TL;DR & Discussion Questions

(3)

What is a learning automaton?

• "The principal theme of the book is how a sequential decision maker with a finite number of choices in the action set would choose an action at every instant, based on the response of a random environment."

• a finite number of actions can be performed in a random environment.

• when an action is performed, the environment randomly responds either favorably or unfavorably.

• the choice of action should be guided by past actions and responses; its performance should improve over time.

• decisions must be made with very little knowledge concerning the nature of the environment (deterministic, stochastic, adaptive).

(4)

What is a learning automaton?

• Quintuple {Φ, α, β, F(•,•), H(•,•)}

• states Φ = {φ1, φ2, φ3, …, φs}

• actions α = {α1, α2, α3, …, αr}

where α(n) is action at instant n

• responses β from environment:

• P-model: β = {0, 1} (binary)

• Q-model: β takes values in a finite set with more than two elements

• S-model: β takes continuous values in the interval [0, 1]

• Transition function F(•,•): Φ × β → Φ

• Output function H(•,•): Φ × β → α

• State-output case: G(•): Φ → α (e.g. the identity map; see the sketch below)

From Ünsal, Cem, Intelligent Navigation of Autonomous Vehicles in an Automated Highway System: Learning Methods and Interacting Vehicles Approach (1998); cf. Learning Automata: An Introduction, p. 52
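As a quick illustration (my own sketch, not code from the book or from Ünsal), the quintuple {Φ, α, β, F(•,•), H(•,•)} can be encoded directly; the names and the particular F table below are made-up examples.

```python
import numpy as np

# Hypothetical encoding of the quintuple {Phi, alpha, beta, F, H}.
s, r = 3, 3                  # number of states and actions
phi = np.arange(s)           # states Phi = {phi_1, ..., phi_s} as indices 0..s-1
alpha = np.arange(r)         # actions alpha = {alpha_1, ..., alpha_r}
beta_set = (0, 1)            # P-model responses: 0 = favorable, 1 = unfavorable

# F: Phi x beta -> Phi, deterministic here, stored as a lookup table:
# F[state, response] is the next state.
F = np.array([[0, 1],
              [1, 2],
              [2, 0]])

# State-output case: G: Phi -> alpha is simply the identity map.
def G(state):
    return alpha[state]
```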

(5)

What is a learning automaton?

• Teacher metaphor (LA p. 53): your professor believes in the reinforcement-learning approach.

• He poses a question for which you have a finite set of possible answers.

• However, when you give him an answer, he sometimes responds contrary to his actual assessment ('no' instead of 'good').

• How can you ascertain what the correct answer is in this uncertain environment?

(6)

A clearer analogy

(7)

Overwatch

(8)

Overwatch bot

• Quintuple {Φ, α, β, F(•,•), H(•,•)}

• actions α = { aim, shoot, crouch, jump, run, ult, …}

• states Φ = { aim, shoot, crouch, jump, run, ult, …}

• Input β = { bad outcome, good outcome }

• F(•,•) : Φ x β → Φ “given input β when in state i, go to state j”

• H(•,•) : Φ x β → α “given input β when in state i, do some action”

(9)

Overwatch bot

(10)

What is a learning automaton?

Terms:

Deterministic: the transition from any state to the next is fixed, and the output for any state is fixed.

Stochastic: the transition function and/or the output function has a probabilistic element; e.g. if F is stochastic, the next state is random and F gives the probabilities of moving to each of the other states.

Fixed-structure: the transition and output probabilities are fixed; in a stochastic fixed-structure LA the transitions are still random, but the probabilities fij and gij never change (see the sketch below).

Variable-structure: the transition and output probabilities can be modified in response to the input.
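As an illustration of the fixed-structure stochastic case (my own sketch; the numbers are invented, not taken from the book), the probabilities fij can be stored as one row-stochastic matrix per response value. Each individual transition is still random, but the matrices themselves never change.

```python
import numpy as np

rng = np.random.default_rng(0)

# One row-stochastic matrix per response value: F_prob[beta][i, j] is f_ij,
# the probability of moving from state i to state j given response beta.
# The entries are illustrative and stay fixed forever (fixed structure).
F_prob = {
    0: np.array([[0.8, 0.1, 0.1],   # beta = 0 (favorable): tend to stay put
                 [0.1, 0.8, 0.1],
                 [0.1, 0.1, 0.8]]),
    1: np.array([[0.2, 0.4, 0.4],   # beta = 1 (unfavorable): tend to move on
                 [0.4, 0.2, 0.4],
                 [0.4, 0.4, 0.2]]),
}

def next_state(state, beta):
    """Sample the next state from the fixed transition probabilities."""
    return rng.choice(3, p=F_prob[beta][state])
```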

(11)

A Basic Learning Automaton

• A very simple VSLA that 'learns' which item in a list is best; example: finding which person is the current US President

• P-model: the environment's input is binary, {0, 1}, corresponding to yes/no

• State-output: the output is determined only by the current state, not by past states

• Stochastic: the next state is chosen randomly according to the probabilities in F

• Variable-structure: this LA updates its transition probabilities at each iteration so that they reflect the current environment (at instant n)

(12)

A Basic Learning Automaton

• Since this is a state-output machine with G the identity map (it simply outputs its state), we only need to consider the transition function.

• Here the 'best' action does not depend on the last input β, and the next state does not depend on the current one, so the transition map collapses to a single action-probability vector p(n) = [p1(n), p2(n), …, pr(n)], where pi(n) is the probability of being in (i.e. choosing) the ith state at instant n.

(13)

A Basic Learning Automaton

p = [1/r, 1/r, 1/r, …, 1/r]        (choose between r items: a naive, uniform starting assumption)

While not done:                    (continue until done)
  select i according to p          (choose a 'good' action)
  do action i and observe β        (do the action and see what happens)
  update p by the learning scheme  (given this new information, adjust the likely course of action)

(A runnable sketch of this loop follows the citation below.)

See: Masoumi and Meybodi, Learning automata based multi-agent system algorithms for finding optimal policies in Markov games (2012) and Unsal, Cem, Intelligent Navigation of Autonomous Vehicles in an Automated Highway System: Learning Methods and Interacting Vehicles Approach (1998)
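In Python, the loop on this slide might look like the following; this is a sketch under my own assumptions (the environment is any function returning β ∈ {0, 1}, and `update` stands in for whichever learning scheme is chosen), not code from the cited sources.

```python
import numpy as np

rng = np.random.default_rng(1)

def run_la(environment, r, update, n_steps=1000):
    """Basic variable-structure LA loop.

    environment(i) -> beta in {0, 1}; update(p, i, beta) -> new probability vector.
    """
    p = np.full(r, 1.0 / r)        # naive assumption: start uniform over r actions
    for _ in range(n_steps):
        i = rng.choice(r, p=p)     # select an action according to p
        beta = environment(i)      # perform it and observe the response
        p = update(p, i, beta)     # adjust p by the learning scheme
    return p
```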

(14)

A Basic Learning Automaton

General form of a linear learning scheme, if α(n) = αi:

• When β = 0 (reward):

pj(n+1) = (1 – a) ∙ pj(n) for all j ≠ i

pi(n+1) = pi(n) + a ∙ [1 – pi(n)]

• When β = 1 (penalty):

pj(n+1) = b/(r – 1) + (1 – b) ∙ pj(n) for all j ≠ i

pi(n+1) = (1 – b) ∙ pi(n)

• LR-P scheme: reward and penalty parameters are equal, a = b
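The update above drops straight into the `run_la` sketch from the previous slide. The environment here is a made-up P-model in which action i draws a penalty (β = 1) with probability c[i]; everything except the update equations themselves is my own illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)

def lrp_update(p, i, beta, a=0.05, b=0.05):
    """Linear reward-penalty update; a = b gives the classic L_R-P scheme."""
    r = len(p)
    if beta == 0:                        # favorable response: reward action i
        q = (1 - a) * p
        q[i] = p[i] + a * (1 - p[i])
    else:                                # unfavorable response: penalize action i
        q = b / (r - 1) + (1 - b) * p
        q[i] = (1 - b) * p[i]
    return q

# Hypothetical environment: action i is penalized with probability c[i].
c = np.array([0.6, 0.2, 0.7, 0.5])
env = lambda i: int(rng.random() < c[i])

p_final = run_la(env, r=len(c), update=lrp_update, n_steps=5000)
print(p_final)   # the best action (index 1, c = 0.2) should carry the largest weight
```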

(15)

Changing Paradigms

(16)

Norms of Behavior

An LA is expedient if lim(n→∞) E[M(n)] < M0, where M(n) is the average penalty given the action probabilities at instant n (defined below).

That is, the LA performs better than choosing actions purely at random (M0, the 'pure-chance' automaton).

An LA is optimal if lim(n→∞) E[M(n)] = c_min, where c_min = min_i {c_i}.

That is, the LA tends toward choosing the best action 100% of the time.

That is difficult to achieve, so an LA is ε-optimal if lim(n→∞) E[M(n)] < c_min + ε for arbitrarily small ε > 0.

That is, it converges to behavior arbitrarily close to the best action. Good enough!

An LA is absolutely expedient if E[M(n+1) | p(n)] < M(n) for all n, all pi(n) ∈ (0, 1), and all possible penalty sets {c1, c2, …, cr}; this implies E[M(n+1)] < E[M(n)].

That is, the expected average penalty improves at every iteration.
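For reference, and going by the book's definitions as I recall them, M(n) is the average penalty under the current action probabilities and M0 is the pure-chance baseline obtained by picking each action with equal probability:

```latex
M(n) = E[\beta(n) \mid p(n)] = \sum_{i=1}^{r} c_i \, p_i(n),
\qquad
M_0 = \frac{1}{r} \sum_{i=1}^{r} c_i ,
```

where c_i is the probability that action α_i elicits an unfavorable response.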

(17)

Looking back to our issues…

• Random environments

• Stochastic hill climbing

• Inductive inference

• Dual control

• Bayesian learning

(18)

LA & Random Environments

Problem: there are potentially many possible actions to take.

You could try every option x times, compute the average reward/penalty, and take the max.

• But that wastes many trials on undesirable actions.

• The learning scheme should instead ensure that the probability weights become concentrated on fewer alternatives as learning proceeds (inverse-H).

• The LA should also be able to include new actions and to eliminate actions, for example when their probability drops below a certain threshold (sketched below).
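One way the elimination step could look (my own sketch, not a scheme from the book): drop any action whose probability has fallen below a threshold and renormalize the remaining mass.

```python
import numpy as np

def prune_actions(p, actions, threshold=0.01):
    """Drop actions whose probability is below the threshold and renormalize."""
    keep = p >= threshold
    p_kept = p[keep]
    return p_kept / p_kept.sum(), [a for a, k in zip(actions, keep) if k]

p, actions = prune_actions(np.array([0.005, 0.6, 0.3, 0.095]),
                           ["a1", "a2", "a3", "a4"])
print(actions, p)   # 'a1' is eliminated; the remaining probabilities are rescaled
```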

(19)

LA & Stochastic Hill Climbing

Problem: is an LA a type of hill climbing (machine learning)?

Usually, hill climbing (e.g. gradient descent) is done over the action space; the algorithm tries to reduce some cost function (e.g. mean-squared error), essentially trying to 'choose a better action' given the last action.

• In an LA there is no concept of neighborhood between actions (presumably because the actions are discrete).

• But in a (variable-structure) LA where the action probabilities are updated iteratively, performance increases monotonically, and the process can be viewed as hill climbing in probability space.

(20)

LA & Inductive Inference

Issue: getting the expected answer only provides evidence for validity

• That is, we can’t be unequivocal about anything found experimentally

• Learning Automata use both inductive and deductive processes:

Given a set of prior probabilities, the LA deduces what action to take

• Then it observes the results and updates its model inductively

• [?] this iterative inductive-predictive process is similar to EM

(21)

LA & Dual Control

Problem: the surgeon's dilemma between testing and operating

• We want lim(n→∞) f̂(n) ≈ f(n): the estimated model should approach the true one.

• We need a good model, but we can't afford to wait around forever.

• So our model needs to get incrementally better while we act.

• For learning automata, this balance depends on the learning scheme (see the sketch below):

• too many actions to choose from, or updates that are too gradual: learning is too slow

• updates that change too greatly on a single input: the LA may converge to the wrong answer
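Both failure modes are easy to provoke with the reward-inaction variant (b = 0) of the update sketched earlier; the environment and the specific parameter values below are made-up illustrations, not results from the book.

```python
# Reuses run_la, lrp_update, env and c from the earlier sketches.
for a in (0.001, 0.05, 0.8):
    p = run_la(env, r=len(c),
               update=lambda p, i, beta, a=a: lrp_update(p, i, beta, a=a, b=0.0),
               n_steps=2000)
    print(a, p.round(2))
# a = 0.001: p barely moves in 2000 steps (learning is too slow).
# a = 0.8:   a few lucky responses can push a poor action's probability near 1,
#            and with b = 0 the scheme rarely recovers (wrong convergence).
```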

(22)

LA & Bayesian Learning

• The learning of learning automata is similar to Bayesian learning but differs in some respects:

• While the inductive part of the LA may roughly parallel Bayesian learning, there is no close parallel to the deductive action selection.

• Various learning schemes exist; the choice of learning scheme is a major factor in the efficacy of the learning automaton.

(23)

Other Topics

• LA & Psychology

Learning automata have been used to describe and model learning in organisms

• LA & Pattern Recognition

LA may be employed in pattern recognition (which has been called a type of learning), either singly (action = categorization) or as a team of LAs, each identifying a different feature of a pattern to aid classification.

• LA & Algorithms, Heuristics

Learning schemes (mapping inputs to updated probabilities) are algorithms

The choice of learning scheme is heuristic

(24)

Notes from CH9 about LA Application

• Best when many automata, each with small number of actions, operate in distributed complex system

• Systems that might benefit from LA approach have these qualities:

Sufficiently complex with large uncertainties that preclude mathematical modeling

Must be open to distributed control (finite actions at each stage)

Feedback must be provided by random performance criterion at each level

small performance improvements must (realistically) lead to large economic gains

• Domains using LA: routing traffic in communication networks, scheduling in computer networks, decision-making in economic networks, image processing and understanding

(25)

TL;DR

• Learning Automata model decision-making in a random environment

• Based on reinforcement learning

• Similar to previous (deterministic/stochastic) state-based models but incorporates ML and adaptive model concepts

• Parallels to the shift from Skinnerian behaviorist psychology to cognitive psychology (internal states, internal model of reality [p-vector])

(26)

Discussion Questions

• What might future applications of this model be?

• What are the potential weaknesses of this model?

• What of this model’s cognitive/psychological reality?

(27)

Citations

Masoumi, B. & Meybodi, M. R. Learning automata based multi-agent system algorithms for finding optimal policies in Markov games. Asian Journal of Control 14(1), pp. 137–152, 2012.

Narendra, K. & Thathachar, M. Learning Automata: An Introduction. Dover Publications, 2012.

Ünsal, Cem. Intelligent Navigation of Autonomous Vehicles in an Automated Highway System: Learning Methods and Interacting Vehicles Approach. Carnegie Mellon University, 1998. (Available online: https://theses.lib.vt.edu/theses/available/etd-5414132139711101/)
