Reinforcement Learning(RL)

Academic year: 2022

(1)

Reinforcement Learning(RL)

김형욱

(2)

IVIS Lab, Changwon National University

Reinforcement Learning

(3)

Atari Breakout Game (2013, 2015)

(4)

Reinforcement Learning

(5)

Deep reinforcement learning

(6)

Games with RL

(7)

AlphaGo with RL

(8)

Google Data Center

(9)

Reinforcement Learning Applications

• Robotics: torques at joints

• Business operations

– Inventory management: how much inventory or spare parts to purchase

– Resource allocation: e.g. in a call center, whom to serve first

• Finance: investment decisions, portfolio design

• E-commerce/media

– What content to present to users (using click-through / visit time as reward)

– What ads to present to users (avoiding ad fatigue)

(10)

Example - OpenAI Gym Game

(11)

Frozen Lake World

(12)

Frozen Lake World (OpenAI Gym)

(13)

Frozen Lake World (OpenAI Gym)

(14)

Frozen Lake World (OpenAI Gym)

(15)

Frozen Lake World (OpenAI Gym)

(16)

Frozen Lake World (OpenAI Gym)

(17)

Basic installation steps

• OpenAI Gym

– sudo apt install cmake

– sudo apt-get install zlib1g-dev

– sudo -H pip install gym

– sudo -H pip install "gym[atari]"

(18)

Frozen Lake: Random?
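
The slide image itself was not captured in this export, but the point is that a purely random agent rarely reaches the goal. Below is a minimal self-contained sketch; the `step` helper is a hypothetical stand-in for `env.step()` on a deterministic 4×4 Frozen Lake (i.e. `is_slippery=False`), used here so the example runs without Gym installed.

```python
import random

MAP = "SFFFFHFHFFFHHFFG"             # 4x4 lake, row-major: S(0), holes (H), G(15)
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3   # gym's action encoding for Frozen Lake

def step(state, action):
    """Deterministic stand-in for env.step(): returns (next_state, reward, done)."""
    row, col = divmod(state, 4)
    if action == LEFT:
        col = max(col - 1, 0)
    elif action == DOWN:
        row = min(row + 1, 3)
    elif action == RIGHT:
        col = min(col + 1, 3)
    elif action == UP:
        row = max(row - 1, 0)
    nxt = 4 * row + col
    return nxt, float(MAP[nxt] == "G"), MAP[nxt] in "HG"

def random_episode():
    """Walk randomly from the start until falling into a hole or reaching the goal."""
    state, total = 0, 0.0
    for _ in range(100):             # cap episode length
        state, reward, done = step(state, random.choice([LEFT, DOWN, RIGHT, UP]))
        total += reward
        if done:
            break
    return total

wins = sum(random_episode() for _ in range(1000))
print(f"random agent reached the goal in {wins:.0f} of 1000 episodes")
```

Only a small fraction of random episodes succeed, which is what motivates learning a Q-function in the following slides.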

(19)

Q-function(state-action value function)
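
The slide body was not captured in this export. For reference, Q(s, a) denotes the expected total reward from taking action a in state s and then acting optimally. For a small discrete environment it can be stored as a table; a minimal sketch (NumPy assumed, with the array shape chosen for the 4×4 Frozen Lake):

```python
import numpy as np

N_STATES, N_ACTIONS = 16, 4           # 4x4 Frozen Lake grid, 4 moves
Q = np.zeros((N_STATES, N_ACTIONS))   # Q[s, a]: estimated value of action a in state s

# Before any learning, every entry is 0 -- the agent has no preferences yet.
print(Q.shape)    # (16, 4)
print(Q[0])       # [0. 0. 0. 0.]
```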

(20)

Q-function(state-action value function)

(21)

Policy using Q-function
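
Once a Q-table exists, a policy follows by acting greedily: π(s) = argmax over a of Q(s, a). A sketch (random tie-breaking among maximal actions is one common choice, not necessarily the slides' exact scheme):

```python
import numpy as np

def greedy_action(Q, state):
    """pi(s) = argmax_a Q(s, a), breaking ties randomly among the maxima."""
    q = Q[state]
    best = np.flatnonzero(q == q.max())  # indices of all maximal actions
    return int(np.random.choice(best))

Q = np.zeros((16, 4))
Q[0] = [0.0, 0.5, 0.9, 0.1]   # toy values: RIGHT (index 2) is best in state 0
print(greedy_action(Q, 0))    # -> 2
```

With an all-zero row every action ties, so the greedy policy degenerates to a random one until learning begins.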

(22)

Optimal Policy π and Max Q

(23)

Finding, Learning Q

• Assume (believe) that Q in the next state s′ exists!

• My condition:

– I am in state s

– When I take action a, I will move to s′

– When I take action a, I will get reward r

– Q in s′, i.e. Q(s′, a′), exists

• How can we express Q(s, a) using Q(s′, a′)?
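
The relationship the question points to is the recursive one: Q(s, a) = r + γ · max over a′ of Q(s′, a′), where γ is a discount factor (γ = 1 gives the undiscounted form). A tiny numeric check:

```python
GAMMA = 0.9   # discount factor; gamma = 1 gives the undiscounted version

def backed_up_q(reward, q_next_row, gamma=GAMMA):
    """Q(s, a) = r + gamma * max_a' Q(s', a')."""
    return reward + gamma * max(q_next_row)

# Acting from s yields r = 0 and lands in s' whose Q-row is [0, 0, 1, 0]:
print(backed_up_q(0.0, [0.0, 0.0, 1.0, 0.0]))   # -> 0.9
# A transition directly into the goal (r = 1, terminal s' has all-zero Q):
print(backed_up_q(1.0, [0.0, 0.0, 0.0, 0.0]))   # -> 1.0
```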

(24)

Learning Q(s, a)

(25)

State, action, reward

(26)

Future reward

(27)

Learning Q(s, a)

(28)

Learning Q(s, a)

(29)

Learning Q(s, a) - initial Q values are 0

(30)

Learning Q(s, a)

(31)

Learning Q(s, a)

(32)

Learning Q(s, a)
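
Putting the pieces together, the learning loop sketched across these slides can be written out as follows. This is a self-contained stand-in for the OpenAI Gym version: the deterministic 4×4 lake (`is_slippery=False`), ε-greedy exploration with random tie-breaking, and γ = 0.9 are assumptions for illustration, not necessarily the slides' exact settings.

```python
import random

random.seed(0)                      # reproducible run
MAP = "SFFFFHFHFFFHHFFG"            # 4x4 lake, row-major; start=0, goal=15
GAMMA, EPSILON, EPISODES = 0.9, 0.1, 5000

def step(state, action):
    """Deterministic stand-in for env.step(); actions: 0=LEFT, 1=DOWN, 2=RIGHT, 3=UP."""
    row, col = divmod(state, 4)
    row = max(0, min(3, row + (action == 1) - (action == 3)))
    col = max(0, min(3, col + (action == 2) - (action == 0)))
    nxt = 4 * row + col
    return nxt, float(MAP[nxt] == "G"), MAP[nxt] in "HG"

def pick(q_row, eps):
    """Epsilon-greedy action choice, breaking ties among maxima at random."""
    if random.random() < eps:
        return random.randrange(4)
    best = max(q_row)
    return random.choice([a for a in range(4) if q_row[a] == best])

Q = [[0.0] * 4 for _ in range(16)]  # initial Q values are all 0

for _ in range(EPISODES):
    state = 0                       # env.reset()
    for _ in range(100):            # cap episode length
        action = pick(Q[state], EPSILON)
        nxt, reward, done = step(state, action)
        # the update from the slides: Q(s, a) <- r + gamma * max_a' Q(s', a')
        Q[state][action] = reward + GAMMA * max(Q[nxt])
        state = nxt
        if done:
            break

# Greedy rollout with the learned table:
state, reward = 0, 0.0
for _ in range(20):
    state, reward, done = step(state, pick(Q[state], 0.0))
    if done:
        break
print("goal reached:", reward == 1.0)
```

Because the environment is deterministic, the update is exact: values along a shortest path settle at powers of γ, and the greedy rollout then walks straight to the goal.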
