Reinforcement Learning (RL)
김형욱
IVIS Lab, Changwon National University
Reinforcement Learning
Atari Breakout Game (2013, 2015)
Reinforcement Learning
Deep reinforcement learning
Games with RL
AlphaGo with RL
Google Data Center
Reinforcement Learning Applications
• Robotics: torque at joints
• Business operations
– Inventory management: how much inventory and how many spare parts to purchase
– Resource allocation: e.g., in a call center, whom to serve first
• Finance: investment decisions, portfolio design
• E-commerce/media
– What content to present to users (using click-through / visit time as reward)
– What ads to present to users (avoiding ad fatigue)
Example: OpenAI Gym Game
Frozen Lake World
Frozen Lake World (OpenAI Gym)
Basic installation steps
• OpenAI Gym
– sudo apt install cmake
– sudo apt-get install zlib1g-dev
– sudo -H pip install gym
– sudo -H pip install gym[atari]
Frozen Lake: Random?
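As a baseline for the question above, here is a minimal sketch of an agent that picks every action at random. To stay self-contained it does not call Gym; instead it uses a small stand-in for the 4x4 Frozen Lake map, and it assumes a deterministic (non-slippery) lake. The names `LAKE`, `MOVES`, `step`, and `run_random_episode` are illustrative, not part of Gym.

```python
import random

# Standard 4x4 Frozen Lake layout: S = start, F = frozen, H = hole, G = goal.
LAKE = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
# Action codes follow Gym's Frozen Lake convention: 0=left, 1=down, 2=right, 3=up.
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}

def step(state, action):
    """Deterministic transition (assumption): returns (next_state, reward, done)."""
    row, col = divmod(state, 4)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), 3)   # walking into a wall leaves the agent in place
    col = min(max(col + d_col, 0), 3)
    tile = LAKE[row][col]
    # Reward 1 only on the goal; the episode ends on a hole or the goal.
    return row * 4 + col, (1.0 if tile == "G" else 0.0), tile in "HG"

def run_random_episode(rng, max_steps=100):
    """Play one episode with uniformly random actions; return the total reward."""
    state, total = 0, 0.0
    for _ in range(max_steps):
        state, reward, done = step(state, rng.choice([0, 1, 2, 3]))
        total += reward
        if done:
            break
    return total

rng = random.Random(0)
wins = sum(run_random_episode(rng) for _ in range(1000))
print(f"random agent reached the goal in {int(wins)} of 1000 episodes")
```

A random agent rarely reaches the goal, which is the motivation for learning a Q-function instead.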
Q-function (state-action value function)
Policy using Q-function
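A minimal sketch of how a policy can be read off a Q-function, assuming Q is stored as a table with one row of action values per state. The helper name `greedy_action` and the random tie-breaking rule are assumptions made for illustration.

```python
import random

def greedy_action(q_row, rng=random):
    """Return an action with the largest Q-value, breaking ties at random."""
    best = max(q_row)
    candidates = [a for a, q in enumerate(q_row) if q == best]
    return rng.choice(candidates)

# Hypothetical Q-values for one state, indexed by action (0=left, 1=down, 2=right, 3=up).
q_row = [0.0, 0.5, 1.0, 0.0]
print(greedy_action(q_row))
```

Random tie-breaking matters when all Q-values start equal, because always taking the first maximal action would never explore the other actions.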
Optimal Policy, 𝝿 and Max Q
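The slide title can be written out as follows, as a sketch in standard notation: Max Q is the best value obtainable from state s, and the optimal policy picks the action that attains it. The symbol \pi^{*} for the optimal policy is an assumption about the notation intended here.

```latex
\operatorname{Max} Q = \max_{a'} Q(s, a')
\qquad
\pi^{*}(s) = \arg\max_{a} Q(s, a)
```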
Finding, Learning Q
• Assume (believe) that Q in s' exists!
• My condition
– I am in s
– When I do action a, I'll go to s'
– When I do action a, I'll get reward r
– Q in s', Q(s', a'), exists
• How can we express Q(s, a) using Q(s', a')?
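One common answer to the question above, writing the next state and action as s' and a', is the recurrence below. It is a sketch that assumes a deterministic world and no discounting of future rewards: the value of taking a in s is the immediate reward r plus the best value available from s'.

```latex
\hat{Q}(s, a) \leftarrow r + \max_{a'} \hat{Q}(s', a')
```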
Learning Q(s, a)
State, action, reward
Future reward
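A sketch of the usual definition of future reward, assuming an episode that ends at step n and rewards that are not discounted. The total from step t onward splits into the immediate reward plus the future reward from the next step, which is what allows Q to be defined recursively.

```latex
R_t = r_t + r_{t+1} + \dots + r_n = r_t + R_{t+1}
\qquad
R_t^{*} = r_t + \max R_{t+1}
```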
Learning Q(s, a) - initial Q values are 0
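The learning procedure can be sketched as a table update that starts from all-zero Q-values. This is a minimal illustration, not a definitive implementation: it assumes a deterministic (non-slippery) 4x4 Frozen Lake, no discounting, and random tie-breaking among equally good actions. The environment is a small stand-in rather than Gym itself, and the names `step`, `greedy_action`, and `learn_q` are hypothetical.

```python
import random

# Standard 4x4 Frozen Lake layout: S = start, F = frozen, H = hole, G = goal.
LAKE = ["SFFF", "FHFH", "FFFH", "HFFG"]
# Action codes follow Gym's Frozen Lake convention: 0=left, 1=down, 2=right, 3=up.
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}

def step(state, action):
    """Deterministic transition (assumption): returns (next_state, reward, done)."""
    row, col = divmod(state, 4)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), 3)   # walls keep the agent in place
    col = min(max(col + d_col, 0), 3)
    tile = LAKE[row][col]
    return row * 4 + col, (1.0 if tile == "G" else 0.0), tile in "HG"

def greedy_action(q_row, rng):
    """Pick an action with the largest Q-value, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])

def learn_q(episodes=2000, seed=0):
    """Tabular Q-learning with the update Q(s, a) <- r + max_a' Q(s', a')."""
    rng = random.Random(seed)
    q = [[0.0] * 4 for _ in range(16)]   # initial Q values are 0
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = greedy_action(q[state], rng)
            next_state, reward, done = step(state, action)
            # Terminal states are never updated, so their rows stay at 0.
            q[state][action] = reward + max(q[next_state])
            state = next_state
    return q

q_table = learn_q()
for row_start in range(0, 16, 4):
    print([max(q_table[s]) for s in range(row_start, row_start + 4)])
```

With all values at 0 the agent initially wanders at random; once it reaches the goal for the first time, the reward of 1 propagates backwards through the table along the states that lead there.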