# Reinforcement Learning

#### 28/10/99

Reinforcement Learning

Outline

Goal of Reinforcement Learning

Reinforcement Learning - origins

Typical Applications

Contrast with Supervised Learning

Mathematical Model - Motivation

Mathematical Model - MDP

MDP model - states and actions

MDP model - rewards

MDP model - trajectories

Simple example: N-armed bandit

MDP - Return function

MDP model - return functions

MDP model - action selection

Contrast with Supervised Learning

MDP model - summary

Planning - Basic Problems

Planning - Value Functions

Planning - Policy Evaluation

Algorithms - Policy Evaluation Example

Algorithms - Policy Evaluation Example

Algorithms - optimal control

Algorithms - Optimal control Example

Algorithms - optimal control

Algorithms - optimal control Example

Algorithms - optimal control

Algorithms - computing optimal policy

Algorithms - Linear Programming

Algorithms - Value Iterations

Algorithms - Policy Iterations

Algorithms - Open problems

Learning Algorithms

Learning - Model Based

Learning - Model Based

Learning - Model free Monte Carlo - Policy Evaluation

Monte Carlo - optimal control

Learning - Model Free Temporal Differences - Policy evaluation

TD(0) - Optimal Control

Comparing TD and MC

Learning - Open Problems

Planning versus Learning

Example - Elevator Control

Current Research Efforts

Large Scale MDP

Large scale MDP

Large scale MDP - Restricted Value Function

TD-gammon

Function Approximation - basics

Function Approximation - Linear

Linear Function Approximation policy evaluation

Linear Function Approximation optimal control

Function Approximation - Conclusion

Large scale MDP - Restricted policies

Large scale MDP - Restricted models

Large Scale MDP - Generative Model

Large Scale MDPs - research

Partially Observable MDP

POMDP - Belief State Algorithm

POMDP - Hard problems

Summary

Resources

Author email: mansour@cs.tau.ac.il