
   
Assumptions

In this section we make the following simplifying assumptions.
1. The immediate reward and the transition probabilities are stationary: the functions r(s,a) and p(j|s,a) are the same at every time step. One benefit is that the model needs no separate description for each time step, so (together with assumption 4) an algorithm can be given a finite input.
2. The immediate reward is bounded: |r(s,a)| < M for some constant M and all s and a.
3. The discount factor $\lambda$ satisfies $0 \leq \lambda < 1$.
4. The number of states and the number of actions are finite.
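
To make the four assumptions concrete, here is a minimal Python sketch of a stationary, finite model stored as plain tables, with a check of each assumption. The names FiniteMDP, check_assumptions and return_bound are hypothetical and are not part of these notes. The last function uses the standard geometric-series fact that, when every reward has absolute value at most M and $0 \leq \lambda < 1$, the discounted sum of rewards has absolute value at most M/(1-\lambda).

```python
import math
from dataclasses import dataclass


@dataclass
class FiniteMDP:
    """Hypothetical container for a stationary, finite model (illustrative only)."""
    rewards: list      # rewards[s][a] = r(s,a), one table shared by all time steps
    transitions: list  # transitions[s][a][j] = p(j|s,a), also independent of time
    discount: float    # the discount factor lambda


def check_assumptions(mdp, tol=1e-9):
    """Validate the assumptions and return max |r(s,a)| over the reward table."""
    n_states = len(mdp.rewards)
    if n_states == 0 or len(mdp.transitions) != n_states:
        raise ValueError("need a finite, non-empty set of states")
    n_actions = len(mdp.rewards[0])
    if n_actions == 0:
        raise ValueError("need a finite, non-empty set of actions")
    # Assumption 3: 0 <= lambda < 1.
    if not (0 <= mdp.discount < 1):
        raise ValueError("discount factor must satisfy 0 <= lambda < 1")
    for s in range(n_states):
        if len(mdp.rewards[s]) != n_actions or len(mdp.transitions[s]) != n_actions:
            raise ValueError("every state must list the same actions")
        for a in range(n_actions):
            # Assumption 2: a finite table of finite numbers is bounded.
            if not math.isfinite(mdp.rewards[s][a]):
                raise ValueError("rewards must be finite numbers")
            row = mdp.transitions[s][a]
            if len(row) != n_states or any(p < 0 for p in row):
                raise ValueError("p(.|s,a) must be a non-negative vector over states")
            if abs(sum(row) - 1.0) > tol:
                raise ValueError("p(.|s,a) must sum to 1")
    # Any constant strictly larger than this value serves as M in |r(s,a)| < M.
    return max(abs(r) for row in mdp.rewards for r in row)


def return_bound(max_abs_reward, discount):
    """Upper bound on |sum_t lambda^t r_t| from the geometric series."""
    return max_abs_reward / (1 - discount)
```

For example, a model with two states and two actions, rewards [[1.0, -2.0], [0.0, 0.5]] and discount 0.5 passes the checks with a maximum absolute reward of 2.0, which gives a bound of 4.0 on the absolute value of any discounted return.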


Yishay Mansour
1999-11-24