Expectation of Reward

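The expected value of $\vec{v}$ at the state $X_{t}$ occupied at time $t$, starting from state $s$ (so $X_{1}=s$ and $X_{t}$ is reached after $t-1$ transitions) and using policy $\pi$, is: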

$E_{s}^{\pi}[\vec{v}(X_{t})]=[P_{\pi}^{t-1}\cdot\vec{v}](s)$
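
As a rough illustration of the formula, the sketch below applies the transition matrix $t-1$ times to $\vec{v}$ and reads off the entry for each start state. The two-state matrix `P_pi`, the vector `v`, and the helper names `mat_vec` and `expected_value` are hypothetical and chosen only for this example; it assumes a finite state space and a fixed stationary policy, so that $P_{\pi}$ is a single row-stochastic matrix.

```python
def mat_vec(P, v):
    """Multiply a matrix P (given as a list of rows) by a column vector v."""
    return [sum(p_ij * v_j for p_ij, v_j in zip(row, v)) for row in P]


def expected_value(P, v, t):
    """Return the vector whose s-th entry is [P^(t-1) v](s).

    Assumption: time is indexed from 1, so X_1 = s and t - 1 transitions occur.
    """
    if t < 1:
        raise ValueError("t must be at least 1")
    result = list(v)
    for _ in range(t - 1):
        result = mat_vec(P, result)  # apply one more transition
    return result


# Hypothetical two-state chain induced by some fixed policy pi.
# State 1 is absorbing; state 0 moves to state 1 with probability 0.5.
P_pi = [[0.5, 0.5],
        [0.0, 1.0]]
v = [0.0, 1.0]

e1 = expected_value(P_pi, v, 1)  # zero transitions: just v itself
e3 = expected_value(P_pi, v, 3)  # two transitions
```

Here `e1` equals `v`, since $P_{\pi}^{0}$ is the identity, and `e3[0]` is the probability of having reached state 1 within two transitions when starting from state 0. Repeated matrix-vector products are used instead of forming $P_{\pi}^{t-1}$ explicitly, which costs less when only $P_{\pi}^{t-1}\cdot\vec{v}$ is needed.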

Yishay Mansour
1999-11-24