
   
Markovian Policy

Theorem 4.1   Let $U_t^*$ be a solution to the optimality equations. Then:
1.
For any $t$, $1 \leq t \leq N$, $U_t^*(h_t)$ depends on $h_t$ only through $s_t$.
2.
There exists a Markovian deterministic optimal policy.
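
For reference, in the notation used in the proof below, the optimality equations take the form

\begin{displaymath}U_t^*(h_t) = \max_{a \in A} \{r_t(s_t,a) + \sum_{j\in S} P_t(j\vert s_t,a)U_{t+1}^*(h_t,a,j)\}, \qquad U_N^*(h_N)=r_N(s_N).\end{displaymath}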

Proof: We prove (1) by backward induction on $t$.
Basis: $U_N^*(h_N)=r_N(s_N)$ depends on $h_N$ only through $s_N$, therefore $U_N^*(h_N)=U_N^*(s_N)$.
Induction step: Assume the claim holds for every $n$, $n \geq t+1$; we prove it for $t$.

\begin{eqnarray*}
U_t^*(h_t) & = & \max_{a \in A} \{r_t(s_t,a) + \sum_{j\in S} P_t(j\vert s_t,a)U_{t+1}^*(h_t,a,j)\}\\
 & = & \max_{a \in A} \underbrace{\{r_t(s_t,a) + \sum_{j\in S} P_t(j\vert s_t,a)U_{t+1}^*(j)\}}
\end{eqnarray*}


The second equality uses the induction hypothesis, $U_{t+1}^*(h_t,a,j)=U_{t+1}^*(j)$. Note that the marked term depends only on $s_t$ and $a$; hence the entire right-hand side depends on $h_t$ only through $s_t$.
Thus,
$U_t^*(h_t)=U_t^*(s_t)$.
To prove (2), let $\pi$ be a Markovian deterministic policy that satisfies:

\begin{displaymath}\pi_t(s_t) = \arg\max_{a \in A} \{r_t(s_t,a) + \sum_{j\in S} P_t(j\vert s_t,a)U_{t+1}^*(j)\}\end{displaymath}

Since the choice of action depends solely on $s_t$, namely the current state, $\pi_t$ is a Markovian deterministic policy; since it attains the maximum in the optimality equations at every step, it is optimal.$\Box$
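
The proof is constructive: computing $U_t^*$ backwards from $t=N$ and taking a maximizing action at each state yields an optimal Markovian deterministic policy. The following is a minimal Python sketch of this backward induction for a tabular finite-horizon MDP; the array names and indexing convention are illustrative and not part of the lecture notes.

import numpy as np

def backward_induction(N, r, P, r_terminal):
    """Backward induction for a finite-horizon MDP (illustrative sketch).

    Assumed (hypothetical) inputs:
      N           -- horizon; decisions are made at t = 1, ..., N-1
      r[t]        -- array of shape (S, A) with r[t][s, a] = r_t(s, a)
      P[t]        -- array of shape (S, A, S) with P[t][s, a, j] = P_t(j | s, a)
      r_terminal  -- array of shape (S,) with r_terminal[s] = r_N(s)
    Returns U with U[t, s] = U_t^*(s) and pi with pi[t, s] = pi_t(s).
    """
    S = r_terminal.shape[0]
    U = np.zeros((N + 1, S))            # U[t] holds the values U_t^*
    pi = np.zeros((N, S), dtype=int)    # pi[t] holds the greedy action pi_t

    U[N] = r_terminal                   # basis: U_N^*(s) = r_N(s)

    for t in range(N - 1, 0, -1):       # backwards over t = N-1, ..., 1
        # Q[s, a] = r_t(s, a) + sum_j P_t(j | s, a) * U_{t+1}^*(j)
        Q = r[t] + P[t] @ U[t + 1]
        U[t] = Q.max(axis=1)            # optimality equation for U_t^*
        pi[t] = Q.argmax(axis=1)        # Markovian deterministic policy

    return U, pi

A caller would supply r and P indexed by t so that r[1], ..., r[N-1] and P[1], ..., P[N-1] are defined (e.g. lists with an unused entry at index 0).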



 

Yishay Mansour
1999-11-18