
   
Existence of a unique solution

We define a linear transformation $L_d$ by $L_d\vec{v}=\vec{r}_d+\lambda P_d\vec{v}$.
Since $\vec{v}_{\lambda}^{\pi}=L_d\vec{v}_\lambda^{\pi}$, $\vec{v}_\lambda^\pi$ is a fixed point of $L_d$.

Theorem 5.2   For $0 \leq\lambda<1$ and $\pi$ a stationary Markovian policy,
$\vec{v}_\lambda^\pi$ is the unique solution of the system of equations

\begin{eqnarray*}\vec{v}=\vec{r}_d+\lambda P_d\vec{v}
\end{eqnarray*}


and is equal to

\begin{eqnarray*}\vec{v}_{\lambda}^{\pi}=(I-\lambda P_d)^{-1}\vec{r}_d
\end{eqnarray*}
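
As a small numerical illustration of the theorem, the sketch below solves $(I-\lambda P_d)\vec{v}=\vec{r}_d$ for a two-state chain and checks that the result is a fixed point of $L_d$. The values of `P`, `r` and `lam`, and the helper names `solve_2x2` and `apply_L`, are assumptions made up for this example and do not come from the notes.

```python
# Hypothetical two-state example; P, r and lam are illustrative values only.
P = [[0.9, 0.1],
     [0.4, 0.6]]   # row-stochastic transition matrix P_d
r = [1.0, 0.0]     # expected immediate rewards r_d
lam = 0.5          # discount factor, 0 <= lam < 1


def solve_2x2(A, b):
    """Solve A x = b for a 2x2 matrix A using Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    x0 = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    x1 = (A[0][0] * b[1] - b[0] * A[1][0]) / det
    return [x0, x1]


def apply_L(P, r, lam, v):
    """Apply the transformation L_d v = r_d + lam * P_d v."""
    n = len(v)
    return [r[i] + lam * sum(P[i][j] * v[j] for j in range(n))
            for i in range(n)]


# Build A = I - lam * P and solve A v = r.
A = [[(1.0 if i == j else 0.0) - lam * P[i][j] for j in range(2)]
     for i in range(2)]
v = solve_2x2(A, r)

# The solution should satisfy v = L_d v up to rounding error.
residual = max(abs(x - y) for x, y in zip(apply_L(P, r, lam, v), v))
print(v, residual)
```

For larger state spaces one would normally use a general linear solver instead of the 2x2 formula; the fixed-point check stays the same.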


Proof: We can write the equation set as

\begin{eqnarray*}(I-\lambda P_d)\vec{v} = \vec{r}_d
\end{eqnarray*}


Since $P_d$ is a stochastic matrix, $\Vert P_d\Vert=1$, and since $0 \leq\lambda < 1$, $\Vert\lambda P_d\Vert = \lambda < 1$.

According to Theorem [*], $(I-\lambda P_d)^{-1}$ exists. Thus, a solution $\vec{v}=(I-\lambda P_d)^{-1}\vec{r}_d$ exists, and since $I-\lambda P_d$ is invertible, it is the only solution.

By the same theorem,

\begin{eqnarray*}\vec{v}=(I-\lambda P_d)^{-1}\vec{r}_d=\sum_{i=0}^{\infty}(\lambda P_d)^{i}\vec{r}_d=\sum_{t=1}^{\infty}\lambda^{t-1}P_d^{t-1}\vec{r}_d=\vec{v}_\lambda^{\pi}.
\end{eqnarray*}


We have shown that the unique solution is the discounted return of policy $\pi$. $\Box$
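
The series form of the return can also be checked numerically. The following sketch accumulates the partial sums $\sum_{t=1}^{T}\lambda^{t-1}P_d^{t-1}\vec{r}_d$ and verifies that, for large $T$, the result nearly satisfies $\vec{v}=\vec{r}_d+\lambda P_d\vec{v}$. The two-state chain, the horizon of 60 terms, and the function names `mat_vec` and `discounted_return` are assumptions chosen for illustration only.

```python
def mat_vec(P, v):
    """Multiply the matrix P (a list of rows) by the vector v."""
    return [sum(p_ij * v_j for p_ij, v_j in zip(row, v)) for row in P]


def discounted_return(P, r, lam, horizon):
    """Partial sum of lam^(t-1) * P^(t-1) * r for t = 1..horizon."""
    total = [0.0] * len(r)
    term = list(r)  # the t = 1 term: lam^0 * P^0 * r
    for _ in range(horizon):
        total = [s + x for s, x in zip(total, term)]
        # Next term: multiply the current one by lam * P.
        term = [lam * x for x in mat_vec(P, term)]
    return total


# Hypothetical two-state example; these values are illustrative only.
P = [[0.9, 0.1],
     [0.4, 0.6]]   # row-stochastic transition matrix P_d
r = [1.0, 0.0]     # expected immediate rewards r_d
lam = 0.5          # discount factor, 0 <= lam < 1

v_approx = discounted_return(P, r, lam, horizon=60)

# Residual of the fixed-point equation v = r + lam * P v.
Lv = [r_i + lam * x for r_i, x in zip(r, mat_vec(P, v_approx))]
residual = max(abs(a - b) for a, b in zip(Lv, v_approx))
print(v_approx, residual)
```

Because $\Vert\lambda P_d\Vert<1$, the omitted tail of the series shrinks geometrically in $T$, which is why a modest horizon already gives a very small residual here.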


  
Figure: Example Diagram



Yishay Mansour
1999-11-24