Existence of a unique solution


We define a linear transformation $L_d$ by $L_d\vec{v}=\vec{r}_d+\lambda P_d\vec{v}$.
Since $\vec{v}_{\lambda}^{\pi}=L_d\vec{v}_\lambda^{\pi}$, $\vec{v}_\lambda^\pi$ is a fixed point of $L_d$.
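The fixed-point property can be checked numerically. The following is a minimal sketch, assuming a hypothetical two-state chain (the values of `P_d`, `r_d` and `lam` are made up for illustration and do not come from these notes). It repeatedly applies $L_d$ starting from the zero vector. That this iteration converges for $\lambda<1$ is a standard fact that is assumed here, not proved in this section.

```python
# Hypothetical two-state example (values are illustrative only).
P_d = [[0.5, 0.5],
       [0.2, 0.8]]   # row-stochastic transition matrix under decision rule d
r_d = [1.0, 0.0]     # expected immediate reward in each state
lam = 0.9            # discount factor, 0 <= lam < 1


def apply_L(v):
    """Return L_d v = r_d + lam * P_d v."""
    n = len(v)
    return [r_d[s] + lam * sum(P_d[s][j] * v[j] for j in range(n))
            for s in range(n)]


def fixed_point(tol=1e-12, max_iter=10_000):
    """Iterate v <- L_d v from the zero vector until the update is below tol."""
    v = [0.0] * len(r_d)
    for _ in range(max_iter):
        v_next = apply_L(v)
        if max(abs(a - b) for a, b in zip(v_next, v)) < tol:
            return v_next
        v = v_next
    return v


v_star = fixed_point()
# Applying L_d once more should leave v_star (numerically) unchanged.
residual = max(abs(a - b) for a, b in zip(apply_L(v_star), v_star))
print(v_star, residual)
```

The printed residual should be close to zero, which is the numerical counterpart of $\vec{v}=L_d\vec{v}$.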

Theorem 5.2 For $0 \leq\lambda<1$ and $\pi$ a stationary Markovian policy,
$\vec{v}_\lambda^\pi$ is the unique solution of the system of equations

$\begin{eqnarray*}\vec{v}=\vec{r}_d+\lambda P_d\vec{v} \end{eqnarray*}$

and is equal to

$\begin{eqnarray*}\vec{v}_{\lambda}^{\pi}=(I-\lambda P_d)^{-1}\vec{r}_d \end{eqnarray*}$
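As an illustration of the closed form, the sketch below solves the linear system $(I-\lambda P_d)\vec{v}=\vec{r}_d$ directly instead of forming the inverse. The helper names `solve` and `policy_value`, and the two-state numbers, are hypothetical and chosen only for this example. Plain Gaussian elimination is used to keep the block free of dependencies.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def policy_value(P_d, r_d, lam):
    """Return the vector v that satisfies (I - lam * P_d) v = r_d."""
    n = len(r_d)
    A = [[(1.0 if i == j else 0.0) - lam * P_d[i][j] for j in range(n)]
         for i in range(n)]
    return solve(A, r_d)


# Hypothetical two-state example (values are illustrative only).
P_d = [[0.5, 0.5],
       [0.2, 0.8]]
r_d = [1.0, 0.0]
lam = 0.9
v = policy_value(P_d, r_d, lam)
print(v)
```

For these numbers the determinant of $I-\lambda P_d$ is $0.073$, so the exact solution is $(0.28/0.073,\ 0.18/0.073)$, which the printed vector should match up to rounding.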

Proof: We can write the system of equations as

$\begin{eqnarray*}(I-\lambda P_d)\vec{v} = \vec{r}_d \end{eqnarray*}$

Since $P_d$ is a stochastic matrix (each row is a probability distribution), $\Vert P_d\Vert=1$ in the maximum row-sum norm, and since $0\leq\lambda<1$, $\Vert\lambda P_d\Vert=\lambda<1$.
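The norm bound is easy to check on a small case. This sketch assumes the norm in question is the maximum absolute row sum (the matrix norm induced by the sup norm on vectors). The function name `max_row_sum_norm` and the matrix values are hypothetical.

```python
def max_row_sum_norm(A):
    """Return the maximum absolute row sum of the matrix A."""
    return max(sum(abs(x) for x in row) for row in A)


# Hypothetical two-state stochastic matrix (values are illustrative only).
P_d = [[0.5, 0.5],
       [0.2, 0.8]]
lam = 0.9

norm_P = max_row_sum_norm(P_d)
norm_lam_P = max_row_sum_norm([[lam * x for x in row] for row in P_d])
print(norm_P, norm_lam_P)
```

Every row of a stochastic matrix sums to 1, so the first value is 1 and the second equals the discount factor, up to rounding.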

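According to the theorem on the inverse of $I-A$ for $\Vert A\Vert<1$, $(I-\lambda P_d)^{-1}$ exists. Thus a solution $\vec{v}=(I-\lambda P_d)^{-1}\vec{r}_d$ exists, and it is unique because $I-\lambda P_d$ is invertible.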

By the same theorem,

$\begin{eqnarray*}\vec{v}=(I-\lambda P_d)^{-1}\vec{r}_d=\sum_{i=0}^{\infty}(\lambda P_d)^{i}\vec{r}_d=\sum_{t=1}^{\infty}\lambda^{t-1}P_d^{t-1}\vec{r}_d=\vec{v}_\lambda^{\pi}\ . \end{eqnarray*}$

We have shown that the solution is the discounted return of policy $\pi$. $\Box$
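The series used in the proof can also be illustrated numerically. The sketch below, again on a hypothetical two-state chain with made-up values, accumulates the partial sums $\sum_{i=0}^{N-1}\lambda^{i}P_d^{i}\vec{r}_d$. The function names are illustrative only.

```python
def mat_vec(A, x):
    """Return the matrix-vector product A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def neumann_partial_sum(P_d, r_d, lam, terms):
    """Return sum_{i=0}^{terms-1} lam^i P_d^i r_d."""
    total = [0.0] * len(r_d)
    term = r_d[:]  # the i = 0 term, lam^0 P_d^0 r_d
    for _ in range(terms):
        total = [t + x for t, x in zip(total, term)]
        term = [lam * x for x in mat_vec(P_d, term)]  # next term: lam * P_d * term
    return total


# Hypothetical two-state example (values are illustrative only).
P_d = [[0.5, 0.5],
       [0.2, 0.8]]
r_d = [1.0, 0.0]
lam = 0.9

approx_10 = neumann_partial_sum(P_d, r_d, lam, 10)
approx_500 = neumann_partial_sum(P_d, r_d, lam, 500)
print(approx_10, approx_500)
```

Each term is the expected reward at one time step, discounted by the matching power of $\lambda$, so the partial sums are truncated discounted returns. With more terms they approach the solution of $(I-\lambda P_d)\vec{v}=\vec{r}_d$, which for these numbers is $(0.28/0.073,\ 0.18/0.073)$.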


Yishay Mansour
1999-11-24