This is a Win32 application that lets the user see the results of Reinforcement Learning on the problem of agent navigation. To understand how learning is affected by the learning context (the world), various worlds can be selected, and the Reinforcement Learning algorithms can be run on each of them. An agent that has been trained to learn a specific policy can be saved to disk and then placed in other worlds.
User-Interface
The Windows application enables the user to do the following:
· Choose an agent
· Choose a world
· Find the best policy for the chosen agent in the chosen world using the Sarsa algorithm
· Create a random policy for this agent
· Evaluate a policy found for the agent (or the random policy) using the TD0 algorithm
· Show the chosen agent navigating in the chosen world using its policy
· Save the agent’s policy to disk
The program provides a graphical user interface that gives the user full control over the various parameters of the agent (number of eyes, view range, life span, etc.), the world (chosen from a list of worlds), the algorithm (number of iterations), and the navigation shown.
Usage
Following is a diagram describing the application interface:

[Diagram of the application interface]

The application UI includes the following controls:
Agent’s Parameters:
· Number of Eyes – Determines the number of the agent’s eyes.
· View Range – Determines the agent’s field of vision.
· View Distance – Determines how “far” the agent sees, measured in cells.
· Agent’s Life Span – Determines the life span of the agent in the world (the number of steps it performs in the world before it dies).
· Number of Eye Sections – The number of vision sections of the agent, i.e. the number of discrete areas, in each eye, in which the agent can see something.
· Number of MDP States – Derived from the chosen parameters. The number of MDP states depends on the number of the agent’s eyes and the number of eye sections; see the sketch below.
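The exact formula is not given in this document; the following is a minimal C++ sketch under the assumption that each eye section independently reports one of a fixed number of discrete contents, so the state count grows exponentially in the number of eyes and sections:

    // Hypothetical sketch: deriving the number of MDP states from the
    // agent's parameters. kContentsPerSection is an assumption (e.g. a
    // section sees "empty", "obstacle", or "target"); the application's
    // real encoding may differ.
    const int kContentsPerSection = 3;

    long long NumMdpStates(int numEyes, int sectionsPerEye) {
        long long states = 1;
        for (int i = 0; i < numEyes * sectionsPerEye; ++i)
            states *= kContentsPerSection; // one factor per eye section
        return states;
    }

For example, with 3 eyes and 2 sections per eye this sketch yields 3^6 = 729 states.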
World:
· Choose a world from the given list. This will be the world in which the user’s action is performed: the agent can find its best policy in it, have a policy evaluated in it, or navigate in it. The agent can be trained in one world and evaluated or viewed in another.
Algorithm Parameters:
· Number of Algorithm Steps – Determines the number of iterations the requested algorithm performs before returning the calculated results. Note: the number of steps defines the number of updates that will be performed; since the application always completes an agent’s life span, it may perform a few extra steps (see the sketch below).
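A minimal sketch of that stepping rule, assuming the requested step count is treated as a lower bound and life spans always run to completion:

    // Hypothetical sketch of the note above: the requested step count is a
    // minimum, and the loop only stops at an episode boundary, so up to
    // lifeSpan - 1 extra updates may be performed.
    void RunAlgorithm(int requestedSteps, int lifeSpan) {
        int stepsDone = 0;
        while (stepsDone < requestedSteps) {
            for (int t = 0; t < lifeSpan; ++t) { // always finish the life span
                // ... one algorithm update (Sarsa or TD0) goes here ...
                ++stepsDone;
            }
        }
    }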
Actions:
Each of the following actions creates a new policy for the chosen agent. The created policy will be added to the List of Policies with all its parameters.
· Find Best Policy – Pressing this button runs the Sarsa algorithm with the chosen agent in the chosen world, for the specified number of steps, in order to find the agent’s best policy (see the Sarsa sketch after this list). Note that the best policy in one world is not necessarily the best policy in a different world.
· Create Random Policy – Creates a random policy for the specific agent chosen.
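The application’s Sarsa implementation is not shown in this document; the following is a minimal, hypothetical C++ sketch of a tabular Sarsa update with epsilon-greedy action selection. The state/action encoding, the reward signal, and the values of alpha, gamma, and epsilon are all assumptions:

    #include <random>
    #include <vector>

    struct Sarsa {
        double alpha = 0.1, gamma = 0.95, epsilon = 0.1; // assumed values
        std::vector<std::vector<double>> Q;               // Q[state][action]
        std::mt19937 rng{42};

        Sarsa(int states, int actions)
            : Q(states, std::vector<double>(actions, 0.0)) {}

        // Epsilon-greedy selection over the current Q estimates.
        int ChooseAction(int s) {
            std::uniform_real_distribution<double> coin(0.0, 1.0);
            if (coin(rng) < epsilon) {
                std::uniform_int_distribution<int> any(0, (int)Q[s].size() - 1);
                return any(rng);
            }
            int best = 0;
            for (int a = 1; a < (int)Q[s].size(); ++a)
                if (Q[s][a] > Q[s][best]) best = a;
            return best;
        }

        // One Sarsa update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a)),
        // where a' is the action actually chosen in the next state s'.
        void Update(int s, int a, double r, int s2, int a2) {
            Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a]);
        }
    };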
Policies:
This tab includes all the operations that can be performed on a given policy, whether it was found by the Sarsa algorithm or was created randomly.
An action can be initiated (using the four buttons) only when a specific policy is selected from the list of policies. Pressing one of the buttons initiates the chosen operation on the policy currently selected in the list.
The following actions can be performed on a policy:
· Evaluate Policy (TD0) – Pressing this button runs the TD0 algorithm, which evaluates a given policy (see the TD0 sketch after this list). The policy is evaluated on the currently chosen world, for an agent with the parameters specified in the policy’s line in the list.
· Save Policy – Saves the policy to disk. A dialog opens and lets the user choose the policy’s file name. All policies are saved under the application’s directory with the extension “.plc”. Each saved policy will be displayed in the list of policies the next time the application is launched.
· Delete Policy – Deletes the chosen policy from disk and from the list of policies.
· Show Policy / Hide Policy – Lets the agent (whose parameters are specified in the policy list) navigate in the chosen world according to the chosen policy. A view of the world is displayed; the agent is placed at a random position inside the world and navigates according to the policy. After pressing “Show Policy” the button’s text changes to “Hide Policy”, and pressing it hides the world view. To show another policy, press “Hide Policy” and then “Show Policy”.
· Number of Steps – The number of navigation steps the agent takes in the world when the “Show Policy” option is used.
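As with Sarsa, the TD0 implementation itself is not shown here; the following is a minimal, hypothetical sketch of a tabular TD(0) value update for a fixed policy, with alpha and gamma as assumed values:

    #include <vector>

    // One TD(0) update of the state-value estimate V for a fixed policy:
    // V(s) += alpha * (r + gamma * V(s') - V(s)).
    // alpha and gamma are assumptions; the application's values are unknown.
    void Td0Update(std::vector<double>& V, int s, double r, int s2,
                   double alpha = 0.1, double gamma = 0.95) {
        V[s] += alpha * (r + gamma * V[s2] - V[s]);
    }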
List Of Policies:
A policy is added to the list each time the “Find Best Policy” or “Create Random Policy” button is pressed.
The following parameters are presented (and saved if requested) for each policy:
· Policy Name – When the policy is created this will be an informative name, “Policy#1_#2.plc”, where #1 stands for the number of the agent’s states and #2 stands for the policy score (-1 if not calculated yet). When the policy is saved to disk, the file name chosen by the user becomes the name of the policy as well.
· Number of States – Determined by the next two parameters.
· Number of Eyes
· Number of Sections (for each eye)
· Learned On – Specifies the world in which the policy was found. Will be "None" for a randomly created policy.
· Score – The policy’s score. This is an average score over all states; see the sketch below.
· Range – The agent’s view range (0-180).
· Dist – The agent’s view distance.
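The document states only that the score is an average over all states; a minimal sketch, assuming the score is the mean of the TD0 state-value estimates:

    #include <vector>

    // Hypothetical: the policy score as the mean of the TD0 value estimates
    // over all MDP states. -1 marks a not-yet-calculated score, matching the
    // naming convention above.
    double PolicyScore(const std::vector<double>& V) {
        if (V.empty()) return -1.0;
        double sum = 0.0;
        for (double v : V) sum += v;
        return sum / (double)V.size();
    }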