Homework number 2
1. Symmetric game
A symmetric two player game has u1(i,j)= u2(j,i). Show that every symmetric game has a symmetric mixed Nash equilibrium. (In a symmetric equilibrium both players have the same mixed action.)
2. External Regret: Show that a player playing a regret minimization algorithm guarantees to himself an average payoff of at least the minimax value minus ε where ε>0 can be arbitrarily small (as a function of T). Namely, the value it can guarantee in a zero-sum game against an adversary, with respect to its utility function. [HINT: show that an action sequence has an average payoff less than the minimax value, then it has an external regret.]
3. Regret: Show an example where the swap regret is unbounded with respect to the external regret. (There is an example that uses only 3 actions, has zero external regret and swap regret linear in T.)
Clarification: You need to choose only a loss sequence and actions (no adversary or algorithm). Given the loss sequence and the selected actions, the external regret is the difference between the loss of the action sequence and that of the loss of the best single action. The swap regret is the difference between the loss of the action sequence and the loss of the best function f of swapping actions.
4. Polynomial weights : Consider the following weight update: w(t+1,i)=w(t,i)(1-η loss(t,i)), where loss(t,i) is the loss of action i at time t, and initially w(1,i)=1.( Assume that the losses are in [0,1])
Show a regret bound of ηQ + ln(m)/η, where Q bound the sum of square of losses of any action, i.e.,
Q > maxi ∑t loss2(t,i), and η<½.
Here are the proposed steps. Let Wt be the sum of the weights at time t. Fix an action i, and let L(T,i) be the loss of action i until time T.Show:
· ln(WT+1 /W1) ≥ - ηL(T,i) – ln(m)- η2Q
o use the inequality ln(1-z) ≥ -z-z2 for z in [0,½].
· ln(WT+1/W1)≤ -ηLON, where LON is the online loss
o use the inequality ln(1+z) ≤ z
The homework is due May 17
Hand in your proposal for a project by May 24