1. Game Structure
Rock-Paper-Scissors is a simultaneous zero-sum game with payoff matrix (from the row player's perspective, actions ordered R, P, S):

A = \begin{pmatrix} 0 & -1 & +1 \\ +1 & 0 & -1 \\ -1 & +1 & 0 \end{pmatrix}

Where R = Rock, P = Paper, S = Scissors.
Action Selection
The AI selects actions probabilistically based on normalized weights:

p_t(a) = \frac{w_t(a)}{\sum_{a'} w_t(a')}

where w_t(a) is the weight assigned to action a at round t.
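A minimal sketch of this sampling step in Python (the "R"/"P"/"S" string encoding, the function name, and the example weights are assumptions made for illustration):

```python
import random

ACTIONS = ("R", "P", "S")

def select_action(weights):
    """Sample an action with probability proportional to its weight."""
    return random.choices(ACTIONS, weights=[weights[a] for a in ACTIONS])[0]

# Weights that favour Paper make the AI play Paper most often.
print(select_action({"R": 1.0, "P": 2.5, "S": 0.5}))
```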
2. Payoff Structure
Standard Rock-Paper-Scissors payoffs:
- Win: +1 (Rock beats Scissors, Paper beats Rock, Scissors beats Paper)
- Tie: 0 (same action chosen)
- Loss: -1 (opponent's action beats yours)
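This payoff rule can be encoded as a small lookup; the `BEATS` dictionary and the string encoding below are illustrative assumptions, not part of the original description:

```python
BEATS = {"R": "S", "P": "R", "S": "P"}  # key beats value

def payoff(mine, theirs):
    """Return +1 for a win, 0 for a tie, -1 for a loss, from `mine`'s perspective."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

assert payoff("R", "S") == 1   # Rock beats Scissors
assert payoff("P", "P") == 0   # tie
assert payoff("S", "R") == -1  # Scissors loses to Rock
```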
3. Frequency-Based Learning
Alternative approach: track your historical move frequencies and predict your next move.
If you've played Rock n_R times, Paper n_P times, and Scissors n_S times, the estimated distribution of your next move is:

\hat{p}(R) = \frac{n_R}{n_R + n_P + n_S}, \quad \hat{p}(P) = \frac{n_P}{n_R + n_P + n_S}, \quad \hat{p}(S) = \frac{n_S}{n_R + n_P + n_S}

The AI then plays the action that beats your most likely next move.
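A possible sketch of this frequency counter (the function name, the fallback before any history exists, and the break-ties-by-first-max behaviour are assumptions):

```python
from collections import Counter

BEATEN_BY = {"R": "P", "P": "S", "S": "R"}  # value beats key

def frequency_counter_move(history):
    """Predict the opponent's next move from raw frequencies and beat it."""
    if not history:
        return "R"  # arbitrary choice before any data exists
    counts = Counter(history)
    predicted = max(counts, key=counts.get)  # most frequent past move
    return BEATEN_BY[predicted]

# A Rock-heavy history makes the AI answer with Paper.
print(frequency_counter_move(["R", "R", "S", "R", "P"]))  # -> "P"
```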
4. Multiplicative Weights Update (MWU)
The canonical Hedge algorithm updates weights for all actions based on their performance:

w_{t+1}(a) = w_t(a) \cdot e^{-\eta \, \ell_t(a)}

where \eta > 0 is the learning rate and \ell_t(a) is the loss for action a at time t, bounded in [0,1]:
- \ell_t(a) = 0 if action would win vs human's move
- \ell_t(a) = 0.5 if action would tie vs human's move
- \ell_t(a) = 1 if action would lose vs human's move
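Combining the update rule with this loss definition, one round of the Hedge update could be sketched as follows; the learning rate eta = 0.1 and all helper names are assumed values for illustration:

```python
import math

ACTIONS = ("R", "P", "S")
BEATS = {"R": "S", "P": "R", "S": "P"}  # key beats value

def hedge_loss(action, human_move):
    """Loss in [0, 1]: 0 for a win, 0.5 for a tie, 1 for a loss."""
    if action == human_move:
        return 0.5
    return 0.0 if BEATS[action] == human_move else 1.0

def hedge_update(weights, human_move, eta=0.1):
    """Shrink each action's weight multiplicatively by exp(-eta * loss)."""
    return {a: w * math.exp(-eta * hedge_loss(a, human_move))
            for a, w in weights.items()}

# Starting from uniform weights, a human playing Rock penalizes Scissors most.
weights = {a: 1.0 for a in ACTIONS}
print(hedge_update(weights, "R"))
```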
Probability Calculation
After updating weights, probabilities are calculated with exploration \gamma:

p_t(a) = (1 - \gamma) \, \frac{w_t(a)}{\sum_{a'} w_t(a')} + \frac{\gamma}{3}

The uniform term \gamma/3 guarantees every action keeps a minimum probability, so the AI never becomes fully predictable itself.
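A corresponding sketch of the mixing step (the default gamma = 0.05 is an assumed value, not taken from the original):

```python
def action_probabilities(weights, gamma=0.05):
    """Mix normalized weights with a uniform distribution over the 3 actions."""
    total = sum(weights.values())
    n = len(weights)
    return {a: (1 - gamma) * w / total + gamma / n
            for a, w in weights.items()}

# Even if one weight collapses toward 0, its action keeps probability >= gamma/3.
print(action_probabilities({"R": 1.0, "P": 1.0, "S": 1e-9}))
```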
5. Regret Minimization
The Hedge algorithm minimizes regret, defined as:

R_T = \sum_{t=1}^{T} \sum_{a} p_t(a) \, \ell_t(a) - \min_{a} \sum_{t=1}^{T} \ell_t(a)
This represents the difference between the best fixed strategy in hindsight and the algorithm's performance.
Regret Bound
For an appropriate choice of \eta, the average regret is bounded:

\frac{R_T}{T} \le \frac{\ln 3}{\eta T} + \frac{\eta}{8}

Setting \eta = \sqrt{\frac{\ln(3)}{T}} gives average regret O(\sqrt{\ln(3)/T}), which vanishes as T grows.
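To make the regret definition concrete, here is a small helper that computes average regret from logged probabilities and losses; all names and the example data are illustrative assumptions:

```python
ACTIONS = ("R", "P", "S")

def average_regret(prob_history, loss_history):
    """Average regret: the AI's expected loss minus the best fixed action's loss."""
    T = len(loss_history)
    algo_loss = sum(sum(p[a] * l[a] for a in ACTIONS)
                    for p, l in zip(prob_history, loss_history))
    best_fixed = min(sum(l[a] for l in loss_history) for a in ACTIONS)
    return (algo_loss - best_fixed) / T

# Two rounds where the human always plays Rock: only Paper avoids any loss.
probs = [{"R": 1/3, "P": 1/3, "S": 1/3}, {"R": 0.2, "P": 0.6, "S": 0.2}]
losses = [{"R": 0.5, "P": 0.0, "S": 1.0}] * 2
print(average_regret(probs, losses))  # 0.4
```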
6. Nash Equilibrium
In the symmetric Rock-Paper-Scissors game, the unique Nash equilibrium is the uniform mixed strategy:

\sigma^* = \left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right)
Any deviation from uniform random play can be exploited by an adaptive opponent.
7. Exploitability
If you play a predictable strategy \sigma = (p_R, p_P, p_S), the AI's best response yields expected payoff:

V(\sigma) = \max\left(p_S - p_P,\; p_R - p_S,\; p_P - p_R\right)

where the three terms are the AI's expected payoffs for always playing Rock, Paper, and Scissors, respectively.
The more unbalanced your strategy, the higher the AI's advantage.
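A quick numerical check of this formula (the function name is an assumption):

```python
def exploitability(p_r, p_p, p_s):
    """Expected payoff of the AI's best fixed response to (p_R, p_P, p_S)."""
    return max(p_s - p_p,   # AI always plays Rock
               p_r - p_s,   # AI always plays Paper
               p_p - p_r)   # AI always plays Scissors

print(exploitability(1/3, 1/3, 1/3))  # 0.0: uniform play cannot be exploited
print(exploitability(0.6, 0.2, 0.2))  # 0.4: Rock-heavy play loses to Paper
```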