
Rock-Paper-Scissors vs Learning AI

Play against an AI that learns your patterns using the multiplicative weights algorithm. The AI adapts to exploit your tendencies over 20 rounds.

Game Settings

Learning Rate: 0.1
Exploration Rate: 0.1
Algorithm: Multiplicative Weights
Rounds: 20 total

Metrics

Tracked per game: your win rate, the AI's win rate, the tie rate, and rounds played (out of 20).


AI Strategy Evolution

How It Works

  • Multiplicative Weights: The AI updates all action weights based on how well they would have performed against your move. Lower losses lead to higher weights (see the end-to-end sketch after this list).
  • Exploration: Adds randomness to prevent the AI from becoming too predictable. Lower values mean more exploitation of learned patterns.
  • Strategy Evolution: Watch how the AI's move probabilities change over time as it learns your patterns.
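
Putting the three bullets together, here is a minimal Python sketch of the AI's loop, assuming the 0.1 learning and exploration rates shown in the settings above; the names (ai_probabilities, play_round) are illustrative, not the page's actual implementation.

```python
import math
import random

ACTIONS = ["R", "P", "S"]
BEATS = {"R": "S", "P": "R", "S": "P"}  # each key beats its value

ETA = 0.1    # learning rate (assumed to match the setting above)
GAMMA = 0.1  # exploration rate (assumed to match the setting above)

weights = {a: 1.0 for a in ACTIONS}

def ai_probabilities():
    """Normalize the weights and mix in uniform exploration."""
    total = sum(weights.values())
    return {a: (1 - GAMMA) * weights[a] / total + GAMMA / 3 for a in ACTIONS}

def loss(action, human_action):
    """Counterfactual loss: 0 = would win, 0.5 = would tie, 1 = would lose."""
    if BEATS[action] == human_action:
        return 0.0
    return 0.5 if action == human_action else 1.0

def play_round(human_action):
    """Sample the AI's move, then update every weight against the human's move."""
    probs = ai_probabilities()
    ai_action = random.choices(ACTIONS, weights=[probs[a] for a in ACTIONS])[0]
    for a in ACTIONS:
        weights[a] *= math.exp(-ETA * loss(a, human_action))
    return ai_action
```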

Math & Strategy

1. Game Structure

Rock-Paper-Scissors is a simultaneous-move, zero-sum game whose payoff matrix (entries are the row player's payoff) is:

\begin{array}{c|ccc} & R & P & S \\ \hline R & 0 & -1 & 1 \\ P & 1 & 0 & -1 \\ S & -1 & 1 & 0 \end{array}

Where R = Rock, P = Paper, S = Scissors.

Action Selection

The AI selects actions probabilistically from normalized weights w_t(a), which are maintained by the multiplicative weights update in Section 4:

p_t(a) = \frac{w_t(a)}{\sum_{a' \in \{R,P,S\}} w_t(a')}
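
As a minimal illustration of this sampling step (exploration is added later, in the MWU section), the following Python sketch draws an action in proportion to its weight; select_action is a hypothetical name.

```python
import random

def select_action(weights):
    """Sample an action with probability proportional to its current weight."""
    actions = list(weights)
    total = sum(weights.values())
    return random.choices(actions, weights=[weights[a] / total for a in actions])[0]

# Example: a weight vector that slightly favors Paper.
print(select_action({"R": 1.0, "P": 1.4, "S": 0.9}))
```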

2. Payoff Structure

Standard Rock-Paper-Scissors payoffs (see the sketch after this list):

  • Win: +1 (Rock beats Scissors, Paper beats Rock, Scissors beats Paper)
  • Tie: 0 (same action chosen)
  • Loss: -1 (opponent's action beats yours)
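
A small sketch of this payoff rule, written from the AI's perspective (BEATS and payoff are illustrative names):

```python
BEATS = {"R": "S", "P": "R", "S": "P"}  # each key beats its value

def payoff(ai_action, human_action):
    """+1 if the AI wins, 0 on a tie, -1 if the AI loses."""
    if ai_action == human_action:
        return 0
    return 1 if BEATS[ai_action] == human_action else -1

assert payoff("R", "S") == 1   # Rock beats Scissors
assert payoff("P", "P") == 0   # tie
assert payoff("S", "R") == -1  # Scissors loses to Rock
```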

3. Frequency-Based Learning

Alternative approach: track your historical frequencies and predict your next move.

If you've played Rock n_R times, Paper n_P times, and Scissors n_S times:

\hat{p}(R) = \frac{n_R}{n_R + n_P + n_S}

The AI then plays the action that beats your most likely next move.
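
A minimal sketch of this frequency-based responder, assuming the AI simply counters the human's historically most frequent move (function names are illustrative):

```python
from collections import Counter

COUNTER_MOVE = {"R": "P", "P": "S", "S": "R"}  # the move that beats each key

def frequency_response(history):
    """Predict the human's most frequent move so far and play what beats it."""
    if not history:
        return "R"  # arbitrary opening move when there is no data yet
    predicted = Counter(history).most_common(1)[0][0]
    return COUNTER_MOVE[predicted]

print(frequency_response(["R", "R", "P", "R"]))  # human favors Rock -> AI answers Paper
```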

4. Multiplicative Weights Update (MWU)

The canonical Hedge algorithm updates the weight of every action according to how that action would have performed against your actual move:

w_{t+1}(a) = w_t(a) \cdot \exp(-\eta \cdot \ell_t(a))

Where \ell_t(a) is the counterfactual loss for action a at time t, bounded in [0,1] (see the sketch after this list):

  • \ell_t(a) = 0 if action would win vs human's move
  • \ell_t(a) = 0.5 if action would tie vs human's move
  • \ell_t(a) = 1 if action would lose vs human's move
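
A sketch of the update under these loss definitions, assuming \eta = 0.1; hedge_update is a hypothetical helper, not the page's code:

```python
import math

BEATS = {"R": "S", "P": "R", "S": "P"}

def hedge_loss(action, human_action):
    """Counterfactual loss in [0, 1]: 0 win, 0.5 tie, 1 loss."""
    if BEATS[action] == human_action:
        return 0.0
    return 0.5 if action == human_action else 1.0

def hedge_update(weights, human_action, eta=0.1):
    """w_{t+1}(a) = w_t(a) * exp(-eta * loss_t(a)) for every action a."""
    return {a: w * math.exp(-eta * hedge_loss(a, human_action))
            for a, w in weights.items()}

w = {"R": 1.0, "P": 1.0, "S": 1.0}
print(hedge_update(w, human_action="R"))  # Paper unchanged; Rock and Scissors shrink
```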

Probability Calculation

After updating weights, probabilities are calculated with exploration \gamma:

p(a) = (1-\gamma)\frac{w(a)}{\sum_b w(b)} + \frac{\gamma}{3}
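
A one-function sketch of this mixing step, assuming \gamma = 0.1:

```python
def action_probabilities(weights, gamma=0.1):
    """Mix normalized weights with a uniform distribution over the 3 actions."""
    total = sum(weights.values())
    return {a: (1 - gamma) * w / total + gamma / 3 for a, w in weights.items()}

probs = action_probabilities({"R": 0.8, "P": 1.0, "S": 0.6})
assert abs(sum(probs.values()) - 1.0) < 1e-9  # still a valid distribution
```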

5. Regret Minimization

The Hedge algorithm minimizes regret, defined as:

R_T = \max_{a^*} \sum_{t=1}^T u_t(a^*) - \sum_{t=1}^T \sum_a p_t(a) \cdot u_t(a)

This is the gap between the best fixed action in hindsight and the algorithm's expected utility, where u_t(a) = 1 - \ell_t(a) is the utility of action a at round t.
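
A sketch that computes this external regret from a game history, using the utility convention u = 1 - \ell above (names are illustrative):

```python
BEATS = {"R": "S", "P": "R", "S": "P"}

def utility(action, human_action):
    """u = 1 - loss: 1 for a win, 0.5 for a tie, 0 for a loss."""
    if BEATS[action] == human_action:
        return 1.0
    return 0.5 if action == human_action else 0.0

def external_regret(history):
    """history: list of (AI probability dict over R/P/S, human move) per round."""
    best_fixed = max(sum(utility(a, h) for _, h in history) for a in "RPS")
    algo_value = sum(sum(p[a] * utility(a, h) for a in "RPS") for p, h in history)
    return best_fixed - algo_value

# If the AI plays uniformly for 5 rounds while the human always plays Rock:
history = [({"R": 1/3, "P": 1/3, "S": 1/3}, "R")] * 5
print(external_regret(history))  # best fixed action (Paper) earns 5, uniform earns 2.5 -> 2.5
```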

Regret Bound

For appropriate choice of \eta, the average regret is bounded:

\frac{R_T}{T} \leq \frac{\ln(3)}{\eta T} + \eta

Setting \eta = \sqrt{\frac{\ln(3)}{T}} gives average regret O(\sqrt{\ln(3)/T}), i.e. total regret O(\sqrt{T \ln 3}).
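
For example, with T = 20 rounds the tuned rate is \eta = \sqrt{\ln(3)/20} \approx 0.23, which bounds the average regret by roughly 0.47 per round; a fixed \eta = 0.1 loosens the bound to about 0.65. These are worst-case guarantees, so the AI often does better against patterned play.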

6. Nash Equilibrium

In the symmetric Rock-Paper-Scissors game, the unique Nash equilibrium is:

p^*(R) = p^*(P) = p^*(S) = \frac{1}{3}

Any deviation from uniform random play can be exploited by an adaptive opponent.
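
A quick numerical check that uniform play is unexploitable (expected_payoff is a hypothetical helper):

```python
BEATS = {"R": "S", "P": "R", "S": "P"}

def expected_payoff(p_ai, p_human):
    """Expected payoff to the AI when both sides play mixed strategies."""
    def u(a, b):
        if a == b:
            return 0
        return 1 if BEATS[a] == b else -1
    return sum(p_ai[a] * p_human[b] * u(a, b) for a in "RPS" for b in "RPS")

uniform = {"R": 1/3, "P": 1/3, "S": 1/3}
skewed = {"R": 0.6, "P": 0.3, "S": 0.1}
print(expected_payoff(uniform, skewed))  # ~0: uniform play cannot be exploited
```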

7. Exploitability

If you play a predictable strategy \sigma = (p_R, p_P, p_S), the AI's best response yields expected payoff:

\max\{p_S - p_P, p_R - p_S, p_P - p_R\}

The more unbalanced your strategy, the higher the AI's advantage.
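
A tiny sketch of this best-response value (exploitability is an illustrative name):

```python
def exploitability(p_r, p_p, p_s):
    """Best-response value against a fixed human strategy (p_R, p_P, p_S)."""
    return max(p_s - p_p,   # AI plays Rock
               p_r - p_s,   # AI plays Paper
               p_p - p_r)   # AI plays Scissors

print(exploitability(1/3, 1/3, 1/3))  # 0.0: uniform play gives the AI nothing
print(exploitability(0.5, 0.3, 0.2))  # ~0.3: the AI profits by leaning on Paper
```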