
Rock-Paper-Scissors vs Learning AI

Play against an AI that learns your patterns using the multiplicative weights algorithm. The AI adapts to exploit your tendencies over 20 rounds.

Game Settings

Learning Rate: 0.1
Exploration Rate: 0.1
Algorithm: Multiplicative Weights
Rounds: 20 total

Metrics

Tracked per game: your win rate, the AI's win rate, the tie rate, and rounds played (out of 20).


AI Strategy Evolution

How It Works

  • Multiplicative Weights: The AI updates all action weights based on how well they would have performed against your move. Lower losses lead to higher weights (see the end-to-end sketch after this list).
  • Exploration: Adds randomness to prevent the AI from becoming too predictable. Lower values mean more exploitation of learned patterns.
  • Strategy Evolution: Watch how the AI's move probabilities change over time as it learns your patterns.
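
Putting the three bullets together, here is a minimal Python sketch of the AI's loop, assuming the 0.1 learning and exploration rates shown in the settings above; the names (ai_probabilities, play_round) are illustrative, not the page's actual implementation.

```python
import math
import random

ACTIONS = ["R", "P", "S"]
BEATS = {"R": "S", "P": "R", "S": "P"}  # each key beats its value

ETA = 0.1    # learning rate (assumed to match the setting above)
GAMMA = 0.1  # exploration rate (assumed to match the setting above)

weights = {a: 1.0 for a in ACTIONS}

def ai_probabilities():
    """Normalize the weights and mix in uniform exploration."""
    total = sum(weights.values())
    return {a: (1 - GAMMA) * weights[a] / total + GAMMA / 3 for a in ACTIONS}

def loss(action, human_action):
    """Counterfactual loss: 0 = would win, 0.5 = would tie, 1 = would lose."""
    if BEATS[action] == human_action:
        return 0.0
    return 0.5 if action == human_action else 1.0

def play_round(human_action):
    """Sample the AI's move, then update every weight against the human's move."""
    probs = ai_probabilities()
    ai_action = random.choices(ACTIONS, weights=[probs[a] for a in ACTIONS])[0]
    for a in ACTIONS:
        weights[a] *= math.exp(-ETA * loss(a, human_action))
    return ai_action
```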

Math & Strategy

1. Game Structure

Rock-Paper-Scissors is a simultaneous-move, zero-sum game whose payoff matrix (entries are the row player's payoff) is:

\begin{array}{c|ccc} & R & P & S \\ \hline R & 0 & -1 & 1 \\ P & 1 & 0 & -1 \\ S & -1 & 1 & 0 \end{array}

Where R = Rock, P = Paper, S = Scissors.

Action Selection

The AI selects actions probabilistically from normalized weights w_t(a), which are maintained by the multiplicative weights update in Section 4:

p_t(a) = \frac{w_t(a)}{\sum_{a' \in \{R,P,S\}} w_t(a')}
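
As a minimal illustration of this sampling step (exploration is added later, in the MWU section), the following Python sketch draws an action in proportion to its weight; select_action is a hypothetical name.

```python
import random

def select_action(weights):
    """Sample an action with probability proportional to its current weight."""
    actions = list(weights)
    total = sum(weights.values())
    return random.choices(actions, weights=[weights[a] / total for a in actions])[0]

# Example: a weight vector that slightly favors Paper.
print(select_action({"R": 1.0, "P": 1.4, "S": 0.9}))
```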

2. Payoff Structure

Standard Rock-Paper-Scissors payoffs (see the sketch after this list):

  • Win: +1 (Rock beats Scissors, Paper beats Rock, Scissors beats Paper)
  • Tie: 0 (same action chosen)
  • Loss: -1 (opponent's action beats yours)
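
A small sketch of this payoff rule, written from the AI's perspective (BEATS and payoff are illustrative names):

```python
BEATS = {"R": "S", "P": "R", "S": "P"}  # each key beats its value

def payoff(ai_action, human_action):
    """+1 if the AI wins, 0 on a tie, -1 if the AI loses."""
    if ai_action == human_action:
        return 0
    return 1 if BEATS[ai_action] == human_action else -1

assert payoff("R", "S") == 1   # Rock beats Scissors
assert payoff("P", "P") == 0   # tie
assert payoff("S", "R") == -1  # Scissors loses to Rock
```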

3. Frequency-Based Learning

Alternative approach: track your historical frequencies and predict your next move.

If you've played Rock n_R times, Paper n_P times, and Scissors n_S times:

\hat{p}(R) = \frac{n_R}{n_R + n_P + n_S}

The AI then plays the action that beats your most likely next move.
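
A minimal sketch of this frequency-based responder, assuming the AI simply counters the human's historically most frequent move (function names are illustrative):

```python
from collections import Counter

COUNTER_MOVE = {"R": "P", "P": "S", "S": "R"}  # the move that beats each key

def frequency_response(history):
    """Predict the human's most frequent move so far and play what beats it."""
    if not history:
        return "R"  # arbitrary opening move when there is no data yet
    predicted = Counter(history).most_common(1)[0][0]
    return COUNTER_MOVE[predicted]

print(frequency_response(["R", "R", "P", "R"]))  # human favors Rock -> AI answers Paper
```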

4. Multiplicative Weights Update (MWU)

The canonical Hedge algorithm updates the weight of every action according to how that action would have performed against your actual move:

w_{t+1}(a) = w_t(a) \cdot \exp(-\eta \cdot \ell_t(a))

Where \ell_t(a) is the counterfactual loss for action a at time t, bounded in [0,1] (see the sketch after this list):

  • \ell_t(a) = 0 if action would win vs human's move
  • \ell_t(a) = 0.5 if action would tie vs human's move
  • \ell_t(a) = 1 if action would lose vs human's move
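
A sketch of the update under these loss definitions, assuming \eta = 0.1; hedge_update is a hypothetical helper, not the page's code:

```python
import math

BEATS = {"R": "S", "P": "R", "S": "P"}

def hedge_loss(action, human_action):
    """Counterfactual loss in [0, 1]: 0 win, 0.5 tie, 1 loss."""
    if BEATS[action] == human_action:
        return 0.0
    return 0.5 if action == human_action else 1.0

def hedge_update(weights, human_action, eta=0.1):
    """w_{t+1}(a) = w_t(a) * exp(-eta * loss_t(a)) for every action a."""
    return {a: w * math.exp(-eta * hedge_loss(a, human_action))
            for a, w in weights.items()}

w = {"R": 1.0, "P": 1.0, "S": 1.0}
print(hedge_update(w, human_action="R"))  # Paper unchanged; Rock and Scissors shrink
```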

Probability Calculation

After updating weights, probabilities are calculated with exploration \gamma:

p(a) = (1-\gamma)\frac{w(a)}{\sum_b w(b)} + \frac{\gamma}{3}
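
A one-function sketch of this mixing step, assuming \gamma = 0.1:

```python
def action_probabilities(weights, gamma=0.1):
    """Mix normalized weights with a uniform distribution over the 3 actions."""
    total = sum(weights.values())
    return {a: (1 - gamma) * w / total + gamma / 3 for a, w in weights.items()}

probs = action_probabilities({"R": 0.8, "P": 1.0, "S": 0.6})
assert abs(sum(probs.values()) - 1.0) < 1e-9  # still a valid distribution
```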

5. Regret Minimization

The Hedge algorithm minimizes regret, defined as:

R_T = \max_{a^*} \sum_{t=1}^T u_t(a^*) - \sum_{t=1}^T \sum_a p_t(a) \cdot u_t(a)

This is the gap between the best fixed action in hindsight and the algorithm's expected utility, where u_t(a) = 1 - \ell_t(a) is the utility of action a at round t.
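
A sketch that computes this external regret from a game history, using the utility convention u = 1 - \ell above (names are illustrative):

```python
BEATS = {"R": "S", "P": "R", "S": "P"}

def utility(action, human_action):
    """u = 1 - loss: 1 for a win, 0.5 for a tie, 0 for a loss."""
    if BEATS[action] == human_action:
        return 1.0
    return 0.5 if action == human_action else 0.0

def external_regret(history):
    """history: list of (AI probability dict over R/P/S, human move) per round."""
    best_fixed = max(sum(utility(a, h) for _, h in history) for a in "RPS")
    algo_value = sum(sum(p[a] * utility(a, h) for a in "RPS") for p, h in history)
    return best_fixed - algo_value

# If the AI plays uniformly for 5 rounds while the human always plays Rock:
history = [({"R": 1/3, "P": 1/3, "S": 1/3}, "R")] * 5
print(external_regret(history))  # best fixed action (Paper) earns 5, uniform earns 2.5 -> 2.5
```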

Regret Bound

For appropriate choice of \eta, the average regret is bounded:

\frac{R_T}{T} \leq \frac{\ln(3)}{\eta T} + \eta

Setting \eta = \sqrt{\frac{\ln(3)}{T}} gives average regret O(\sqrt{\ln(3)/T}), i.e. total regret O(\sqrt{T \ln 3}).
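
For example, with T = 20 rounds the tuned rate is \eta = \sqrt{\ln(3)/20} \approx 0.23, which bounds the average regret by roughly 0.47 per round; a fixed \eta = 0.1 loosens the bound to about 0.65. These are worst-case guarantees, so the AI often does better against patterned play.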

6. Nash Equilibrium

In the symmetric Rock-Paper-Scissors game, the unique Nash equilibrium is:

p^*(R) = p^*(P) = p^*(S) = \frac{1}{3}

Any deviation from uniform random play can be exploited by an adaptive opponent.
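
A quick numerical check that uniform play is unexploitable (expected_payoff is a hypothetical helper):

```python
BEATS = {"R": "S", "P": "R", "S": "P"}

def expected_payoff(p_ai, p_human):
    """Expected payoff to the AI when both sides play mixed strategies."""
    def u(a, b):
        if a == b:
            return 0
        return 1 if BEATS[a] == b else -1
    return sum(p_ai[a] * p_human[b] * u(a, b) for a in "RPS" for b in "RPS")

uniform = {"R": 1/3, "P": 1/3, "S": 1/3}
skewed = {"R": 0.6, "P": 0.3, "S": 0.1}
print(expected_payoff(uniform, skewed))  # ~0: uniform play cannot be exploited
```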

7. Exploitability

If you play a predictable strategy \sigma = (p_R, p_P, p_S), the AI's best response yields expected payoff:

\max\{p_S - p_P, p_R - p_S, p_P - p_R\}

The more unbalanced your strategy, the higher the AI's advantage.
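
A tiny sketch of this best-response value (exploitability is an illustrative name):

```python
def exploitability(p_r, p_p, p_s):
    """Best-response value against a fixed human strategy (p_R, p_P, p_S)."""
    return max(p_s - p_p,   # AI plays Rock
               p_r - p_s,   # AI plays Paper
               p_p - p_r)   # AI plays Scissors

print(exploitability(1/3, 1/3, 1/3))  # 0.0: uniform play gives the AI nothing
print(exploitability(0.5, 0.3, 0.2))  # ~0.3: the AI profits by leaning on Paper
```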