MTGSimClaude

Performance

Built for speed & scale

2.5ms per game. Full 36-deck matrix in 94 seconds. Zero external dependencies.

Flat win rate by deck

Top 8 and bottom 5 · 30 games/pair

Matchup spread — select a deck

WR against each opponent in heatmap

Win resolution

500 games across 10 matchup pairs

Avg game length by matchup type

Turns to completion · 30 games/pair

AI engine

How the AI plays Magic

19 strategy functions, 787 decision branches, 73 card tags.

Validation

Sim vs tournament consensus

7 matchups spot-checked against Bo3 matrix. 2 pass. 2 borderline. 3 traced to strategy bugs.

Matchup validation

8-deck heatmap

20% → 50% → 80%

Status

Development timeline

Fixed

Combo gate pattern: Oops +20pp, Reanimator +14pp, Doomsday +15pp

Combo decks were cracking Petals/Guides unconditionally on T1, wasting all their resources.

Root cause: no mana simulation before committing. Fix: check whether the combo is achievable (4+ mana with pieces in hand) before cracking. One if-statement per deck. Result: Oops 37→57%, Reanimator +14pp vs Dimir, Doomsday +15pp vs D&T.
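The gate described above can be sketched as a single predicate. This is a minimal illustration, not the simulator's code: the card tables, the TurnState class, and the function names are hypothetical, and only the 4-mana threshold comes from the fix description.

```python
from dataclasses import dataclass

# Hypothetical card data; the real simulator's tags and names may differ.
ONE_SHOT_MANA = {"Lotus Petal": 1, "Simian Spirit Guide": 1, "Elvish Spirit Guide": 1}
COMBO_PIECES = {"Undercity Informer", "Balustrade Spy"}
COMBO_MANA_NEEDED = 4  # threshold quoted in the fix description


@dataclass
class TurnState:
    hand: list
    untapped_lands: int = 0


def available_mana(state):
    """Mana reachable this turn: lands plus one-shot accelerants in hand."""
    return state.untapped_lands + sum(ONE_SHOT_MANA.get(c, 0) for c in state.hand)


def should_crack_accelerants(state):
    """Spend one-shot mana only if the combo can actually go off this turn."""
    has_piece = any(c in COMBO_PIECES for c in state.hand)
    return has_piece and available_mana(state) >= COMBO_MANA_NEEDED
```

A hand with one Petal and a Spy returns False here, so the Petal stays in hand instead of being wasted on T1.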

Fixed

Eidolon: 0 opponent triggers → 1.0/game

Opponent spells bypassed the cast_spell() pipeline and never fired Eidolon.

Interim fix: post-strategy graveyard (GY) growth tracking catches ~50% of opponent spells. The full fix needs the cast_spell() refactor (Brief C). Before: 262 triggers, all on P1 and 0 on P2. After: P1 1.7/game, P2 1.0/game.
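One way to picture graveyard growth tracking is a before/after diff around the strategy call. The sketch below is an assumption about the general idea, not the project's implementation; the Card class and function name are hypothetical. The damage (2) and the CMC cutoff (3 or less) are the printed values of Eidolon of the Great Revel.

```python
from collections import Counter
from dataclasses import dataclass

EIDOLON_DAMAGE = 2   # damage per trigger
EIDOLON_MAX_CMC = 3  # triggers on spells with CMC 3 or less


@dataclass(frozen=True)
class Card:  # hypothetical stand-in for the simulator's card type
    name: str
    cmc: int
    is_spell: bool = True  # False for lands


def eidolon_damage_from_gy_growth(gy_before, gy_after):
    """Infer Eidolon damage from cards that newly appeared in the graveyard."""
    new_cards = Counter(gy_after) - Counter(gy_before)
    triggers = sum(
        count
        for card, count in new_cards.items()
        if card.is_spell and card.cmc <= EIDOLON_MAX_CMC
    )
    return triggers * EIDOLON_DAMAGE
```

A diff like this only sees spells that end up in the graveyard, so permanents that resolve onto the battlefield would be missed. That is one plausible reason a heuristic of this kind undercounts.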

Fixed

Rift Bolt suspend modeled (CR 702.62)

Was instant-cast; now pays R, exiles, resolves next upkeep with counter window.

Suspend: pay R, exile, resolve at next upkeep. Counter window on resolution (not on suspend). Hard cast 2R only when opp ≤3 life. Not a prowess trigger (special action). 65% of games suspend, 91% resolve, 7% countered.
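The suspend flow above can be sketched in two steps: choose a mode when casting, then resolve at the next upkeep with a counter window. The Player class, function names, and the opponent_counters callback are hypothetical; the costs and the 3-life threshold are taken from the description, and 3 damage is Rift Bolt's printed value.

```python
from dataclasses import dataclass, field

RIFT_BOLT_DAMAGE = 3


@dataclass
class Player:  # hypothetical minimal player state
    life: int = 20
    suspended: list = field(default_factory=list)  # suspended cards in exile


def choose_rift_bolt_mode(mana, opp_life):
    """Hard cast (2R) only for lethal range; otherwise suspend for R."""
    if opp_life <= 3 and mana >= 3:
        return "hard_cast"
    if mana >= 1:
        return "suspend"
    return "hold"


def resolve_suspended_at_upkeep(caster, opponent, opponent_counters):
    """Cast each suspended card; the counter window opens here, not on suspend."""
    dealt = 0
    for name in caster.suspended:
        if not opponent_counters(name):
            opponent.life -= RIFT_BOLT_DAMAGE
            dealt += RIFT_BOLT_DAMAGE
    caster.suspended.clear()
    return dealt
```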

Fixed

FoW pitch protection: never exile Oracle/DD/Tendrils

Doomsday was pitching Thassa's Oracle to Force of Will, exiling its own win condition.

_select_fow_pitch() now has a never_exile set covering all combo win conditions and combo pieces. It also checks the win_condition and is_combo_piece Card attributes. Doomsday vs Burn is still low (life payment), but the deck no longer self-sabotages.
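A pitch selector with this kind of protection might look like the sketch below. The never-exile names come from the card title above and the win_condition / is_combo_piece attributes from the description; the Card class, the is_blue field, and the "cheapest card first" tie-break are assumptions made for illustration.

```python
from dataclasses import dataclass

NEVER_EXILE = {"Thassa's Oracle", "Doomsday", "Tendrils of Agony"}


@dataclass
class Card:  # hypothetical stand-in for the simulator's Card
    name: str
    cmc: int = 0
    is_blue: bool = True
    win_condition: bool = False
    is_combo_piece: bool = False


def select_fow_pitch(remaining_hand):
    """Pick a blue card to exile to Force of Will, or None if only protected cards remain.

    remaining_hand excludes the Force of Will being cast.
    """
    candidates = [
        c for c in remaining_hand
        if c.is_blue
        and c.name not in NEVER_EXILE
        and not c.win_condition
        and not c.is_combo_piece
    ]
    # Assumed ranking: pitch the cheapest expendable card.
    return min(candidates, key=lambda c: c.cmc, default=None)
```

Returning None when every blue card is protected means the counter is simply not cast, which is the behavior the fix aims for.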

Fixed

Depths crash: missing import random

Crop Rotation path crashed on random.random(), so the strategy forfeited turns.

Added import random. Also removed artificial 60% success cap on Crop Rotation (it's a tutor — always finds). Depths +11.7pp overall, vs Burn 20→42%.

Fixed

Affinity Automaton double-count

Emry recursions boosted Automaton inline AND via end-of-strategy counter.

Removed inline +1 per Emry cast. End-of-strategy artifacts_cast_this_turn already handles it. Max Automaton was 66/66 (clearly broken). Small overall WR impact.

Fixed

Burn fetchland manabase + Fireblast 0→42%

Burn had 0 fetchlands and Mountains lacked subtypes. Fireblast never cast.

Added 6 Wooded Foothills, Mountains get subtypes={'Mountain'}. Fireblast condition: T4+ (was T6). Cast rate 0→42%. Multiple matchups shifted 10-15pp.
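The subtype bug is easy to reproduce in miniature: Fireblast's alternative cost sacrifices two Mountains, so a check keyed on the subtype fails when lands carry no subtypes. The Land class and function below are hypothetical; the subtypes={'Mountain'} field and the T4+ condition come from the description.

```python
from dataclasses import dataclass

FIREBLAST_MIN_TURN = 4  # was T6 before the fix


@dataclass
class Land:  # hypothetical land record
    name: str
    subtypes: frozenset = frozenset()


def can_fireblast(battlefield, turn):
    """Alternative cost needs two lands with the Mountain subtype."""
    mountains = sum(1 for land in battlefield if "Mountain" in land.subtypes)
    return turn >= FIREBLAST_MIN_TURN and mountains >= 2


# Pre-fix data: named "Mountain" but without the subtype, so the check fails.
untyped = [Land("Mountain"), Land("Mountain")]
typed = [Land("Mountain", frozenset({"Mountain"})) for _ in range(2)]
```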

Fixed

Elves: Heritage Druid before Glimpse

Heritage Druid was deployed at Priority 4, after the Glimpse check at Priority 2, so the chain never fired.

Now: Heritage + cheap elves deploy first, THEN Glimpse chain fires. Glimpse chains: barely firing → 85%. Natural Order kills: 25%. vs D&T +10pp.

Verified

Trinisphere pre-dispatch CMC pattern

max(cmc, 3) before strategy dispatch. All strategies auto-pay tax. CR 601.2f compliant.

In play_turn() and protagonist_turn(), all cheap spells get cmc raised to max(cmc, 3) before strategy dispatch. Every strategy automatically pays the Trinisphere tax. Also blocks FoW/FoN alternate costs. LED costs 3 under Trinisphere (artifact, CMC 0 → taxed to 3).
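The pre-dispatch pattern amounts to rewriting costs once, before any strategy sees the hand. A minimal sketch, assuming a hypothetical Spell record and helper name (the max(cmc, 3) rule is from the description):

```python
from dataclasses import dataclass, replace

TRINISPHERE_FLOOR = 3


@dataclass(frozen=True)
class Spell:  # hypothetical spell record
    name: str
    cmc: int


def apply_trinisphere(hand, trinisphere_active):
    """Return a copy of the hand with costs floored at 3 when Trinisphere is out."""
    if not trinisphere_active:
        return list(hand)
    return [replace(s, cmc=max(s.cmc, TRINISPHERE_FLOOR)) for s in hand]
```

Because the adjustment happens before dispatch, individual strategies need no Trinisphere-specific branches; they just pay whatever cmc they are handed.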

Verified

Symmetric counter logic

try_reactive_counter() works for either player slot.

Single function handles both directions. Scans defender hand for counter tags, classifies threat, checks Trinisphere/Veil, then walks FoN→FoW→CS→Fluster→Pyro→Daze. Hand-size gates: CS needs 4+ cards. Veil of Summer and Allosaurus Shepherd bypass entirely.
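The priority walk can be sketched as an ordered scan with per-card gates. This omits threat classification and mana checks, and everything except the ordering and the 4-card Counterspell gate is an assumption, including the expansion of the abbreviations (FoN = Force of Negation, CS = Counterspell, Fluster = Flusterstorm, Pyro = Pyroblast) and the function name.

```python
# Order taken from the description: FoN -> FoW -> CS -> Fluster -> Pyro -> Daze.
COUNTER_ORDER = [
    "Force of Negation",
    "Force of Will",
    "Counterspell",
    "Flusterstorm",
    "Pyroblast",
    "Daze",
]
MIN_HAND_SIZE = {"Counterspell": 4}  # hand-size gate from the description


def pick_counter(defender_hand, threat_is_uncounterable=False):
    """Return the first usable counter by priority, or None.

    Works for either player slot because it only looks at the defender's hand.
    """
    if threat_is_uncounterable:  # e.g. Veil of Summer / Allosaurus Shepherd
        return None
    for name in COUNTER_ORDER:
        if name not in defender_hand:
            continue
        if len(defender_hand) < MIN_HAND_SIZE.get(name, 0):
            continue
        return name
    return None
```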

Verified

Plugin deck architecture

deck_registry.py auto-discovers modules. 36 decks, zero engine edits.

Each file in decks/ exports DECK_META. deck_registry.py scans at import. Adding a new deck = one .py file in decks/. No changes to engine.py, sim.py, or cards.py.
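Auto-discovery of this kind is commonly built on pkgutil and importlib from the standard library. The sketch below shows that general technique under stated assumptions: the function name, the registry shape, and the decision to skip modules without DECK_META are illustrative, and the actual deck_registry.py may work differently.

```python
import importlib
import pkgutil


def discover_decks(package_name="decks"):
    """Import every module in the package and collect its DECK_META, if any."""
    package = importlib.import_module(package_name)
    registry = {}
    for info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package_name}.{info.name}")
        meta = getattr(module, "DECK_META", None)
        if meta is not None:  # helper modules without DECK_META are skipped
            registry[info.name] = meta
    return registry
```

With a scan like this, dropping a new .py file into the package is enough for it to show up in the registry on the next import.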

P1

Brief C: cast_spell() refactor

All strategies should use a unified pipeline (Eidolon 50%→100% coverage).

Currently ~50% of opponent spells bypass cast_spell(), dodging Eidolon triggers and spell tracking. Refactoring all strategies to use the pipeline would give full Eidolon accuracy and enable future static effects.

P1

No static lock persistence

Karn lockout partially implemented. Chalice CMC blocking not modeled.

Karn blocks Petal mana and Vial activation (implemented). Chalice CMC blocking and Ensnaring Bridge attack prevention need persistent game state flags. As a result, lock-based decks simulate below their real-world performance.
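Persistent lock flags could be as small as one record consulted by the cast and attack checks. This is a hypothetical design sketch, not planned code: StaticLocks and both helper functions are invented names. The rules encoded are the cards' printed effects (Chalice counters spells whose mana value equals its counters; Bridge stops creatures with power greater than its controller's hand size).

```python
from dataclasses import dataclass, field


@dataclass
class StaticLocks:  # hypothetical persistent state carried across turns
    chalice_values: set = field(default_factory=set)  # CMCs locked by Chalice
    bridge_active: bool = False


def spell_gets_countered(locks, cmc):
    """Chalice of the Void counters spells whose CMC matches a locked value."""
    return cmc in locks.chalice_values


def can_attack(locks, power, bridge_controller_hand_size):
    """Ensnaring Bridge blocks attackers with power above the controller's hand size."""
    if not locks.bridge_active:
        return True
    return power <= bridge_controller_hand_size
```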

Ecosystem

Two simulators, cross-pollinating

38

Legacy — MTGSimClaude

2.5ms/game. 19 per-deck strategy functions. Card-level interaction knowledge. Force of Will priority tables. 137/137 rules tests. Plugin deck architecture.

15

Modern — MTGSimManu

Full Bo3 with sideboarding. Clock-based EV scoring. Bayesian hand inference. Combat simulation. LLM-audited strategy. GoalEngine phase tracking.

Roadmap

Cowork briefs — next 5 sessions

Self-contained handoffs. 3 PRs merged (#83/#84/#87). 5 active briefs.

[A] Clock + BHI + Decisional

±3-5pp WR shifts · Parallel with B+D

Burn/UR go-face logic + Storm/Oops combo gates. Port clock.py (328 lines) + bhi.py (275 lines).

Burn: go-face when clock < opp clock
Storm/Oops: combo gates before kill attempt
Touches _strategy_* functions + config
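The go-face rule compares two clocks, where a clock is the number of turns needed to deal lethal damage. The sketch below is a deliberately tiny stand-in for that idea and not a port of clock.py; the function names and the flat damage-per-turn model are assumptions.

```python
import math


def turns_to_kill(life, damage_per_turn):
    """Clock: turns needed to reduce life to 0 at a steady damage rate."""
    if damage_per_turn <= 0:
        return math.inf
    return math.ceil(life / damage_per_turn)


def should_go_face(my_damage, opp_life, opp_damage, my_life):
    """Race when our clock on the opponent is strictly faster than theirs on us."""
    return turns_to_kill(opp_life, my_damage) < turns_to_kill(my_life, opp_damage)
```

For example, dealing 6 per turn into 12 life (a 2-turn clock) against an opponent dealing 4 per turn into 20 life (a 5-turn clock) favors going face.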

[B] Karn lockout

Prison +5-10pp · Parallel with A+D

Add Karn static ability enforcement. New game state field.

karn_lockout flag in GameState
opp_can_cast() blocks artifact activation
Painter/Prison WR jumps expected

[D] LLM judge audit

30-50 traces · Parallel with A+B

Run grade_traces.py with 6-expert panel. Flags systematic strategy weaknesses.

Needs ANTHROPIC_API_KEY
Read-only on sim code
Output: audit report + domain grades

[C] Response fn unification

Asymmetry 7.8pp → ~5pp · After A merge

Extract _respond_on_active_turn. Follow-up to PR #84 (turn unification).

Unify counter responses into single fn
Depends on merged A
Solo cowork session

[E] Matrix n=500

σ ±3.9pp → ±2.5pp · Final step

Canonical reference dataset. Captures all A/B/C/D impact. Regenerates guides + HTML.

Run after A+B+C+D merged
~7 min runtime
Replaces current n=100 Bo3 matrix

LLM Audit

6-Domain AI Quality Grades

41 traces across 11 matchups graded on mulligan, mana, combat, combo, interaction, and meta domains. Threshold: B-.

Domain Radar

Domain Averages (N=41)

PASS — all 6 domains at B- or better (re-graded 2026-05-04 after PRs #111-#117)

Flagged Weaknesses (2+ C/D/F grades)

Traces graded: 41 · Matchups covered: 11 · Grading domains: 6 · Threshold check (B-): PASS
