Monte Carlo · Legacy · 36 Decks · Pure Python

MTGSimClaude

AI-driven metagame simulation for Legacy. 787 decision branches across 19 strategy engines. Three production outputs. 2.5ms per game.

Performance
Built for speed & scale
2.5ms per game. Full 36-deck matrix in 94 seconds. Zero external dependencies.
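
The 94-second figure lines up with a back-of-envelope estimate. The sketch below assumes the matrix plays every ordered pair of distinct decks at 30 games per pair (the sample size quoted on the charts); neither assumption is stated by the source.

```python
DECKS = 36
GAMES_PER_PAIR = 30        # assumption: same sample size as the charts
SECONDS_PER_GAME = 0.0025  # 2.5ms per game, from the page

# Assumption: each deck plays every other deck from both seats (ordered pairs).
ordered_pairs = DECKS * (DECKS - 1)
total_games = ordered_pairs * GAMES_PER_PAIR
estimated_seconds = total_games * SECONDS_PER_GAME

print(ordered_pairs, total_games, round(estimated_seconds, 1))
```

Under those assumptions the estimate comes to 1,260 pairs, 37,800 games, and roughly 94.5 seconds.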

Flat win rate by deck

Top 8 and bottom 5 · 30 games/pair

Matchup spread — select a deck

WR against each opponent in heatmap

Win resolution

500 games across 10 matchup pairs

Avg game length by matchup type

Turns to completion · 30 games/pair
Architecture
Five layers, one pipeline
Symmetric engine — p1/p2 are neutral slots. Plugin deck architecture — zero engine edits to add a deck.
Output products
Meta matrix heatmap · deck guides · Bo3 replayer
3 HTML products
Simulation core
run_game, run_sweep, run_meta_matrix · 19 strategy functions
sim.py 91K + engine.py 249K
State & rules engine
GameState, Card, Permanent, ManaPool · Chalice, Trinisphere, Thalia
game.py 49K + rules.py 29K
Interaction AI
classify_threat · best_proactive_target · try_reactive_counter
interaction.py 249 ln
38 deck modules
22 full strategy + 17 proxy · auto-discovered by deck_registry.py
decks/ 594K total
AI engine
How the AI plays Magic
19 strategy functions, 787 decision branches, 73 card tags. Click each stage.
Output
Three production products
Standalone HTML. No server, no build step. Open in any browser.

Meta matrix

38×38 interactive heatmap. 5 data layers. Tier chips. Weighted WR.

Card stats · S/A/B/C tiers
71%

Deck guide

Kill-turn charts. Hand archetypes. Scryfall hovers. 7 visual components.

10K sims · Tournament prep
T3 Storm casts Dark Ritual
T3 Dimir counters with FoW
T3 Storm: Flusterstorm backup

Bo3 replayer

17 play categories. 6 board zones. AI reasoning toggle. Life chart.

17 categories · Seed control
Validation
Sim vs tournament consensus
7 matchups spot-checked against Bo3 matrix. 2 pass. 2 borderline. 3 traced to strategy bugs.

Matchup validation

Click any row for detail

8-deck heatmap

Click any cell for matchup breakdown
20% → 50% → 80%
Status
Development timeline
Fixed
Combo gate pattern: Oops +20pp, Reanimator +14pp, Doomsday +15pp
Combo decks were cracking Petals/Guides unconditionally on T1, wasting all resources.
Root cause: no mana simulation before committing. Fix: check if combo is achievable (4+ mana with pieces in hand) before cracking. One if-statement per deck. Oops 37→57%, Reanimator +14pp vs Dimir, Doomsday +15pp vs D&T.
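
A minimal sketch of the gate described above, not the project's actual code. The names `FAST_MANA`, `available_mana`, and `should_crack_fast_mana` are hypothetical, and the mana count is deliberately crude (one mana per land, one per fast-mana card in hand).

```python
# Hypothetical table: mana produced by each one-shot source.
FAST_MANA = {"Lotus Petal": 1, "Simian Spirit Guide": 1, "Elvish Spirit Guide": 1}


def available_mana(lands_in_play: int, hand: list) -> int:
    """Crude mana simulation: lands plus every fast-mana card in hand."""
    return lands_in_play + sum(FAST_MANA.get(card, 0) for card in hand)


def should_crack_fast_mana(lands_in_play: int, hand: list,
                           combo_pieces: set, mana_needed: int = 4) -> bool:
    """Commit Petals/Guides only if the combo is achievable this turn."""
    has_pieces = combo_pieces.issubset(hand)
    return has_pieces and available_mana(lands_in_play, hand) >= mana_needed


hand = ["Lotus Petal", "Lotus Petal", "Simian Spirit Guide", "Balustrade Spy"]
print(should_crack_fast_mana(1, hand, {"Balustrade Spy"}))  # enough mana, piece in hand
print(should_crack_fast_mana(0, hand, {"Balustrade Spy"}))  # one mana short: hold
```

The point of the gate is that holding fast mana costs nothing, while cracking it without a payoff throws the resources away.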
Fixed
Eidolon: 0 opponent triggers → 1.0/game
Opponent spells bypassed cast_spell() pipeline, never firing Eidolon.
Post-strategy GY growth tracking catches ~50% of opponent spells. Full fix needs cast_spell() refactor (Brief C). Before: 262 triggers all on P1, 0 on P2. After: P1 1.7/game, P2 1.0/game.
Fixed
Rift Bolt suspend modeled (CR 702.62)
Was instant-cast; now pays R, exiles, resolves next upkeep with counter window.
Suspend: pay R, exile, resolve at next upkeep. The counter window opens on resolution, not on suspend. Hard-cast for 2R only when the opponent is at ≤3 life. Suspending is a special action, so it does not trigger prowess. 65% of games suspend, 91% resolve, 7% countered.
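
The suspend flow can be sketched as a two-step state change. `BoltState`, `play_rift_bolt`, and `upkeep` are hypothetical names for illustration, and the counter window on a hard cast is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class BoltState:
    opp_life: int = 20
    suspended: int = 0  # Rift Bolts in exile waiting for the next upkeep


def play_rift_bolt(state: BoltState, mana: int) -> str:
    """Hard cast (2R) only as a finisher; otherwise suspend for R."""
    if state.opp_life <= 3 and mana >= 3:
        state.opp_life -= 3  # simplification: no counter window modeled here
        return "hard_cast"
    if mana >= 1:
        state.suspended += 1  # pay R, exile; nothing can respond to this
        return "suspend"
    return "hold"


def upkeep(state: BoltState, opponent_counters) -> int:
    """Cast each suspended bolt; the opponent may counter on resolution."""
    resolved = 0
    for _ in range(state.suspended):
        if not opponent_counters():
            state.opp_life -= 3
            resolved += 1
    state.suspended = 0
    return resolved
```

`opponent_counters` is any zero-argument callable returning `True` when the opponent counters, for example `lambda: False` in a goldfish game.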
Fixed
FoW pitch protection: never exile Oracle/DD/Tendrils
Doomsday was pitching Thassa's Oracle to Force of Will — exiling win condition.
_select_fow_pitch() now has never_exile set for all combo win conditions + combo pieces. Also checks win_condition and is_combo_piece Card attributes. Doomsday vs Burn still low (life payment) but no longer self-sabotages.
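
A hedged sketch of how such a pitch filter might look. The `Card` fields mirror the attributes named above, but the class, the `NEVER_EXILE` contents, and the first-eligible-card policy are assumptions, not the real `_select_fow_pitch()`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Card:
    name: str
    is_blue: bool = False
    win_condition: bool = False
    is_combo_piece: bool = False


# Assumed contents, taken from the cards named in the fix title.
NEVER_EXILE = {"Thassa's Oracle", "Doomsday", "Tendrils of Agony"}


def select_fow_pitch(other_cards: list) -> Optional[Card]:
    """Pick a blue card to exile, skipping anything the deck needs to win.

    `other_cards` is the hand without the Force of Will being cast.
    """
    for card in other_cards:
        if not card.is_blue:
            continue
        if card.name in NEVER_EXILE or card.win_condition or card.is_combo_piece:
            continue
        return card
    return None  # no safe pitch: do not cast Force of Will for free
```

Checking both the name set and the card attributes is redundant by design: a newly added win condition is protected even if nobody remembers to update the set.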
Fixed
Depths crash: missing import random
Crop Rotation path crashed on random.random() — strategy forfeited turns.
Added import random. Also removed artificial 60% success cap on Crop Rotation (it's a tutor — always finds). Depths +11.7pp overall, vs Burn 20→42%.
Fixed
Affinity Automaton double-count
Emry recursions boosted Automaton inline AND via end-of-strategy counter.
Removed inline +1 per Emry cast. End-of-strategy artifacts_cast_this_turn already handles it. Max Automaton was 66/66 (clearly broken). Small overall WR impact.
Fixed
Burn fetchland manabase + Fireblast 0→42%
Burn had 0 fetchlands and Mountains lacked subtypes. Fireblast never cast.
Added 6 Wooded Foothills, Mountains get subtypes={'Mountain'}. Fireblast condition: T4+ (was T6). Cast rate 0→42%. Multiple matchups shifted 10-15pp.
Fixed
Elves: Heritage Druid before Glimpse
Heritage deployed at Priority 4 (after Glimpse check at Priority 2). Chain never fired.
Now: Heritage + cheap elves deploy first, THEN Glimpse chain fires. Glimpse chains: barely firing → 85%. Natural Order kills: 25%. vs D&T +10pp.
Verified
Trinisphere pre-dispatch CMC pattern
max(cmc, 3) before strategy dispatch. All strategies auto-pay tax. CR 601.2f compliant.
In play_turn() and protagonist_turn(), all cheap spells get cmc raised to max(cmc, 3) before strategy dispatch. Every strategy automatically pays the Trinisphere tax. Also blocks FoW/FoN alternate costs. LED costs 3 under Trinisphere (artifact, CMC 0 → taxed to 3).
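
The pre-dispatch pattern amounts to rewriting costs once, before any strategy sees the hand. A minimal sketch, assuming a hypothetical `Spell` record and `apply_trinisphere` helper:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Spell:
    name: str
    cmc: int
    alt_cost_free: bool = False  # e.g. a pitch cost such as Force of Will's


def apply_trinisphere(hand: list, trinisphere_active: bool) -> list:
    """Return the hand as strategies should see it under Trinisphere."""
    if not trinisphere_active:
        return list(hand)
    # Raise every cost to at least 3 and disable free alternate costs.
    return [replace(s, cmc=max(s.cmc, 3), alt_cost_free=False) for s in hand]


hand = [Spell("Lion's Eye Diamond", 0), Spell("Force of Will", 5, alt_cost_free=True)]
for spell in apply_trinisphere(hand, trinisphere_active=True):
    print(spell.name, spell.cmc, spell.alt_cost_free)
```

Because the tax is applied before dispatch, individual strategy functions need no Trinisphere-specific branches.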
Verified
Symmetric counter logic
try_reactive_counter() works for either player slot.
Single function handles both directions. Scans defender hand for counter tags, classifies threat, checks Trinisphere/Veil, then walks FoN→FoW→CS→Fluster→Pyro→Daze. Hand-size gates: CS needs 4+ cards. Veil of Summer and Allosaurus Shepherd bypass entirely.
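
The priority walk might look roughly like this. It is a sketch only: `pick_counter` is a hypothetical name, "CS" is assumed to mean Counterspell, and the Trinisphere check, mana costs, and target legality (Pyroblast, Flusterstorm) are left out.

```python
# Priority order from the description: FoN, FoW, CS, Fluster, Pyro, Daze.
COUNTER_ORDER = [
    "Force of Negation",
    "Force of Will",
    "Counterspell",  # assumption: "CS" stands for Counterspell
    "Flusterstorm",
    "Pyroblast",
    "Daze",
]


def pick_counter(defender_hand: list, threat_worth_countering: bool,
                 uncounterable: bool = False):
    """Return the first usable counter in priority order, or None."""
    # Veil of Summer / Allosaurus Shepherd: skip the whole walk.
    if uncounterable or not threat_worth_countering:
        return None
    for name in COUNTER_ORDER:
        if name not in defender_hand:
            continue
        if name == "Counterspell" and len(defender_hand) < 4:
            continue  # hand-size gate from the description
        return name
    return None
```

The function takes only the defender's hand and a threat verdict, so the same call works whichever slot (p1 or p2) is defending.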
Verified
Plugin deck architecture
deck_registry.py auto-discovers modules. 36 decks, zero engine edits.
Each file in decks/ exports DECK_META. deck_registry.py scans at import. Adding a new deck = one .py file in decks/. No changes to engine.py, sim.py, or cards.py.
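
One way such auto-discovery can work, sketched with the standard library only. `discover_decks` is a hypothetical stand-in for whatever `deck_registry.py` does; the one detail taken from the description is that each module exports `DECK_META`.

```python
import importlib.util
from pathlib import Path


def discover_decks(decks_dir: str) -> dict:
    """Load every module in decks_dir and collect its DECK_META."""
    registry = {}
    for path in sorted(Path(decks_dir).glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helpers
        spec = importlib.util.spec_from_file_location(f"decks_{path.stem}", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        meta = getattr(module, "DECK_META", None)
        if meta is not None:
            registry[path.stem] = meta  # files without DECK_META are ignored
    return registry
```

With this shape, dropping a new file such as `decks/burn.py` into the folder is enough for it to show up in the registry on the next import.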
P1
Brief C: cast_spell() refactor
All strategies should use unified pipeline (Eidolon 50%→100% coverage).
Currently ~50% of opponent spells bypass cast_spell(), dodging Eidolon triggers and spell tracking. Refactoring all strategies to use the pipeline would give full Eidolon accuracy and enable future static effects.
P1
No static lock persistence
Karn lockout partially implemented. Chalice CMC blocking not modeled.
Karn blocks Petal mana and Vial activation (implemented). Chalice CMC blocking and Ensnaring Bridge attack prevention need persistent game state flags. Lock-based decks sim below real-world performance.
Metagame
38 tournament-sourced decks
MTGO Challenges, Showcases, League results. April 2026 meta shares.
Ecosystem
Two simulators, cross-pollinating
38

Legacy — MTGSimClaude

2.5ms/game. 19 per-deck strategy functions. Card-level interaction knowledge. Force of Will priority tables. 137/137 rules tests. Plugin deck architecture.

15

Modern — MTGSimManu

Full Bo3 with sideboarding. Clock-based EV scoring. Bayesian hand inference. Combat simulation. LLM-audited strategy. GoalEngine phase tracking.

Roadmap
Cowork briefs — next 5 sessions
Self-contained handoffs. 3 PRs merged (#83/#84/#87). 5 active briefs.
[A] Clock + BHI + Decisional
±3-5pp WR shifts · Parallel with B+D
Burn/UR go-face logic + Storm/Oops combo gates. Port clock.py (328 lines) + bhi.py (275 lines).
Burn: go-face when clock < opp clock
Storm/Oops: combo gates before kill attempt
Touches _strategy_* functions + config
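
The go-face rule can be illustrated with a simple turns-to-kill clock. This is a sketch under stated assumptions: `clock` and `should_go_face` are hypothetical helpers, not the contents of `clock.py`, and damage per turn is treated as constant.

```python
import math


def clock(life_to_remove: int, damage_per_turn: int) -> float:
    """Turns needed to deal lethal damage at a constant rate."""
    if damage_per_turn <= 0:
        return math.inf  # no clock at all
    return math.ceil(life_to_remove / damage_per_turn)


def should_go_face(my_clock: float, opp_clock: float) -> bool:
    """Race (aim burn at the player) only when we win the race."""
    return my_clock < opp_clock


my_clock = clock(life_to_remove=9, damage_per_turn=3)    # we kill in 3 turns
opp_clock = clock(life_to_remove=20, damage_per_turn=4)  # they kill in 5 turns
print(should_go_face(my_clock, opp_clock))
```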
[B] Karn lockout
Prison +5-10pp · Parallel with A+D
Add Karn static ability enforcement. New game state field.
karn_lockout flag in GameState
opp_can_cast() blocks artifact activation
Painter/Prison WR jumps expected
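
A minimal sketch of the flag-plus-check idea. The field and function names follow the brief, but the signature of `opp_can_cast()` and the single-boolean flag are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class GameState:
    # Assumed: True while an opposing Karn's static ability is in effect.
    karn_lockout: bool = False


def opp_can_cast(state: GameState, card_types: set, activating: bool = False) -> bool:
    """Block activated abilities of artifacts under the lockout.

    Casting artifacts stays legal; only activation is prevented.
    """
    if state.karn_lockout and activating and "Artifact" in card_types:
        return False
    return True


state = GameState(karn_lockout=True)
print(opp_can_cast(state, {"Artifact"}, activating=True))   # Petal cannot be cracked
print(opp_can_cast(state, {"Artifact"}, activating=False))  # casting is still allowed
```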
[D] LLM judge audit
30-50 traces · Parallel with A+B
Run grade_traces.py with 6-expert panel. Flags systematic strategy weaknesses.
Needs ANTHROPIC_API_KEY
Read-only on sim code
Output: audit report + domain grades
[C] Response fn unification
Asymmetry 7.8pp → ~5pp · After A merge
Extract _respond_on_active_turn. Follow-up to PR #84 (turn unification).
Unify counter responses into single fn
Depends on merged A
Solo cowork session
[E] Matrix n=500
σ ±3.9pp → ±2.5pp · Final step
Canonical reference dataset. Captures all A/B/C/D impact. Regenerates guides + HTML.
Run after A+B+C+D merged
~7 min runtime
Replaces current n=100 Bo3 matrix
LLM Audit
6-Domain AI Quality Grades
41 traces across 11 matchups graded on mulligan, mana, combat, combo, interaction, and meta domains. Threshold: B-.
Domain Radar
Domain Averages (N=41)
Domain Grade Status
PASS — all 6 domains at B- or better (re-graded 2026-05-04 after PRs #111-#117)
Flagged Weaknesses (2+ C/D/F grades)
41
Traces Graded
11
Matchups Covered
6
Grading Domains
PASS
Threshold Check (B-)
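
The threshold check and the weakness flag could be computed along these lines. The grade-point scale, the averaging method, and the reading of "C/D/F" as any grade starting with C, D, or F are all assumptions; the page does not say how `grade_traces.py` aggregates grades.

```python
# Assumed 4.0-style scale; the real grader may use a different mapping.
GRADE_POINTS = {
    "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7, "D": 1.0, "F": 0.0,
}
THRESHOLD = GRADE_POINTS["B-"]


def domain_report(grades: list) -> dict:
    """Summarise one domain: average vs. threshold, plus the weakness flag."""
    average = sum(GRADE_POINTS[g] for g in grades) / len(grades)
    low_grades = sum(1 for g in grades if g[0] in "CDF")
    return {
        "average": average,
        "passes": average >= THRESHOLD,
        "flagged": low_grades >= 2,  # "2+ C/D/F grades"
    }


print(domain_report(["A", "B", "B-"]))
print(domain_report(["B", "C", "C", "A"]))
```

Note that a domain can pass on average and still be flagged, which is why the two checks are reported separately.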