Money-Maker-Ready — 5-Agent Plan Comparison

Generated 2026-05-11 · Sources: Claude Code (this session), Cursor real_money_edge_plan_ed80c0d8.plan.md, GitHub Copilot, Kimi IDE, ChatGPT Codex · Skill: .claude/skills/money-maker-ready/SKILL.md v1.0 (shipped 2026-05-11)

The 5 plans

Plan	Author / Model
Claude Code (Opus 4.7)	This session — Plan Mode output
Cursor	Cursor Plan Mode
GitHub Copilot	Copilot Chat (VS Code)
Kimi IDE	Moonshot Kimi (IDE agent)
ChatGPT Codex	OpenAI Codex

Headline verdicts at a glance

Plan	Headline verdict	Strongest class	Rollout aggressiveness	Real-money gate
Claude Code	NOT READY — 3 data-layer CRITICALs (n=0, walk-forward folds=0, dashboard 40.7h stale) block every downstream verdict	Unclear — agent's per-class baseline read returned n=0 across all classes	Block everything until §1/§2 fixed	Skill freshness PASS + populated payload + ≥1 fabrication-flag
Cursor	Fast-track EQUITY/COMMODITY/ETF; contain FOREX/CRYPTO/BOND	COMMODITY (PF 3.92), EQUITY (PF 1.60), ETF (PF 1.48, near-T2)	Moderate — 72h triage then phased scale	≥2 classes sustain Tier-2 on rolling window + drift cleared
GitHub Copilot	Refresh-and-lock first; ETF + COMMODITY rollout first, CRYPTO curated sleeve parallel	ETF (cleanest OOS), COMMODITY (PF edge)	Conservative pilot — 2 consecutive weekly Tier-2 passes	2 consecutive weekly Tier-2 snapshots per class
Kimi IDE	Start real-money pilot in EQUITY+COMMODITY NOW at 1% risk; FOREX hard-stop; quarantine 3 toxic systems	COMMODITY (PF 2.08), EQUITY (PF 1.42)	Aggressive — 1% risk pilot this week	7-check Go/No-Go gate (currently 0/7); items 1-4 are 1-hour fixes
ChatGPT Codex	Block all classes until all six clean (per user's stated "all classes first" preference); fix truth layer before edge work	EQUITY (strongest broad candidate)	Most conservative — no class until ALL six SHADOW-ready	Class state-machine: BLOCKED→REHAB→OOS_READY→SHADOW→LIVE_ELIGIBLE; 14-30d shadow per class

Per-class baseline reads — RERUN-TO-RERUN DISAGREEMENT

All 5 plans claim to source dashboard_data.json::performance.asset_class_health. The numbers diverge enough that at least one plan is mis-reading the JSON:

Class	Claude Code	Cursor	Kimi	Combined plan (peer 2026-05-11 20:00Z)
COMMODITY	n=0 / WR 67.2 / PF 3.97	n=408 / WR 67.4 / PF 3.92	n=816 / WR 48.7 / PF 2.08	n=408 / WR 67.4 / PF 3.92
EQUITY	n=0 / WR 53.7 / PF 1.58	n=443 / WR 54.0 / PF 1.60	n=428 / WR 52.8 / PF 1.42	n=443 / WR 54.0 / PF 1.60
CRYPTO	n=0 / WR 48.0 / PF 1.40	n=7875 / WR 47.4 / PF 1.39	n=8166 / WR 44.8 / PF 1.26	n=7875 / WR 47.4 / PF 1.39
FOREX	n=0 / WR 41.7 / PF 0.27	n=1825 / WR 41.8 / PF 0.28	n=1249 / WR 45.6 / PF 0.28	n=1825 / WR 41.8 / PF 0.28
ETF	—	n=100 / WR 60.0 / PF 1.48	n=88 / WR 53.4 / PF 1.20	n=100 / WR 60.0 / PF 1.48
BOND	n=0 / WR 54.5 / PF 0.66	n=11 / WR 54.5 / PF 0.66	n=18 / WR 55.6 / PF 1.72	n=11 / WR 54.5 / PF 0.66
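The n / WR / PF triples in the table are easier to sanity-check with the definitions written out. The sketch below assumes the conventional definitions (WR = share of closed trades with positive PnL, PF = gross profit divided by gross loss); the source does not state how the dashboard computes them, and the function name is hypothetical. Under these definitions a row with n=0 cannot carry a non-zero WR or PF, so the Claude Code column is internally inconsistent.

```python
from typing import Sequence


def win_rate_and_profit_factor(pnls: Sequence[float]) -> tuple[int, float, float]:
    """Return (n, win rate in percent, profit factor) for closed-trade PnLs."""
    n = len(pnls)
    if n == 0:
        # With no closed trades, WR and PF are undefined rather than zero.
        raise ValueError("no closed trades: WR and PF are undefined")
    wins = [p for p in pnls if p > 0]
    gross_profit = sum(wins)
    gross_loss = -sum(p for p in pnls if p < 0)
    win_rate = 100.0 * len(wins) / n
    # A book with no losing trades has an unbounded profit factor.
    profit_factor = gross_profit / gross_loss if gross_loss > 0 else float("inf")
    return n, win_rate, profit_factor


# Illustrative PnLs only; not taken from the dashboard payload.
n, wr, pf = win_rate_and_profit_factor([2.0, -1.0, 3.0, -1.0])
print(f"n={n} / WR {wr:.1f} / PF {pf:.2f}")  # n=4 / WR 50.0 / PF 2.50
```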

Diagnosis

Cursor and the peer's "combined plan" page agree exactly — they were both reading the same payload at the same moment. Treat their numbers as the canonical baseline at 2026-05-11 ~19:00Z.
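Kimi's numbers diverge meaningfully (n=816 vs 408 for COMMODITY; PF 2.08 vs 3.92). Two possibilities: (a) Kimi read a different field, perhaps by_asset_class.raw instead of performance.asset_class_health; (b) Kimi read the payload at a different moment, between two refresh cycles. Kimi's CRYPTO n=8166 is higher than Cursor's 7875, which would fit a later read, but its FOREX (1249 vs 1825), EQUITY (428 vs 443), and ETF (88 vs 100) counts are lower, which a later read of a growing sample would not produce. Kimi's COMMODITY n=816 is also exactly twice Cursor's 408. Timing alone therefore does not explain the gap; (a), or a mix of both, looks more likely.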
Claude Code's n=0 is a clear read-error. The Explore agent likely returned asset_class_health[CLASS].closed_picks=0 from the wrong sub-block (possibly walkforward.by_class[CLASS].n), then misreported it as the verdict-grade n. This invalidates Claude Code's §1 finding ("STRUCTURAL BUG: all n=0") but does NOT invalidate the rest of its plan — the dragger list, baby_strats overfit flag, drift state, and BLOCKED list state were all confirmed by the second Explore agent against the source-of-truth files.

Correction

The §1 baseline in the Claude Code plan should be replaced with the Cursor / combined-plan numbers. The downstream P0 "fix asset_class_health.n=0 bug" action is not real — the bug is in the agent's read, not the data. Replace P0 #1 + P0 #2 with: verify Cursor / Kimi divergence by running both reads side-by-side; if Cursor is canonical, drop the n=0 P0s.
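The side-by-side check could be sketched as follows. This is a minimal sketch under stated assumptions: the per-class count is assumed to live under a key named `n`, the helper names are hypothetical, and the mock payload only imitates the shape of dashboard_data.json with placeholder numbers. Whether by_asset_class.raw really holds different counts is exactly what the check would establish.

```python
from typing import Any, Optional


def get_path(payload: dict[str, Any], dotted: str) -> Any:
    """Walk a dotted path such as 'performance.asset_class_health'."""
    node: Any = payload
    for key in dotted.split("."):
        node = node[key]
    return node


def compare_reads(
    payload: dict[str, Any], path_a: str, path_b: str, count_key: str = "n"
) -> dict[str, tuple[Optional[int], Optional[int]]]:
    """Return {class: (count_a, count_b)} for classes whose counts differ."""
    a, b = get_path(payload, path_a), get_path(payload, path_b)
    diffs = {}
    for cls in sorted(set(a) | set(b)):
        n_a = a.get(cls, {}).get(count_key)
        n_b = b.get(cls, {}).get(count_key)
        if n_a != n_b:
            diffs[cls] = (n_a, n_b)
    return diffs


# Mock payload with placeholder values (assumption, not real data).
mock = {
    "performance": {"asset_class_health": {"COMMODITY": {"n": 408}, "EQUITY": {"n": 443}}},
    "by_asset_class": {"raw": {"COMMODITY": {"n": 816}, "EQUITY": {"n": 443}}},
}
print(compare_reads(mock, "performance.asset_class_health", "by_asset_class.raw"))
# {'COMMODITY': (408, 816)}
```

An empty result on the real payload would rule out the different-field explanation and point toward read timing instead.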

Convergence (where all 5 plans agree)

FOREX is broken. PF 0.27-0.28 across all reads. Every plan either kills, hard-caps, or rehabs-before-kill. No plan proposes scaling FOREX.
Drift alert is a blocker. All 5 cite hf_stats.concept_drift.drift_alert=true as a hard gate. Codex makes it the #1 blocker ("fix truth layer first").
baby_strats family is overfitted. 12 divergence rows. Claude Code, Cursor, Copilot, Kimi all flag for surgical quarantine. Kimi names the variants (crypto_soc_*).
kimi_signal_tracking is a dragger. -954% PnL / PF 0.26. Named in Claude Code, Kimi, Codex. Cursor + Copilot wrap into broader "dragger quarantine" without name.
EQUITY + COMMODITY are the two closest-to-ready classes. 4/5 plans say so (Codex defers but acknowledges EQUITY strongest qualitatively).
ETF needs sample expansion. 4/5 plans want n≥100→150-200 before promotion.
BOND is too thin. All 5 plans keep BOND paper-only.
Walk-forward coverage is missing for COMMODITY (and BOND). All plans that touched walk-forward note this gap.
No live capital today. Even Kimi's "aggressive" plan gates the pilot on 4 fixes-of-the-day clearing first.

Divergence (where the plans split)

Axis 1 — Rollout aggressiveness

Most aggressive	→	Most conservative
Kimi (1% risk pilot THIS WEEK on EQUITY+COMMODITY)	Cursor (Phase 1 triage 72h, then phased scale) ≈ Copilot (2-week conservative gate)	Codex (block ALL classes until all six SHADOW-ready)

Codex notes that the user "chose all classes first", which makes the conservative end of this axis the user's stated preference.

Axis 2 — What to fix first

Codex: Truth layer (db_health.json red, drift alert, class tagging) — explicitly "data trust before alpha"
Cursor: Add walk-forward for COMMODITY + BOND to alpha_engine/walkforward_validator.py; capital-gate scaffold
Copilot: Refresh-and-lock source of truth (freshness preflight); then drift + dragger
Kimi: Three specific quarantines (kimi_signal_tracking, crypto_soc_*, FOREX) — 1-hour fixes
Claude Code: §1/§2 data-layer bugs (now known to be an agent read-error — see correction above)

Axis 3 — Real-money gate design

Codex: Class state machine BLOCKED→REHAB→OOS_READY→SHADOW→LIVE_ELIGIBLE with 14-30d shadow per class. Most formal.
Copilot: 2 consecutive weekly Tier-2 passes. Simple and verifiable.
Kimi: 7-check Go/No-Go checklist. Most operational.
Cursor: ≥2 classes sustain Tier-2 + drift cleared. Most permissive.
Claude Code: Skill freshness PASS + ≥1 fabrication-flag + populated payload. Skill-driven.
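Two of these gate designs are simple enough to compare in code. The sketch below treats "passed Tier-2" as a precomputed boolean, because the Tier-2 thresholds live in docs/PERFORMANCE_CHARTER.md and are not reproduced here; the function names and the shape of the inputs are assumptions.

```python
from typing import Mapping, Sequence


def copilot_gate(weekly_tier2: Sequence[bool], required: int = 2) -> bool:
    """True if the most recent `required` weekly snapshots all passed Tier-2."""
    if len(weekly_tier2) < required:
        return False
    return all(weekly_tier2[-required:])


def cursor_gate(
    tier2_by_class: Mapping[str, bool], drift_alert: bool, min_classes: int = 2
) -> bool:
    """True if at least `min_classes` classes sustain Tier-2 and drift is cleared."""
    passing = sum(1 for ok in tier2_by_class.values() if ok)
    return passing >= min_classes and not drift_alert


# Oldest snapshot first; only the trailing run of passes counts.
print(copilot_gate([False, True, True]))   # True
print(copilot_gate([True, True, False]))   # False

# Two classes pass, but an active drift alert still blocks the gate.
print(cursor_gate({"EQUITY": True, "COMMODITY": True, "FOREX": False}, drift_alert=True))  # False
```

The Copilot-style gate is evaluated per class over time, while the Cursor-style gate is evaluated across classes at one point in time, which is why the latter is the more permissive of the two.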

Axis 4 — Net-new infrastructure proposed

Codex: Dashboard payload contract extension (readiness.by_class, leaders.by_class, draggers.by_class, capped_vs_raw_pnl_gap, single_symbol_concentration). Biggest payload-schema change.
Cursor: Add /audit DB-lineage telemetry card; backtests-vs-live consistency check.
Kimi: Per-symbol PF drill-down in audit UI; wire real CFTC COT data.
Claude Code: Riskfolio-Lib sidecar, FRED wire-up, Kalshi pairwise. Most external-data heavy.
Copilot: Phase-3 Go/No-Go scoreboard with 6 explicit gates per class.

Recommended convergence plan (synthesis)

Adopt the user's stated preference ("all classes first", as implemented in Codex's plan) as the rollout posture. Mix in Kimi's 1-hour fixes for immediate P0 action and Codex's payload-contract extension for the structural fix. Use Cursor's measurable success criteria as the gate definitions.

P0 (next 24h — fastest fixes with biggest impact)

Blacklist kimi_signal_tracking via alpha_engine/config.py:216 BLACKLISTED_STRATEGIES (Kimi). Per memory feedback_gate_at_execution_not_generation, verify that the blacklist is enforced at the execution gate, not just at intake.
Surgically quarantine baby_strats:crypto_soc_* family via per-strategy BLOCKED_ASSET_STRATEGY_PAIRS at audit_trail/quality_gates.py:1499 (Claude Code + Kimi + Cursor; existing proposal at reports/baby_strats_overfit_quarantine_proposal_2026_05_10.md).
Hard-cap FOREX sizing at 0 until PF ≥ 0.8 — explicit per-class gate, not silent kill (Kimi + Cursor; respect mutate-before-kill protocol from docs/MUTATION_THREE_AXIS_PROTOCOL.md).
Verify max-drawdown calculation uses capped PnL (Kimi flagged 680% MDD anomaly).
Verify multi_asset_cot PF=19.19 via DB query against ejaguiar1_stocks (Claude Code) — data integrity smoke test.
Resolve claude_gainer_st winner-vs-blacklist contradiction (Claude Code): the strategy shows PF 6.12 / n=3472 in the systems payload yet appears in BLACKLISTED_STRATEGIES at alpha_engine/config.py:216.
Add walk-forward coverage for COMMODITY + BOND in alpha_engine/walkforward_validator.py (Cursor); surface in audit_trail/dashboard_generator.py.
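The first three P0 items share one pattern: the check runs at execution time and returns an explicit reason instead of silently dropping the order. A minimal sketch follows, assuming a simplified order record; `Order`, `CLASS_PF_FLOOR`, and `gate_at_execution` are hypothetical names, and the constants are stand-ins for the real values in alpha_engine/config.py.

```python
from dataclasses import dataclass

# Stand-ins for the real config values (assumption).
BLACKLISTED_STRATEGIES = {"kimi_signal_tracking"}
CLASS_PF_FLOOR = {"FOREX": 0.8}  # class sizing stays at 0 until PF reaches the floor


@dataclass(frozen=True)
class Order:
    strategy: str
    asset_class: str
    size: float


def gate_at_execution(order: Order, class_pf: dict[str, float]) -> tuple[float, str]:
    """Return (allowed size, reason). Intended to run at execution, after generation."""
    if order.strategy in BLACKLISTED_STRATEGIES:
        return 0.0, f"strategy {order.strategy} is blacklisted"
    floor = CLASS_PF_FLOOR.get(order.asset_class)
    if floor is not None:
        pf = class_pf.get(order.asset_class)
        # A missing PF fails closed: the cap stays on until PF is known and high enough.
        if pf is None or pf < floor:
            return 0.0, f"{order.asset_class} sizing capped at 0 until PF >= {floor}"
    return order.size, "ok"


class_pf = {"FOREX": 0.28, "EQUITY": 1.60}  # PF values from the Cursor read above
print(gate_at_execution(Order("some_fx_strategy", "FOREX", 1.0), class_pf))
print(gate_at_execution(Order("kimi_signal_tracking", "EQUITY", 1.0), class_pf))
print(gate_at_execution(Order("some_equity_strategy", "EQUITY", 1.0), class_pf))
```

Returning the reason alongside the size keeps the FOREX cap an explicit, auditable gate rather than a silent kill.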

P1 (week 1 — structural)

Implement Codex's readiness.by_class payload block (class state-machine fields: stage, blockers, n_cumulative, oos_sharpe, oos_consistency, system_concentration, symbol_concentration, data_trust_ok).
Drift detector — fix hf_stats.concept_drift.KS_D uncomputed-zero bug + refresh 19-day stale hf_stats. Wire drift→auto-pause sizing when D > 0.10.
Reconcile /audit threshold text with docs/PERFORMANCE_CHARTER.md v1.0 (Codex).
Add last_signal_date to systems payload (Claude Code) — currently absent for all top-6 winners.
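The drift item combines two behaviours: pausing sizing when D > 0.10, and not letting an uncomputed statistic pass as "no drift". One way to sketch this is to represent "not computed" as None rather than 0.0 and fail closed. The function name and the None convention are assumptions, not the existing detector's API.

```python
from typing import Optional

KS_D_PAUSE_THRESHOLD = 0.10  # from the P1 item: pause sizing when D > 0.10


def sizing_multiplier(
    ks_d: Optional[float], threshold: float = KS_D_PAUSE_THRESHOLD
) -> float:
    """Return 0.0 (paused) or 1.0 (normal sizing) from the KS drift statistic.

    None means "not computed" and is treated as paused, so a missing
    statistic cannot be mistaken for a genuine D of 0.0.
    """
    if ks_d is None:
        return 0.0
    # The Kolmogorov-Smirnov statistic is a distance between CDFs, so it lies in [0, 1].
    if not 0.0 <= ks_d <= 1.0:
        raise ValueError(f"KS D must lie in [0, 1], got {ks_d}")
    return 0.0 if ks_d > threshold else 1.0


print(sizing_multiplier(None))   # 0.0: uncomputed, fail closed
print(sizing_multiplier(0.05))   # 1.0: below threshold
print(sizing_multiplier(0.25))   # 0.0: drift detected, sizing paused
```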

P2 (week 2-4 — class rehabs in parallel)

EQUITY: bottom-symbol pruning, High-Conviction parity, earnings-drift + sector RS features (Codex).
ETF: universe expansion to n≥100; block leveraged ETFs; sector-theme caps (Codex).
CRYPTO: after dragger removal, score by sleeve/subsystem; add funding/basis/OI/on-chain features (Codex + Kimi).
FOREX: mutate-before-kill rehab with session/macro filters; wire COT/DXY-beta/carry/news-blackout (Codex + Cursor).
COMMODITY: disclose CT=F/KC=F concentration; term-structure + COT + seasonality features (Codex).
BOND: universe expansion + duration filters; paper-only until n ≥ 100 (all plans).

Real-money gate (consensus)

Adopt the strictest of the 5: Codex's all-classes-first state machine. No class receives live capital until ALL six major classes (CRYPTO/EQUITY/ETF/FOREX/COMMODITY/BOND) reach SHADOW state for 14 consecutive days, AND DB-health is green on all 6 sub-checks, AND drift_alert is false. This honors the user's stated "all classes first" preference and avoids per-class promotion races that the other 4 plans implicitly invite.
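The consensus gate can be expressed as a single check that lists what is still blocking. The sketch below is an assumption-laden illustration: the stage names come from Codex's state machine, but `ClassReadiness`, `shadow_days`, and `live_capital_blockers` are hypothetical, and the six DB-health sub-checks are passed as plain booleans because the source does not name them.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Mapping, Sequence


class Stage(IntEnum):
    """Class state machine, ordered so later stages compare greater."""
    BLOCKED = 0
    REHAB = 1
    OOS_READY = 2
    SHADOW = 3
    LIVE_ELIGIBLE = 4


CLASSES = ("CRYPTO", "EQUITY", "ETF", "FOREX", "COMMODITY", "BOND")


@dataclass(frozen=True)
class ClassReadiness:
    stage: Stage
    shadow_days: int = 0  # hypothetical: consecutive days spent in SHADOW


def live_capital_blockers(
    by_class: Mapping[str, ClassReadiness],
    db_health_checks: Sequence[bool],
    drift_alert: bool,
    min_shadow_days: int = 14,
) -> list[str]:
    """Return the reasons live capital is blocked; an empty list means go."""
    blockers = []
    for cls in CLASSES:
        state = by_class.get(cls)
        if state is None or state.stage < Stage.SHADOW:
            blockers.append(f"{cls}: not yet in SHADOW")
        elif state.shadow_days < min_shadow_days:
            blockers.append(f"{cls}: {state.shadow_days}/{min_shadow_days} shadow days")
    if len(db_health_checks) != 6 or not all(db_health_checks):
        blockers.append("DB-health: not green on all 6 sub-checks")
    if drift_alert:
        blockers.append("drift_alert is true")
    return blockers


ready = {c: ClassReadiness(Stage.SHADOW, 14) for c in CLASSES}
ready["FOREX"] = ClassReadiness(Stage.REHAB)
print(live_capital_blockers(ready, [True] * 6, drift_alert=True))
# ['FOREX: not yet in SHADOW', 'drift_alert is true']
```

Because the gate is evaluated over all six classes at once, a single lagging class (FOREX in the example) holds back every other class, which is the intended effect of the all-classes-first posture.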

Plan-level lessons learned

Multi-agent comparison caught a single-agent read error. Claude Code's n=0 finding, an agent mis-read rather than a data bug, would have produced a wrong P0 cluster if shipped alone. Cross-check against ≥1 peer plan before treating any anomaly as a code bug.
The conservative-vs-aggressive gradient appears to track capability tier, at least in this five-plan sample. The lower-cost / faster engine (Kimi) pushed an aggressive pilot; the higher-context / more formal engine (Codex) pushed conservative state-machine gates. The user's "all classes first" preference resolves the tradeoff.
Skill output converges on an actionable list. 4/5 plans produced an actionable BLOCK / quarantine / rehab list. The 5th (Copilot) was meta-process heavy but still landed on the same convergence points.
The money-maker-ready skill works. All 5 plans cite the same canonical inputs (dashboard_data.json::performance.asset_class_health, walkforward.by_class, fwd_vs_bt_divergence.rows, hf_stats.concept_drift) and reach overlapping verdicts. The skill is succeeding at its stated job: making `/audit` real-money-readiness analyzable.

Files cited across all 5 plans

audit_dashboard/data/dashboard_data.json (canonical payload — all 5)
audit_trail/dashboard_generator.py (writer — Cursor, Copilot, Claude Code)
audit_trail/quality_gates.py:1499 BLOCKED_ASSET_STRATEGY_PAIRS (Claude Code, Cursor, Kimi)
alpha_engine/config.py:216 BLACKLISTED_STRATEGIES (Claude Code, Kimi)
alpha_engine/outcome_resolver.py (Copilot, Codex)
alpha_engine/real_money_tracker.py (Codex — proposes retire)
alpha_engine/walkforward_validator.py (Cursor)
audit_dashboard/template.html (Claude Code, Codex)
audit_trail/mysql_client.py (Cursor, Copilot)
.github/workflows/audit-dashboard.yml (Cursor, Copilot)
docs/PERFORMANCE_CHARTER.md (all 5)
docs/STRATEGY_INVESTIGATION_BEFORE_KILL.md (Claude Code, Cursor)
docs/MUTATION_THREE_AXIS_PROTOCOL.md (Claude Code, Cursor)
.claude/skills/money-maker-ready/SKILL.md (all 5)
reports/baby_strats_overfit_quarantine_proposal_2026_05_10.md (Claude Code, referenced by 3 others implicitly)

Next step

User to pick a posture:

Option A (aggressive — Kimi): run the 7-check Go/No-Go gate and ship the pilot this week; small-size live on EQUITY+COMMODITY only.
Option B (moderate — Cursor/Copilot consensus): 72h triage → 14d-rolling Tier-2 confirmation → ≥2-class promotion gate.
Option C (conservative — Codex + user-stated preference): all-classes-first state machine; no class until all six are SHADOW.
Option D (skill-driven — Claude Code): rerun the money-maker-ready skill weekly; gate promotion on ≥1 fabrication-flag-clean + populated payload + all 11 sections green.

Recommendation: Option C as the strategic posture (matches stated preference + Codex framework) + Option A's P0 cluster as the immediate tactical actions (1-hour fixes are cheap insurance regardless of strategy).