Forge Operational Forensics

April 9, 2026 — Corrected edition (smoke-tested against actual config files)

1. What We Found

Finding                         | Severity | Impact
5 dead LiteLLM model aliases    | CRITICAL | claude-haiku, claude-sonnet, llama-3.2-3b, qwen-coder-32b, gemini-flash-lite silently failing
Gatekeeper 285 silent failures  | CRITICAL | 200 OK but fallback every time. All tasks auto-approved.
Guardian reverting fixes        | HIGH     | Direct edits reverted by auto-commit. deploy.sh only durable path.
9:1 planning-to-execution ratio | HIGH     | ~69 directories, ~6 with running code
Roadmap false COMPLETE claims   | CRITICAL | Phase 4.6 perceive marked COMPLETE, not on main. Merged today.

2. What Was Fixed

Fix               | Actual Change (verified against litellm.yaml)
claude-haiku      | groq/llama-3.1-8b-instant (Anthropic credits depleted)
claude-sonnet     | gemini/gemini-2.5-flash (Anthropic credits depleted)
llama-3.2-3b      | groq/llama-3.1-8b-instant (Ollama model not loaded)
qwen-coder-32b    | openrouter/deepseek/deepseek-chat-v3-0324 (Qwen retired)
gemini-flash-lite | gemini/gemini-2.5-flash-lite (2.0 deprecated Apr 2026)

All deployed via deploy.sh (commit 2617f1d25). Docker restarted. All 5 verified with live API calls.

Also fixed: Gatekeeper model (groq-llama-8b), heartbeat health probe (parses bodies), model-lint.sh, wiring-probe.sh (10/10 paths), perceive merge, stale run cleanup, /align Lens 6.


3. Meta-Lessons

Fix the alias, not the scripts

A single litellm.yaml change plus a Docker restart fixes every script that uses the alias. Editing the 8 files individually gets reverted by Guardian's auto-commit. The alias layer is where the fix belongs.
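The indirection can be sketched in a few lines of Python. This is only an illustration: the alias-to-backend pairs are taken from the table in section 2, while the dict, the `resolve` helper, and the script names are hypothetical stand-ins for litellm.yaml and the real callers.

```python
# Alias table mirroring the fixes in section 2. In the real setup this mapping
# lives in litellm.yaml; a plain dict stands in for it here (assumption).
ALIASES = {
    "claude-haiku": "groq/llama-3.1-8b-instant",
    "claude-sonnet": "gemini/gemini-2.5-flash",
    "llama-3.2-3b": "groq/llama-3.1-8b-instant",
    "qwen-coder-32b": "openrouter/deepseek/deepseek-chat-v3-0324",
    "gemini-flash-lite": "gemini/gemini-2.5-flash-lite",
}

# Hypothetical callers: each script names only an alias, never a backend.
SCRIPT_ALIASES = {
    "script_a": "claude-haiku",
    "script_b": "claude-haiku",
    "script_c": "claude-sonnet",
}


def resolve(alias: str, table: dict[str, str] = ALIASES) -> str:
    """Return the backend model an alias points to (hypothetical helper)."""
    try:
        return table[alias]
    except KeyError:
        raise KeyError(f"unknown model alias: {alias}") from None


def backends_for_scripts(table: dict[str, str] = ALIASES) -> dict[str, str]:
    """Resolve every script's alias against one shared table."""
    return {name: resolve(alias, table) for name, alias in SCRIPT_ALIASES.items()}
```

Re-pointing one entry in the table changes the backend for every script that names that alias, without touching any script file, which is why the edit survives where per-file edits do not.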

200 OK is not healthy

Gatekeeper returned HTTP 200 with fallbacks: 285. Binary monitoring (status code only) reported healthy. Functional monitoring, which parses response bodies, catches this.
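A minimal sketch of the difference between the two checks, in Python. It assumes the response body is JSON with an integer `fallbacks` field; the field name comes from the figure quoted above, but the body shape and both function names are assumptions, not the actual heartbeat probe.

```python
import json


def binary_healthy(status_code: int) -> bool:
    """Status-code-only check: the kind that reported Gatekeeper as healthy."""
    return status_code == 200


def functional_healthy(status_code: int, body: str, max_fallbacks: int = 0) -> bool:
    """Require HTTP 200 and a parseable body with an acceptable fallback count."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    fallbacks = payload.get("fallbacks")  # assumed field name
    # A missing or non-integer counter counts as unhealthy rather than as fine.
    if isinstance(fallbacks, bool) or not isinstance(fallbacks, int):
        return False
    return fallbacks <= max_fallbacks
```

With a body of `{"fallbacks": 285}`, `binary_healthy(200)` passes while `functional_healthy(200, body)` fails. Treating a missing counter as unhealthy is a deliberate choice here: a probe that cannot read the signal should not report success.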

Vision-rewarding bias = hollowware

The rubric scored an unbuilt platform 22-24/25. Added Lens 6: Execution Reality, with hard caps: unbuilt = max 3/5, broken prerequisites = max 2/5. The rubric is now scored out of 30.
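One possible reading of the capped rubric, sketched in Python. It assumes five original lenses scored 0-5 each (inferred from the /25 total) and that the hard caps apply to the Lens 6 score only; the function name and parameters are hypothetical, and the actual /align scoring may differ.

```python
def score_rubric(lens_scores: list[int], execution_reality: int,
                 built: bool, prereqs_ok: bool) -> int:
    """Total out of 30: five original lenses plus Lens 6, each scored 0-5."""
    if len(lens_scores) != 5:
        raise ValueError("expected scores for the five original lenses")
    for score in [*lens_scores, execution_reality]:
        if not 0 <= score <= 5:
            raise ValueError("each lens is scored from 0 to 5")
    cap = 5
    if not built:
        cap = min(cap, 3)   # unbuilt = max 3/5
    if not prereqs_ok:
        cap = min(cap, 2)   # broken prerequisites = max 2/5
    # Assumption: the cap limits Lens 6 only, not the other five lenses.
    return sum(lens_scores) + min(execution_reality, cap)
```

Under this reading, a proposal cannot recover through Lens 6 the points it has not earned by being built, however high a reviewer rates it.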


4. Honest State

Theory    | 5.0
Tactical  | 3.5
Potential | 4.5
Reality   | 2.7
Gap       | 2.3

13/13 core services healthy. 21/21 LiteLLM models live. 10/10 wiring paths connected. But: 6 dashboard endpoints missing, roadmap 45 days stale, Phase 1 blocked.


5. Agent Clubhouse Findings

Agent Level           | Share?    | Why
Strategy (CEO, board) | ISOLATED  | Context-specific judgment dilutes
Skill (writer, coder) | SHARED OK | Diversity enriches
Infrastructure (ops)  | SHARED    | Universal patterns

Forge is an agent ON the platform, not the platform itself. Build prerequisites: Ralph clean 7 days, Loom Phase 1 operational, one non-Ralph agent proven useful.
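The three build prerequisites can be expressed as an explicit gate. The sketch below is in Python; the `PlatformStatus` fields and the function name are hypothetical, and how each value would actually be measured is not specified here.

```python
from dataclasses import dataclass


@dataclass
class PlatformStatus:
    ralph_clean_days: int          # consecutive days Ralph has run clean
    loom_phase1_operational: bool  # whether Loom Phase 1 is operational
    useful_non_ralph_agents: int   # non-Ralph agents proven useful


def unmet_prerequisites(status: PlatformStatus) -> list[str]:
    """Return the prerequisites that still block the build, in listed order."""
    unmet = []
    if status.ralph_clean_days < 7:
        unmet.append("Ralph clean 7 days")
    if not status.loom_phase1_operational:
        unmet.append("Loom Phase 1 operational")
    if status.useful_non_ralph_agents < 1:
        unmet.append("one non-Ralph agent proven useful")
    return unmet
```

An empty list means the build may start; anything else names what is still missing.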


6. Next Priorities

1. Human Verification Loop (test guides + verify page): closes "Ralph says done, nobody checks"

2. Six missing observe endpoints: completes dashboard monitoring

3. MACRO-ROADMAP.md honest rewrite: stops false COMPLETE claims