core/prompts.py

System-prompt templates and per-budget thinking instructions.

Templates

Constant	Use
`SYSTEM_PROMPT` (line 231)	Main identity, persona rules, anti-hallucination guard, tool-use protocol.
`SPECIALIST_SYSTEM_PROMPT` (line 274)	DBA / SWE specialist mode — strict observability ("print everything").
`QWEN_TOOL_PROMPT` (line 172)	Qwen-specific XML tool-call syntax with CDATA escape hatch for `<` / `>`.
`THINK_BUDGET_TIGHT` (line 116)	≤5 sentences; no brainstorming alternatives.
`THINK_BUDGET_EXTENDED` (line 130)	~15 sentences for debugging / SQL / algorithm work; forbids drafting code in the think block.
`THINK_BUDGET_SELFPLAY` (line 158)	≤6 bullet points for fast bounded self-play exercises.
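
The CDATA escape hatch noted for `QWEN_TOOL_PROMPT` matters because a raw `<` or `>` inside an argument makes the call unparseable as XML. The sketch below illustrates the failure and the fix. The `<tool_call>` / `<arg>` layout and the `run_sql` tool name are assumptions made for illustration; the actual tag syntax in `QWEN_TOOL_PROMPT` is not shown in this document.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag layout; the real QWEN_TOOL_PROMPT syntax may differ.
RAW = (
    '<tool_call name="run_sql">'
    '<arg name="query">SELECT * FROM t WHERE a < 5</arg>'
    "</tool_call>"
)
WRAPPED = (
    '<tool_call name="run_sql">'
    '<arg name="query"><![CDATA[SELECT * FROM t WHERE a < 5 AND b > 1]]></arg>'
    "</tool_call>"
)


def parse_tool_call(xml_text):
    """Return (tool_name, {arg_name: value}); raises ET.ParseError on bad XML."""
    root = ET.fromstring(xml_text)
    args = {arg.get("name"): (arg.text or "") for arg in root.findall("arg")}
    return root.get("name"), args


def try_parse(xml_text):
    """Return the parsed call, or None when the XML is not well-formed."""
    try:
        return parse_tool_call(xml_text)
    except ET.ParseError:
        return None


print(try_parse(RAW))      # the bare "<" breaks the parser
print(try_parse(WRAPPED))  # the CDATA section keeps "<" and ">" literal
```

Wrapping the argument in CDATA leaves the payload untouched, which is usually simpler for a model to emit than entity-escaping every comparison operator.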

Helpers

Function	Purpose
`build_project_briefing(store, project_id, max_events=3, max_open_tasks=8) -> str`	Render a multi-line project context block — open tasks, recent events, current goal — for prompt injection (line 4).
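
To make the two limits concrete, here is a stand-in that mimics the documented signature. It is not the real implementation: `FakeStore`, its fields, `sketch_project_briefing`, and the output layout are all hypothetical, since the store interface and the actual rendering are not described here.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStore:
    """Hypothetical in-memory store; the real store interface is not shown."""
    goals: dict = field(default_factory=dict)   # project_id -> goal text
    tasks: dict = field(default_factory=dict)   # project_id -> open task titles
    events: dict = field(default_factory=dict)  # project_id -> events, oldest first


def sketch_project_briefing(store, project_id, max_events=3, max_open_tasks=8):
    """Illustrative stand-in for build_project_briefing (assumed layout)."""
    lines = [
        f"PROJECT {project_id}",
        f"Current goal: {store.goals.get(project_id, '(none)')}",
        "Open tasks:",
    ]
    # Cap the task list so a large backlog cannot crowd out the prompt.
    for title in store.tasks.get(project_id, [])[:max_open_tasks]:
        lines.append(f"  - {title}")
    lines.append("Recent events:")
    events = store.events.get(project_id, [])
    # Guard max_events == 0: a [-0:] slice would return the whole list.
    recent = events[-max_events:] if max_events > 0 else []
    for event in recent:
        lines.append(f"  - {event}")
    return "\n".join(lines)


store = FakeStore(
    goals={"p1": "ship v2"},
    tasks={"p1": ["write tests", "fix parser", "update docs"]},
    events={"p1": ["e1", "e2", "e3", "e4"]},
)
print(sketch_project_briefing(store, "p1", max_events=2, max_open_tasks=2))
```

The caps keep the injected block bounded regardless of project size, which is presumably why the real helper exposes them as parameters.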

Tool-orchestration routing rules (SYSTEM_PROMPT)

A block of MANDATORY TRIGGERS in SYSTEM_PROMPT maps user intents to specific tool calls so the LLM doesn't improvise. Current rules include:

SLEEP / REST → dream_mode
SELF-PLAY (one-shot) → self_play — for "practise", "train", "do self-play" (single cycle).
SELF-PLAY (continuous) → self_play_loop — for "continuously", "in a loop", "back to back", "until I tell you to stop". stop_self_play on explicit user stop while running.
LESSONS SURFACE → list_lessons — for "what have you learned today/so far/this week", "show me your lessons", "show me the lesson playbook". Picks scope ∈ {today, week, all, self_play_only} from phrasing. A LESSON is a mistake-and-fix the agent has internalized — distinct from a SKILL (a tool).
SKILLS SURFACE → manage_skills(action="list") — for "show me your skills", "list your skills", "what skills do you have", "show me your custom skills". A SKILL is a TOOL or set of tools, NOT a lesson. The historical "skill playbook" wording was the conflation that misrouted skill queries to list_lessons; both the tool description and this rule now disambiguate.
KNOWLEDGE & RAG → recall first.
WEB FACTS → web_search first; fact_check / deep_research only for complex verification.
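
The self-play, lessons, and skills rules above can be pictured as an ordered phrase-to-tool lookup. In the real system the LLM applies the rules from the prompt text; the keyword matcher below is only an illustration. `route_intent`, `pick_scope`, the abbreviated phrase lists, the `scope` keyword name, and the phrase-to-scope mapping are assumptions.

```python
# Ordered rules: earlier entries win, so continuous self-play phrases are
# checked before the one-shot ones.
ROUTES = [
    (("until i tell you to stop", "continuously", "in a loop", "back to back"),
     "self_play_loop"),
    (("do self-play", "practise", "train"), "self_play"),
    (("what have you learned", "show me your lessons", "lesson playbook"),
     "list_lessons"),
    (("show me your skills", "list your skills", "what skills do you have",
      "custom skills"), "manage_skills"),
]


def pick_scope(text):
    """Assumed mapping; the source only lists the allowed scope values."""
    if "self-play" in text or "self play" in text:
        return "self_play_only"
    if "today" in text:
        return "today"
    if "week" in text:
        return "week"
    return "all"


def route_intent(message):
    """Return (tool_name, kwargs) for the first matching rule, else None."""
    text = message.lower()
    for phrases, tool in ROUTES:
        if any(phrase in text for phrase in phrases):
            if tool == "list_lessons":
                return tool, {"scope": pick_scope(text)}
            if tool == "manage_skills":
                return tool, {"action": "list"}
            return tool, {}
    return None


print(route_intent("Show me your skills"))
print(route_intent("What have you learned this week?"))
```

Keeping lessons and skills as separate entries mirrors the disambiguation described above: a skills query never falls through to `list_lessons`.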

Self-play challenge-gen hardening (SYNTHETIC_CHALLENGE_PROMPT)

The challenge-generation system prompt carries explicit rules to prevent LLM-generated validators from being unwinnable:

Float formatting — always convert both sides to float() with tolerance before comparing; never compare round() output with f-string output directly.
Unit suffixes — if an expected field carries a %, $, ms, or comma-thousand separator, strip it before float()/int(). Includes a concrete WRONG/RIGHT example of the float('60.00%') bug.
One convention per field — either string equality with the suffix kept, or numeric tolerance with the suffix stripped. Never mix.
Mental self-test — the generator is told to trace the validator against a hypothetical solution.py that echoes its own expected output; every parse/comparison line must succeed.
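
The rules above can be sketched as a pair of comparison helpers. This is an illustration of the WRONG/RIGHT pattern, not the text of the prompt itself; the function names and the tolerance value are assumptions.

```python
import math


def wrong_compare(expected, actual):
    # WRONG: float() cannot parse a unit suffix, so this raises ValueError
    # on "60.00%" regardless of what the solution printed.
    return float(expected) == float(actual)


def to_number(text):
    """Strip $, thousands commas, and a trailing % or ms, then parse as float."""
    cleaned = text.strip().replace(",", "").replace("$", "")
    for suffix in ("%", "ms"):
        if cleaned.endswith(suffix):
            cleaned = cleaned[: -len(suffix)]
    return float(cleaned.strip())


def right_compare(expected, actual, tol=1e-6):
    # RIGHT: both sides go through the same parser, then a tolerance check.
    return math.isclose(to_number(expected), to_number(actual), abs_tol=tol)


# Mental self-test: a solution that echoes the expected value must pass.
for expected in ("60.00%", "$1,234.50", "12ms"):
    assert right_compare(expected, expected)
```

Because both sides pass through `to_number`, the validator follows a single convention per field (numeric tolerance with the suffix stripped), so a solution printing `60%` or `60.00%` is accepted either way.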

When FrontierTracker reports all clusters saturated, the dream loop additionally injects a CURRICULUM DIVERSITY REQUIREMENT block telling the generator to pick from {concurrency, algo, regex_parse, sql, bash} and forbidding another data-analysis / CSV-groupby shape.
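
A minimal sketch of that conditional injection follows. The `{cluster_name: bool}` mapping is a hypothetical stand-in for whatever FrontierTracker actually reports, and `build_challenge_prompt` and the block wording are illustrative rather than taken from the codebase.

```python
DIVERSE_DOMAINS = ("concurrency", "algo", "regex_parse", "sql", "bash")


def build_challenge_prompt(base_prompt, cluster_saturated):
    """Append the diversity block only when every cluster is saturated.

    cluster_saturated: hypothetical {cluster_name: bool} mapping.
    """
    if not cluster_saturated or not all(cluster_saturated.values()):
        return base_prompt
    block = (
        "CURRICULUM DIVERSITY REQUIREMENT\n"
        f"Pick the next challenge from: {', '.join(DIVERSE_DOMAINS)}.\n"
        "Do not produce another data-analysis / CSV-groupby challenge."
    )
    return f"{base_prompt}\n\n{block}"


print(build_challenge_prompt("Generate a challenge.", {"csv": True, "stats": True}))
```

An empty mapping is treated as "not saturated" here, so the block is never injected before any clusters have been tracked; that choice is an assumption of the sketch.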