core / prompts.py
System-prompt templates and per-budget thinking instructions.
Templates
| Constant | Use |
|---|---|
| `SYSTEM_PROMPT` (line 231) | Main identity, persona rules, anti-hallucination guard, tool-use protocol. |
| `SPECIALIST_SYSTEM_PROMPT` (line 274) | DBA / SWE specialist mode: strict observability ("print everything"). |
| `QWEN_TOOL_PROMPT` (line 172) | Qwen-specific XML tool-call syntax with a CDATA escape hatch for `<` / `>`. |
| `THINK_BUDGET_TIGHT` (line 116) | ≤5 sentences; no brainstorming alternatives. |
| `THINK_BUDGET_EXTENDED` (line 130) | ~15 sentences for debugging / SQL / algorithm work; forbids drafting code in the think block. |
| `THINK_BUDGET_SELFPLAY` (line 158) | ≤6 bullet points for fast bounded self-play exercises. |
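
As a rough illustration of how a caller might choose between the three budgets, the sketch below maps a task label to the name of a budget constant. The task labels, the `budget_constant_for` helper, and the choice of `THINK_BUDGET_TIGHT` as the fallback are all assumptions made for this example; the module itself may select budgets differently.

```python
# Hypothetical task labels mapped to the budget constant names from the table.
# Only the constant names come from prompts.py; everything else is illustrative.
_BUDGET_BY_TASK = {
    "debugging": "THINK_BUDGET_EXTENDED",
    "sql": "THINK_BUDGET_EXTENDED",
    "algorithm": "THINK_BUDGET_EXTENDED",
    "self_play": "THINK_BUDGET_SELFPLAY",
}


def budget_constant_for(task_kind: str) -> str:
    """Return the name of the budget constant for a task label.

    Falling back to the tight budget for unknown labels is an assumption.
    """
    return _BUDGET_BY_TASK.get(task_kind.lower(), "THINK_BUDGET_TIGHT")
```

A caller could then look the constant up on the module, for example with `getattr(prompts, budget_constant_for("sql"))`, and append it to the system prompt.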
Helpers
| Function | Purpose |
|---|---|
| `build_project_briefing(store, project_id, max_events=3, max_open_tasks=8) -> str` | Render a multi-line project context block, covering open tasks, recent events, and the current goal, for prompt injection (line 4). |
Tool-orchestration routing rules (SYSTEM_PROMPT)
A block of MANDATORY TRIGGERS in SYSTEM_PROMPT maps user intents to specific tool calls so the LLM doesn't improvise. Current rules include:
- SLEEP / REST → `dream_mode`
- SELF-PLAY (one-shot) → `self_play`: for "practise", "train", "do self-play" (single cycle).
- SELF-PLAY (continuous) → `self_play_loop`: for "continuously", "in a loop", "back to back", "until I tell you to stop". `stop_self_play` on explicit user stop while running.
- LESSONS SURFACE → `list_lessons`: for "what have you learned today/so far/this week", "show me your lessons", "show me the lesson playbook". Picks `scope` ∈ {`today`, `week`, `all`, `self_play_only`} from phrasing. A LESSON is a mistake-and-fix the agent has internalized, distinct from a SKILL (a tool).
- SKILLS SURFACE → `manage_skills(action="list")`: for "show me your skills", "list your skills", "what skills do you have", "show me your custom skills". A SKILL is a TOOL or set of tools, NOT a lesson. The historical "skill playbook" wording was the conflation that misrouted skill queries to `list_lessons`; both the tool description and this rule now disambiguate.
- KNOWLEDGE & RAG → `recall` first.
- WEB FACTS → `web_search` first; `fact_check` / `deep_research` only for complex verification.
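
The routing itself is done by the LLM following the prompt text, not by code. Still, the intended mapping can be pinned down as a small lookup, for instance as an oracle in a regression test that checks which tool the model picked. The sketch below is such an oracle under stated assumptions: `ROUTES` and `expected_tool` are hypothetical names, the phrase lists are abbreviated from the rules above, and plain substring matching is a deliberate simplification.

```python
from typing import Optional

# Ordered (trigger phrases, expected tool call) pairs, abbreviated from the rules.
# The continuous self-play rule is listed first so that "do self-play
# continuously" resolves to the loop variant rather than the one-shot one.
ROUTES = [
    (("continuously", "in a loop", "back to back", "until i tell you to stop"),
     "self_play_loop"),
    (("practise", "train", "do self-play"), "self_play"),
    (("what have you learned", "show me your lessons", "lesson playbook"),
     "list_lessons"),
    (("show me your skills", "list your skills", "what skills do you have",
      "custom skills"),
     'manage_skills(action="list")'),
]


def expected_tool(utterance: str) -> Optional[str]:
    """Return the tool call the routing rules expect, or None if no rule fires."""
    text = utterance.lower()
    for phrases, tool in ROUTES:
        if any(phrase in text for phrase in phrases):
            return tool
    return None
```

Keeping lessons and skills in separate entries mirrors the disambiguation described above: a query about skills should never resolve to `list_lessons`.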
Self-play challenge-gen hardening (SYNTHETIC_CHALLENGE_PROMPT)
The challenge-generation system prompt carries explicit rules to prevent LLM-generated validators from being unwinnable:
- Float formatting: always convert both sides with `float()` and compare within a tolerance; never compare `round()` output with f-string output directly.
- Unit suffixes: if an expected field carries a `%`, `$`, `ms`, or comma-thousand separator, strip it before `float()` / `int()`. Includes a concrete WRONG/RIGHT example of the `float('60.00%')` bug.
- One convention per field: either string equality with the suffix kept, or numeric tolerance with the suffix stripped. Never mix.
- Mental self-test: the generator is told to trace the validator against a hypothetical `solution.py` that echoes its own expected output; every parse/comparison line must succeed.
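
The float and unit-suffix rules can be illustrated with a small validator helper. This is a minimal sketch, not the code shipped in the prompt: `parse_number`, `values_match`, and the default tolerance are names and values chosen for this example.

```python
import math

# Hypothetical helper names; the prompt only states the rules in prose.
_SUFFIXES = ("ms", "%")  # trailing unit suffixes to strip
_PREFIXES = ("$",)       # leading currency symbols to strip


def parse_number(text: str) -> float:
    """Strip separators, currency prefixes, and unit suffixes, then parse."""
    s = text.strip().replace(",", "")
    for prefix in _PREFIXES:
        if s.startswith(prefix):
            s = s[len(prefix):]
    for suffix in _SUFFIXES:
        if s.endswith(suffix):
            s = s[: -len(suffix)]
    return float(s.strip())


def values_match(expected: str, actual: str, tol: float = 1e-6) -> bool:
    """Compare two formatted numbers numerically instead of as strings."""
    return math.isclose(parse_number(expected), parse_number(actual), abs_tol=tol)


# WRONG: float("60.00%") raises ValueError because of the suffix.
# WRONG: str(round(0.3, 2)) is "0.3" but f"{0.3:.2f}" is "0.30", so string
#        equality between the two fails although the values agree.
# RIGHT: strip the suffix on both sides, then compare with a tolerance.
```

With these helpers, `values_match("60.00%", "60.0%")` and `values_match("$1,234.50", "1234.5")` both hold, which is the behaviour the mental self-test is meant to guarantee for a solution that echoes the expected output.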
When FrontierTracker reports all clusters saturated, the dream loop additionally injects a CURRICULUM DIVERSITY REQUIREMENT block telling the generator to pick from {concurrency, algo, regex_parse, sql, bash} and forbidding another data-analysis / CSV-groupby shape.
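
A conditional injection of this kind could look roughly like the sketch below. The `all_saturated` flag stands in for whatever `FrontierTracker` actually reports, the function names are hypothetical, and the wording of the block is invented for illustration; only the block title and the five categories come from the description above.

```python
# Categories named in the description above.
ALLOWED_SHAPES = ("concurrency", "algo", "regex_parse", "sql", "bash")


def diversity_block(all_saturated: bool) -> str:
    """Return the diversity block, or an empty string when it is not needed."""
    if not all_saturated:
        return ""
    return (
        "CURRICULUM DIVERSITY REQUIREMENT\n"
        f"Pick the next challenge from: {', '.join(ALLOWED_SHAPES)}.\n"
        "Do not generate another data-analysis / CSV-groupby challenge.\n"
    )


def build_challenge_prompt(base_prompt: str, all_saturated: bool) -> str:
    """Append the diversity block to the base prompt only when saturated."""
    block = diversity_block(all_saturated)
    return f"{base_prompt}\n\n{block}" if block else base_prompt
```

Returning the base prompt unchanged in the unsaturated case keeps the normal challenge-generation path free of the extra constraint.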