core / agent.py
The reasoning loop. Drives streaming chat completion, parses tool-call XML, dispatches tools, and re-injects results.
Purpose
GhostAgent is the orchestrator that the FastAPI route hands a chat payload to. It owns the turn loop: it asks the upstream LLM for a streaming completion, watches the stream for <tool_call> XML blocks, parses arguments, runs the tool against the registry, captures output, and folds the result back into the conversation as a tool message. Sampling temperature, top-p, and the size of the "thinking" budget are all derived dynamically from a query classifier.
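The stream-watching step can be sketched as below. This is a minimal illustration, not the module's parser: it assumes each `<tool_call>` block wraps a JSON object with `name` and `arguments` keys (the real payload shape is not shown here), and `iter_tool_calls`, `run_turn`, and the dict-based `registry` are hypothetical names.

```python
import json
import re
from typing import Callable, Dict, Iterable, Iterator, List

# Assumed wire shape: a JSON object wrapped in <tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def iter_tool_calls(chunks: Iterable[str]) -> Iterator[dict]:
    """Yield each complete tool call as soon as its closing tag has arrived."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            match = TOOL_CALL_RE.search(buffer)
            if match is None:
                break  # no complete block yet; wait for more chunks
            buffer = buffer[match.end():]
            try:
                yield json.loads(match.group(1))
            except json.JSONDecodeError:
                # Malformed payload: skip it rather than abort the turn.
                continue


def run_turn(chunks: Iterable[str], registry: Dict[str, Callable]) -> List[dict]:
    """Dispatch every parsed call and return the resulting tool messages."""
    messages = []
    for call in iter_tool_calls(chunks):
        output = registry[call["name"]](**call.get("arguments", {}))
        messages.append({"role": "tool", "name": call["name"], "content": str(output)})
    return messages
```

Buffering until the closing tag arrives means a call split across stream chunks is still parsed exactly once.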
Public surface (selected)
| Symbol | Signature / type | Purpose |
|---|---|---|
| `GhostAgent.__init__` | `(context: GhostContext)` | Captures all subsystems via context: LLM client, sandbox, memory bus, planner, MCTS, verifier, uncertainty tracker, scratchpad, scheduler, project store. |
| `GhostAgent.handle_chat` | `async (body: dict, background_tasks, request_id: str)` | Entry point invoked by /api/chat. Returns a StreamingResponse generator (when stream=true) or a JSON body. Owns the turn loop. |
| `_classify_coding_task` | `(query: str) -> "creative" \| "precise" \| "balanced"` | Keyword-based classifier (line 84) that picks the sampling profile. |
| `classify_thinking_budget` | `(query, has_coding_intent, is_meta_task, in_active_project) -> "tight" \| "extended" \| "selfplay"` | Selects which think-block budget prompt to inject (line 155). |
| `get_sampling_params` | `(is_tool_turn, query, is_coding) -> dict` | Returns temperature / top_p / top_k / presence_penalty appropriate for the situation (line 210). |
| `_find_substantive_tool_for_verifier` | `(tools_run) -> Optional[dict]` | Skips bookkeeping tools (scratchpad, manage_projects, …) when picking evidence to hand the verifier. |
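As a rough illustration of the last row, the evidence search might look like the sketch below. The constant name `BOOKKEEPING_TOOLS`, the record shape (`{"name": ..., "output": ...}`), and the choice to prefer the most recent record are all assumptions; only the two tool names come from this page, and the full set is elided.

```python
from typing import List, Optional

# Hypothetical constant name; the full set is elided in this page.
BOOKKEEPING_TOOLS = frozenset({"manage_projects", "scratchpad"})


def find_substantive_tool(tools_run: List[dict]) -> Optional[dict]:
    """Return the most recent non-bookkeeping tool record, or None."""
    for record in reversed(tools_run):
        if record.get("name") not in BOOKKEEPING_TOOLS:
            return record
    return None
```

Returning `None` lets the caller skip verification when a turn only touched bookkeeping tools.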
Sampling profiles
| Profile | Temperature | top_p | top_k / extras | When |
|---|---|---|---|---|
| `CODING_SAMPLING_PARAMS` | 0.6 | 0.95 | `top_k=20` | "precise" coding turns; tool turns; SQL/regex/algorithm intents. |
| `GENERAL_SAMPLING_PARAMS` | 1.0 | - | `presence_penalty=1.5` | Free-form / creative turns. |
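A minimal sketch of how the two profiles could be selected. The parameter values come from the table; the keyword tuple and the treatment of "balanced" and "creative" turns (both fall through to the general profile here) are assumptions, since the real classifier's rules are not reproduced on this page.

```python
CODING_SAMPLING_PARAMS = {"temperature": 0.6, "top_p": 0.95, "top_k": 20}
GENERAL_SAMPLING_PARAMS = {"temperature": 1.0, "presence_penalty": 1.5}

# Illustrative keywords only; not the real classifier's list.
_PRECISE_HINTS = ("sql", "regex", "algorithm")


def classify_coding_task(query: str) -> str:
    """Toy keyword classifier: 'precise' for SQL/regex/algorithm intents."""
    lowered = query.lower()
    if any(hint in lowered for hint in _PRECISE_HINTS):
        return "precise"
    return "balanced"


def get_sampling_params(is_tool_turn: bool, query: str, is_coding: bool) -> dict:
    """Pick a sampling profile; returns a copy so callers cannot mutate the constants."""
    if is_tool_turn:
        return dict(CODING_SAMPLING_PARAMS)
    if is_coding and classify_coding_task(query) == "precise":
        return dict(CODING_SAMPLING_PARAMS)
    return dict(GENERAL_SAMPLING_PARAMS)
```

Returning a copy keeps per-request tweaks from leaking into the shared profile dictionaries.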
Thinking budgets
Set via core/prompts.py and selected per turn:
- tight — ≤5 sentences, no brainstorming. Default for short answers.
- extended — ~15 sentences for debugging / SQL / algorithm work. Forbids drafting code inside the think block.
- selfplay — ≤6 bullet points; cheap fast loop used in dream / self-play synthetic exercises.
The _EXTENDED_THINK_KEYWORDS list (agent.py:103) decides which user queries trigger the extended budget.
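The keyword-driven selection can be pictured roughly as follows. This is a sketch under stated assumptions: the keyword tuple is an illustrative subset, `pick_budget` and its `in_selfplay` flag are hypothetical, and the other inputs of `classify_thinking_budget` are left out because their effect is not described here.

```python
# Illustrative subset only; the real _EXTENDED_THINK_KEYWORDS list is not reproduced here.
EXTENDED_THINK_KEYWORDS = ("debug", "traceback", "sql", "algorithm")

# One-line summaries of the budgets listed above (not the actual prompt text).
BUDGET_SUMMARIES = {
    "tight": "<=5 sentences, no brainstorming",
    "extended": "~15 sentences, no code drafts inside the think block",
    "selfplay": "<=6 bullet points",
}


def pick_budget(query: str, in_selfplay: bool = False) -> str:
    """Return 'selfplay', 'extended', or 'tight' for a single turn."""
    if in_selfplay:  # hypothetical flag for dream / self-play exercises
        return "selfplay"
    lowered = query.lower()
    if any(keyword in lowered for keyword in EXTENDED_THINK_KEYWORDS):
        return "extended"
    return "tight"
```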
Constants
{"manage_projects","scratchpad",…} excluded from the verifier evidence search.
Turn flow
Figure 2 — Per-turn dispatch within GhostAgent.handle_chat.
Concurrency
Single-task asyncio. The streaming generator is consumed inside the FastAPI request task; tool execution is awaited inline (the tool itself may use threads / subprocess). The agent does not spawn parallel turns — concurrency lives in MemoryBus (parallel hydration) and LLMClient (parallel pool dispatch).
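To illustrate "awaited inline", the sketch below runs a blocking tool in a worker thread while the turn still waits for each result before moving on. `blocking_tool`, `dispatch`, and `turn` are hypothetical names, and using `asyncio.to_thread` (Python 3.9+) is an assumption about how a tool might offload work, not a description of the registry.

```python
import asyncio
import time


def blocking_tool(seconds: float) -> str:
    """Stand-in for a tool that blocks (subprocess, file I/O, ...)."""
    time.sleep(seconds)
    return f"slept {seconds}"


async def dispatch(tool, *args):
    # The turn awaits the result inline; the worker thread only keeps
    # the event loop responsive while the tool blocks.
    return await asyncio.to_thread(tool, *args)


async def turn() -> list:
    results = []
    for seconds in (0.01, 0.02):
        # Strictly sequential: the next tool starts only after this one returns.
        results.append(await dispatch(blocking_tool, seconds))
    return results
```

Calling `asyncio.run(turn())` executes the two tools one after the other, never in parallel.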
Context compaction (per-turn payload)
Before every upstream LLM call, handle_chat assembles a transient block (tool header + persona + memory hydration + dynamic state) that rides at the top of the last user message. Two predicates decide how aggressively to trim that block before serialisation:
| Predicate | Trigger | What it suppresses | Per-turn saving |
|---|---|---|---|
| `_is_final_generation_for_schema` | `force_final_response` set, or planner returned `required_tool=="none"` / `next_action_id=="none"`. | Entire `QWEN_TOOL_PROMPT` + XML schema; replaced with a slim "Final-generation turn — DO NOT emit any `<tool_call>`" stanza that preserves the think-budget guidance. The native `payload["tools"]` array is also dropped, because sending tools when the model is told to answer tempts it to call one instead. | ~8,700 tokens (XML mode) / ~860 tokens (already-native mode, since the schema is already gone). |
| `_native_tools_active` | `args.native_tools=True` (the default for Qwen 3.6 35B-A3 and newer). | The XML `<tool_def>` blocks inside the prompt's `{tool_schemas}` slot; schemas arrive via the OpenAI-style `payload["tools"]` channel instead. The XML format scaffolding (parsing rules, parallel-call guidance, CDATA hint) is preserved so models that still emit the legacy `<tool_call>` shape are parsed correctly as a fallback. | ~7,800 tokens. |
The hoisted predicate runs before the dynamic-state block that also sets force_final_response=True, so it mirrors that line's next_action_id check directly to stay in sync. The canonical is_final_generation assignment further down the loop is unchanged — the schema-skip is a strictly upstream short-circuit. Verified end-to-end by tests/test_context_compaction.py (9 cases pinning XML-only path, native-only path, savings floor, planner-driven final-gen, and the cross-cutting native-suppression on final turns).
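The interplay of the two predicates can be sketched as a small decision function. The placeholder strings stand in for the real prompt text, and `build_tool_header` is a hypothetical helper; only the trigger conditions and what each path suppresses are taken from the table above.

```python
from typing import Optional, Tuple

# Placeholders for the real prompt fragments, which are not reproduced here.
XML_SCAFFOLDING = "[XML format scaffolding: parsing rules, parallel-call guidance, CDATA hint]"
FINAL_STANZA = "[slim final-generation stanza, think-budget guidance kept]"


def is_final_generation_for_schema(force_final_response: bool, plan: Optional[dict]) -> bool:
    """True when the turn must answer directly and no tool schema is needed."""
    plan = plan or {}
    return (
        bool(force_final_response)
        or plan.get("required_tool") == "none"
        or plan.get("next_action_id") == "none"
    )


def build_tool_header(
    xml_tool_defs: str,
    native_schemas: list,
    *,
    native_tools_active: bool,
    force_final_response: bool = False,
    plan: Optional[dict] = None,
) -> Tuple[str, Optional[list]]:
    """Return (prompt_header, payload_tools) for one upstream call."""
    if is_final_generation_for_schema(force_final_response, plan):
        return FINAL_STANZA, None          # no schema, no payload["tools"]
    if native_tools_active:
        return XML_SCAFFOLDING, native_schemas  # <tool_def> blocks dropped
    return XML_SCAFFOLDING + "\n" + xml_tool_defs, None
```

Checking the final-generation predicate first reproduces the cross-cutting rule that native tools are also suppressed on final turns.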
Self-play control hooks
handle_chat carries two small but load-bearing hooks that coordinate with the self-play subsystem:
- Continuous-loop interrupt: at the top of every turn, if `context.selfplay_loop_task` is active, the incoming user message flips `context.selfplay_loop_stop`. The background loop checks the event at each cycle boundary and during cool-off, so the loop pauses cleanly before the agent answers the user.
- Perfect-It skip during simulation: the Perfect-It follow-up LLM call (which writes optimisation suggestions into SkillMemory) is skipped when `getattr(ctx.skill_memory, "is_read_only", False) is True`. During self-play the isolated context's `ReadOnlySkillMemory` class carries that marker; the block is skipped entirely so the ~15 s follow-up LLM call and its misleading "Saved optimization strategy to playbook" log don't fire on no-op writes.
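Both hooks can be sketched in a few lines. The example assumes `selfplay_loop_stop` is an `asyncio.Event` and `selfplay_loop_task` an `asyncio.Task`; the helper names, the toy loop, and the stand-in `ReadOnlySkillMemory` class are illustrative, not the module's own code.

```python
import asyncio
from types import SimpleNamespace


class ReadOnlySkillMemory:
    """Stand-in for the self-play skill memory; only the marker matters here."""
    is_read_only = True


def interrupt_selfplay(context) -> bool:
    """Set the stop event if a background self-play loop is running."""
    task = getattr(context, "selfplay_loop_task", None)
    if task is not None and not task.done():
        context.selfplay_loop_stop.set()
        return True
    return False


def should_run_perfect_it(ctx) -> bool:
    """Skip the follow-up LLM call when skill memory is read-only."""
    return getattr(ctx.skill_memory, "is_read_only", False) is not True


async def selfplay_loop(stop: asyncio.Event, cool_off: float = 0.01) -> int:
    """Toy loop: checks the event at each cycle boundary and during cool-off."""
    cycles = 0
    while not stop.is_set():
        cycles += 1
        try:
            await asyncio.wait_for(stop.wait(), timeout=cool_off)
        except asyncio.TimeoutError:
            pass  # cool-off elapsed without a stop request
    return cycles


async def demo():
    stop = asyncio.Event()
    context = SimpleNamespace(
        selfplay_loop_stop=stop,
        selfplay_loop_task=asyncio.create_task(selfplay_loop(stop)),
    )
    await asyncio.sleep(0.03)            # let the loop run a few cycles
    interrupted = interrupt_selfplay(context)
    cycles = await context.selfplay_loop_task
    return interrupted, cycles
```

Waiting on the event during cool-off, rather than sleeping, is what lets the loop stop promptly instead of finishing its pause first.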
Checklist nudge & read-only tool exemption
The agent's meta-task compliance nudge ("you mentioned learning but didn't call learn_skill") is intentionally suppressed when the turn's tools include a read-only surface tool — list_lessons, recall, manage_skills. Otherwise a user query like "what have you learned today?" would loop for 5+ turns as the nudge pushes the agent to write a redundant skill it just deduplicates. The exemption keeps "surface a lesson" distinct from "persist a new one".
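The exemption reduces to a set intersection, sketched below. The three tool names and `learn_skill` come from the paragraph above; the function name, the constant name, and the boolean `mentions_learning` input are assumptions.

```python
from typing import Iterable

# Read-only surface tools named above; the constant name is hypothetical.
READ_ONLY_SURFACE_TOOLS = frozenset({"list_lessons", "recall", "manage_skills"})


def should_nudge_learn_skill(mentions_learning: bool, tools_called: Iterable[str]) -> bool:
    """Nudge only when learning was mentioned, nothing was persisted,
    and the turn did not merely surface existing lessons."""
    called = set(tools_called)
    if called & READ_ONLY_SURFACE_TOOLS:
        return False  # surfacing a lesson is not a missed write
    return mentions_learning and "learn_skill" not in called
```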
Validator crash detector
When the self-play solver gets a validator rejection, handle_chat's attempt-loop runs a circuit-breaker check for validator-frame crashes — tracebacks whose innermost frame is .validator.py and that don't mention solution.py. Classified as crashes: SyntaxError, IndentationError, ImportError, ModuleNotFoundError, NameError, ValueError, TypeError, KeyError, IndexError, AttributeError. Matched crashes abort the cycle after attempt 1 instead of burning the remaining attempts on an unwinnable challenge. See dream_cycle for the upstream gates that catch most of these earlier.
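A minimal sketch of such a classifier, assuming the rejection arrives as standard CPython traceback text. The exception list and the two filename rules come from the paragraph above; the regex, the function name, and the way the exception name is read from the last line are illustrative choices.

```python
import re

CRASH_TYPES = frozenset({
    "SyntaxError", "IndentationError", "ImportError", "ModuleNotFoundError",
    "NameError", "ValueError", "TypeError", "KeyError", "IndexError",
    "AttributeError",
})

# Matches the 'File "<path>", line N' header of each traceback frame.
_FRAME_RE = re.compile(r'^\s*File "([^"]+)", line \d+', re.MULTILINE)


def is_validator_crash(traceback_text: str) -> bool:
    """True when the validator itself crashed, not the submitted solution."""
    frames = _FRAME_RE.findall(traceback_text)
    if not frames or not frames[-1].endswith(".validator.py"):
        return False  # innermost frame is not the validator
    if "solution.py" in traceback_text:
        return False  # the solution is implicated, so retries may still help
    lines = traceback_text.strip().splitlines()
    last_line = lines[-1].strip() if lines else ""
    # 'pkg.SomeError: message' -> 'SomeError'
    exc_name = last_line.split(":", 1)[0].strip().rsplit(".", 1)[-1]
    return exc_name in CRASH_TYPES
```

An `AssertionError` raised by the validator is deliberately not in the list, since that is how a validator normally rejects a wrong answer.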
Cross-module dependencies
- core.planning: `TaskTree` for hierarchical decomposition.
- core.prompts: `SYSTEM_PROMPT` and per-budget templates.
- core.bus: memory hydration.
- core.context_manager: token-budget compression.
- core.llm: upstream calls.
- tools.registry: tool discovery + dispatch.
- core.verifier & core.uncertainty: adjudication and metacognition.