core / agent.py

The reasoning loop. Drives streaming chat completion, parses tool-call XML, dispatches tools, and re-injects results.

Purpose

GhostAgent is the orchestrator that the FastAPI route hands a chat payload to. It owns the turn loop: it asks the upstream LLM for a streaming completion, watches the stream for <tool_call> XML blocks, parses arguments, runs the tool against the registry, captures output, and folds the result back into the conversation as a tool message. Sampling temperature, top-p, and the size of the "thinking" budget are all derived dynamically from a query classifier.

Public surface (selected)

Symbol	Signature / type	Purpose
`GhostAgent.__init__`	`(context: GhostContext)`	Captures all subsystems via `context`: LLM client, sandbox, memory bus, planner, MCTS, verifier, uncertainty tracker, scratchpad, scheduler, project store.
`GhostAgent.handle_chat`	`async (body: dict, background_tasks, request_id: str)`	Entry point invoked by `/api/chat`. Returns a `StreamingResponse` generator (when `stream=true`) or a JSON body. Owns the turn loop.
`_classify_coding_task`	`(query: str) -> "creative" \| "precise" \| "balanced"`	Keyword-based classifier (line 84) that picks the sampling profile.
`classify_thinking_budget`	`(query, has_coding_intent, is_meta_task, in_active_project) -> "tight" \| "extended" \| "selfplay"`	Selects which think-block budget prompt to inject (line 155).
`get_sampling_params`	`(is_tool_turn, query, is_coding) -> dict`	Returns temperature / top_p / top_k / presence_penalty appropriate for the situation (line 210).
`_find_substantive_tool_for_verifier`	`(tools_run) -> Optional[dict]`	Skips bookkeeping tools (`scratchpad`, `manage_projects`, …) when picking evidence to hand the verifier.

Sampling profiles

Profile	Temperature	top_p	top_k / extras	When
`CODING_SAMPLING_PARAMS`	0.6	0.95	top_k=20	"precise" coding turns; tool turns; SQL/regex/algorithm intents.
`GENERAL_SAMPLING_PARAMS`	1.0	—	presence_penalty=1.5	Free-form / creative turns.

Thinking budgets

Set via core/prompts.py and selected per turn:

tight — ≤5 sentences, no brainstorming. Default for short answers.
extended — ~15 sentences for debugging / SQL / algorithm work. Forbids drafting code inside the think block.
selfplay — ≤6 bullet points; cheap fast loop used in dream / self-play synthetic exercises.

The _EXTENDED_THINK_KEYWORDS list (agent.py:103) decides which user queries trigger the extended budget.

Constants

MAX_THINKING_CHARS

32 000 (line 242)

MAX_THINKING_CHARS_EXTENDED

64 000 (line 243)

DEFAULT_TOOL_TURN_MAX_TOKENS

16 384 (line 262)

_BOOKKEEPING_TOOL_NAMES

{"manage_projects","scratchpad",…} excluded from the verifier evidence search.

Turn flow

Figure 2 — Per-turn dispatch within GhostAgent.handle_chat.

Concurrency

Single-task asyncio. The streaming generator is consumed inside the FastAPI request task; tool execution is awaited inline (the tool itself may use threads / subprocess). The agent does not spawn parallel turns — concurrency lives in MemoryBus (parallel hydration) and LLMClient (parallel pool dispatch).

Context compaction (per-turn payload)

Before every upstream LLM call, handle_chat assembles a transient block (tool header + persona + memory hydration + dynamic state) that rides at the top of the last user message. Two predicates decide how aggressively to trim that block before serialisation:

Predicate	Trigger	What it suppresses	Per-turn saving
`_is_final_generation_for_schema`	`force_final_response` set, or planner returned `required_tool=="none"` / `next_action_id=="none"`.	Entire `QWEN_TOOL_PROMPT` + XML schema; replaced with a slim "Final-generation turn — DO NOT emit any <tool_call>" stanza that preserves the think-budget guidance. The native `payload["tools"]` array is also dropped — sending tools when the model is told to answer tempts it to call one instead.	~8,700 tokens (XML mode) / ~860 tokens (already-native mode, since the schema is already gone).
`_native_tools_active`	`args.native_tools=True` (the default for Qwen 3.6 35B-A3 and newer).	The XML `<tool_def>` blocks inside the prompt's `{tool_schemas}` slot — schemas arrive via the OpenAI-style `payload["tools"]` channel instead. The XML format scaffolding (parsing rules, parallel-call guidance, CDATA hint) is preserved so models that still emit the legacy `<tool_call>` shape are parsed correctly as a fallback.	~7,800 tokens.

The hoisted predicate runs before the dynamic-state block that also sets force_final_response=True, so it mirrors that line's next_action_id check directly to stay in sync. The canonical is_final_generation assignment further down the loop is unchanged — the schema-skip is a strictly upstream short-circuit. Verified end-to-end by tests/test_context_compaction.py (9 cases pinning XML-only path, native-only path, savings floor, planner-driven final-gen, and the cross-cutting native-suppression on final turns).

Self-play control hooks

handle_chat carries two small but load-bearing hooks that coordinate with the self-play subsystem:

Continuous-loop interrupt — at the top of every turn, if context.selfplay_loop_task is active, the incoming user message flips context.selfplay_loop_stop. The background loop checks the event at each cycle boundary and during cool-off, so the loop pauses cleanly before answering the user.
Perfect-It skip during simulation — the Perfect-It follow-up LLM call (which writes optimisation suggestions into SkillMemory) is gated on getattr(ctx.skill_memory, "is_read_only", False) is True. During self-play the isolated context's ReadOnlySkillMemory class carries that marker; the block is skipped entirely so the ~15 s follow-up LLM call and its misleading "Saved optimization strategy to playbook" log don't fire on no-op writes.

Checklist nudge & read-only tool exemption

The agent's meta-task compliance nudge ("you mentioned learning but didn't call learn_skill") is intentionally suppressed when the turn's tools include a read-only surface tool — list_lessons, recall, manage_skills. Otherwise a user query like "what have you learned today?" would loop for 5+ turns as the nudge pushes the agent to write a redundant skill it just deduplicates. The exemption keeps "surface a lesson" distinct from "persist a new one".

Validator crash detector

When the self-play solver gets a validator rejection, handle_chat's attempt-loop runs a circuit-breaker check for validator-frame crashes — tracebacks whose innermost frame is .validator.py and that don't mention solution.py. Classified as crashes: SyntaxError, IndentationError, ImportError, ModuleNotFoundError, NameError, ValueError, TypeError, KeyError, IndexError, AttributeError. Matched crashes abort the cycle after attempt 1 instead of burning the remaining attempts on an unwinnable challenge. See dream_cycle for the upstream gates that catch most of these earlier.

Cross-module dependencies

core.planning — TaskTree for hierarchical decomposition.
core.prompts — SYSTEM_PROMPT and per-budget templates.
core.bus — memory hydration.
core.context_manager — token-budget compression.
core.llm — upstream calls.
tools.registry — tool discovery + dispatch.
core.verifier & core.uncertainty — adjudication and metacognition.