core/agent.py

The reasoning loop. Drives streaming chat completion, parses tool-call XML, dispatches tools, and re-injects results.

Purpose

GhostAgent is the orchestrator that the FastAPI route hands a chat payload to. It owns the turn loop: it asks the upstream LLM for a streaming completion, watches the stream for <tool_call> XML blocks, parses arguments, runs the tool against the registry, captures output, and folds the result back into the conversation as a tool message. Sampling temperature, top-p, and the size of the "thinking" budget are all derived dynamically from a query classifier.

Public surface (selected)

| Symbol | Signature / type | Purpose |
| --- | --- | --- |
| GhostAgent.__init__ | (context: GhostContext) | Captures all subsystems via context: LLM client, sandbox, memory bus, planner, MCTS, verifier, uncertainty tracker, scratchpad, scheduler, project store. |
| GhostAgent.handle_chat | async (body: dict, background_tasks, request_id: str) | Entry point invoked by /api/chat. Returns a StreamingResponse generator (when stream=true) or a JSON body. Owns the turn loop. |
| _classify_coding_task | (query: str) -> "creative" \| "precise" \| "balanced" | Keyword-based classifier (line 84) that picks the sampling profile. |
| classify_thinking_budget | (query, has_coding_intent, is_meta_task, in_active_project) -> "tight" \| "extended" \| "selfplay" | Selects which think-block budget prompt to inject (line 155). |
| get_sampling_params | (is_tool_turn, query, is_coding) -> dict | Returns the temperature / top_p / top_k / presence_penalty appropriate for the situation (line 210). |
| _find_substantive_tool_for_verifier | (tools_run) -> Optional[dict] | Skips bookkeeping tools (scratchpad, manage_projects, …) when picking evidence to hand the verifier. |
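The bookkeeping filter in the last row can be pictured roughly as follows. This is a minimal sketch, not the code in agent.py: it assumes each entry of tools_run is a dict with a "name" key, that the most recent substantive call is the one preferred, and it lists only the two bookkeeping names this page spells out. The tool name "execute" is a hypothetical placeholder.

```python
from typing import Optional

# Only the two names this page spells out; the real set contains more.
BOOKKEEPING_TOOL_NAMES = {"manage_projects", "scratchpad"}


def find_substantive_tool(tools_run: list[dict]) -> Optional[dict]:
    """Return the most recent tool record that is not pure bookkeeping.

    Assumes records look like {"name": ..., "output": ...}; the real
    record shape in agent.py may differ.
    """
    for record in reversed(tools_run):
        if record.get("name") not in BOOKKEEPING_TOOL_NAMES:
            return record
    return None


# "execute" is a made-up tool name used only for illustration.
tools_run = [
    {"name": "execute", "output": "42"},
    {"name": "scratchpad", "output": "saved"},
]
print(find_substantive_tool(tools_run))
```

Returning None when only bookkeeping tools ran lets a caller skip verification entirely instead of handing the verifier evidence that says nothing about the answer.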

Sampling profiles

| Profile | Temperature | top_p | top_k / extras | When |
| --- | --- | --- | --- | --- |
| CODING_SAMPLING_PARAMS | 0.6 | 0.95 | top_k=20 | "precise" coding turns; tool turns; SQL/regex/algorithm intents. |
| GENERAL_SAMPLING_PARAMS | 1.0 | | presence_penalty=1.5 | Free-form / creative turns. |
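As an illustration of how the two profiles might be selected, here is a small sketch. The function name pick_sampling_params and its profile argument are hypothetical stand-ins for get_sampling_params and the classifier verdict. Routing every non-"precise", non-tool turn to the general profile is an assumption, and top_p for the general profile is omitted because this page does not list it.

```python
# Values copied from the profiles listed above. top_p for the general
# profile is not given on this page, so it is left out rather than guessed.
CODING_SAMPLING_PARAMS = {"temperature": 0.6, "top_p": 0.95, "top_k": 20}
GENERAL_SAMPLING_PARAMS = {"temperature": 1.0, "presence_penalty": 1.5}


def pick_sampling_params(is_tool_turn: bool, profile: str) -> dict:
    """Choose a sampling profile for one turn.

    `profile` stands in for the classifier verdict ("creative", "precise",
    or "balanced"). Treating "balanced" like "creative" is an assumption.
    """
    if is_tool_turn or profile == "precise":
        # Return a copy so callers can adjust values without touching the constant.
        return dict(CODING_SAMPLING_PARAMS)
    return dict(GENERAL_SAMPLING_PARAMS)


print(pick_sampling_params(is_tool_turn=True, profile="creative"))
print(pick_sampling_params(is_tool_turn=False, profile="creative"))
```

Tool turns take the low-temperature profile regardless of the classifier, which matches the "When" column: structured tool-call output benefits from tighter sampling.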

Thinking budgets

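The budget prompts are defined in core/prompts.py and selected per turn by classify_thinking_budget, which returns "tight", "extended", or "selfplay".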

The _EXTENDED_THINK_KEYWORDS list (agent.py:103) decides which user queries trigger the extended budget.

Constants

| Constant | Value / notes |
| --- | --- |
| MAX_THINKING_CHARS | 32 000 (line 242) |
| MAX_THINKING_CHARS_EXTENDED | 64 000 (line 243) |
| DEFAULT_TOOL_TURN_MAX_TOKENS | 16 384 (line 262) |
| _BOOKKEEPING_TOOL_NAMES | {"manage_projects","scratchpad",…} excluded from the verifier evidence search. |
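To show how the two thinking caps could relate to the budget names, here is a hedged sketch. The mapping is an assumption: this page does not say which cap applies to "selfplay", nor what agent.py does once a cap is exceeded, so the helpers below only report the limit and whether it has been crossed.

```python
MAX_THINKING_CHARS = 32_000
MAX_THINKING_CHARS_EXTENDED = 64_000


def thinking_char_limit(budget: str) -> int:
    """Return the character cap for a budget name.

    Assumption: only "extended" gets the larger cap; every other budget
    (including "selfplay") falls back to the default.
    """
    if budget == "extended":
        return MAX_THINKING_CHARS_EXTENDED
    return MAX_THINKING_CHARS


def over_budget(chars_seen: int, budget: str) -> bool:
    """True once the streamed think block has exceeded its cap."""
    return chars_seen > thinking_char_limit(budget)


print(over_budget(40_000, "tight"))     # exceeds the 32 000 cap
print(over_budget(40_000, "extended"))  # still under the 64 000 cap
```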

Turn flow

1. handle_chat(body): auth, request_id, ContextVar set
2. MemoryBus.hydrate_context (RRF fan-out)
3. classify_thinking_budget + get_sampling_params
4. ContextManager.compress_if_needed (L0..L4 progressive compression)
5. LLMClient.chat_completion(stream=True): upstream + circuit breaker
6. Stream parser: collect <tool_call> XML
7. Tool dispatch via registry (sandboxed)
8. tool_failure.classify → retry / fatal / diagnostic
9. Verifier (worker pool): confirm/refute substantive tool result
10. → next iteration or final SSE chunk

Figure 2 — Per-turn dispatch within GhostAgent.handle_chat.
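The stream-parser stage of this flow has to cope with <tool_call> blocks that arrive split across chunks. The sketch below shows one way to buffer for that. ToolCallCollector is a hypothetical name, and the block body is treated as an opaque string because this page does not describe the argument format inside the tags. The JSON-looking body in the usage lines is only a placeholder.

```python
import re

# Non-greedy match so adjacent blocks are captured separately.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


class ToolCallCollector:
    """Buffers streamed text and returns complete <tool_call> bodies."""

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, chunk: str) -> list[str]:
        """Add a chunk; return the bodies of any blocks completed by it."""
        self._buffer += chunk
        calls = []
        last_end = 0
        for match in TOOL_CALL_RE.finditer(self._buffer):
            calls.append(match.group(1).strip())
            last_end = match.end()
        # Keep only the tail after the last complete block; it may hold
        # the beginning of a block that has not fully arrived yet.
        self._buffer = self._buffer[last_end:]
        return calls


collector = ToolCallCollector()
for chunk in ["Checking. <tool_", 'call>{"name": "recall"}</tool', "_call> done"]:
    for body in collector.feed(chunk):
        print("complete tool call:", body)
```

A production parser would also trim plain text from the buffer once it can no longer be part of an opening tag; this sketch skips that to stay short.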

Concurrency

Single-task asyncio. The streaming generator is consumed inside the FastAPI request task; tool execution is awaited inline (the tool itself may use threads / subprocess). The agent does not spawn parallel turns — concurrency lives in MemoryBus (parallel hydration) and LLMClient (parallel pool dispatch).
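The "awaited inline" pattern can be sketched as below. blocking_tool and run_tools_inline are hypothetical names, and using asyncio.to_thread to offload the blocking work is one possible choice, not necessarily what the real tools do.

```python
import asyncio
import time


def blocking_tool(arg: str) -> str:
    """Hypothetical tool that blocks, standing in for subprocess or thread work."""
    time.sleep(0.01)
    return f"result:{arg}"


async def run_tools_inline(calls: list[str]) -> list[str]:
    """Run tool calls one at a time inside a single task."""
    results = []
    for call in calls:
        # Awaited inline: the next call does not start until this one
        # returns. The worker thread keeps the event loop free for other
        # requests while the tool blocks.
        results.append(await asyncio.to_thread(blocking_tool, call))
    return results


print(asyncio.run(run_tools_inline(["a", "b"])))
```

Because each call is awaited before the next begins, results come back in call order and no two tools for the same turn overlap.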

Context compaction (per-turn payload)

Before every upstream LLM call, handle_chat assembles a transient block (tool header + persona + memory hydration + dynamic state) that rides at the top of the last user message. Two predicates decide how aggressively to trim that block before serialisation:

| Predicate | Trigger | What it suppresses | Per-turn saving |
| --- | --- | --- | --- |
| _is_final_generation_for_schema | force_final_response set, or planner returned required_tool=="none" / next_action_id=="none". | Entire QWEN_TOOL_PROMPT + XML schema; replaced with a slim "Final-generation turn — DO NOT emit any <tool_call>" stanza that preserves the think-budget guidance. The native payload["tools"] array is also dropped, because sending tools when the model is told to answer tempts it to call one instead. | ~8,700 tokens (XML mode) / ~860 tokens (already-native mode, since the schema is already gone). |
| _native_tools_active | args.native_tools=True (the default for Qwen 3.6 35B-A3 and newer). | The XML <tool_def> blocks inside the prompt's {tool_schemas} slot; schemas arrive via the OpenAI-style payload["tools"] channel instead. The XML format scaffolding (parsing rules, parallel-call guidance, CDATA hint) is preserved so models that still emit the legacy <tool_call> shape are parsed correctly as a fallback. | ~7,800 tokens. |

The hoisted predicate, _is_final_generation_for_schema, runs before the dynamic-state block that also sets force_final_response=True, so it mirrors that line's next_action_id check directly to stay in sync. The canonical is_final_generation assignment further down the loop is unchanged; the schema skip is strictly an upstream short-circuit. This behaviour is verified end-to-end by tests/test_context_compaction.py (9 cases pinning the XML-only path, the native-only path, the savings floor, planner-driven final generation, and the cross-cutting native suppression on final turns).
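The two predicates described above can be approximated as follows. This is a sketch under stated assumptions: the planner result is modelled as a plain dict, and build_payload is a hypothetical helper that shows only the payload["tools"] handling, not the prompt trimming.

```python
from typing import Optional


def is_final_generation_for_schema(force_final_response: bool,
                                   plan: Optional[dict]) -> bool:
    """True when the turn should answer instead of calling a tool.

    Assumes the planner result is a dict; the real object may differ.
    """
    if force_final_response:
        return True
    plan = plan or {}
    return (plan.get("required_tool") == "none"
            or plan.get("next_action_id") == "none")


def build_payload(messages: list, tools: list,
                  final_generation: bool, native_tools: bool) -> dict:
    """Attach native tool schemas only when a tool call is still possible."""
    payload = {"messages": messages}
    if native_tools and not final_generation:
        payload["tools"] = tools
    return payload


final = is_final_generation_for_schema(False, {"next_action_id": "none"})
print(build_payload([], [{"name": "recall"}], final, native_tools=True))
```

On a final-generation turn the payload carries no "tools" key even in native mode, which corresponds to the cross-cutting suppression that the test suite pins.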

Self-play control hooks

handle_chat carries two small but load-bearing hooks that coordinate with the self-play subsystem:

Checklist nudge & read-only tool exemption

The agent's meta-task compliance nudge ("you mentioned learning but didn't call learn_skill") is intentionally suppressed when the turn's tools include a read-only surface tool: list_lessons, recall, or manage_skills. Otherwise, a user query like "what have you learned today?" would loop for 5+ turns, because the nudge pushes the agent to write a redundant skill that is then simply deduplicated. The exemption keeps "surface a lesson" distinct from "persist a new one".
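The exemption amounts to a set-membership check. In the sketch below, should_nudge and its mentions_learning flag are hypothetical; how agent.py detects that the user "mentioned learning" is not described on this page.

```python
READ_ONLY_SURFACE_TOOLS = {"list_lessons", "recall", "manage_skills"}


def should_nudge(mentions_learning: bool, tool_names: list[str]) -> bool:
    """Decide whether to push the agent toward calling learn_skill."""
    if not mentions_learning:
        return False
    if "learn_skill" in tool_names:
        return False  # the agent already complied
    if READ_ONLY_SURFACE_TOOLS.intersection(tool_names):
        return False  # read-only surfacing turn: exempt from the nudge
    return True


print(should_nudge(True, ["list_lessons"]))  # exempt
print(should_nudge(True, []))                # nudge applies
```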

Validator crash detector

When the self-play solver gets a validator rejection, handle_chat's attempt-loop runs a circuit-breaker check for validator-frame crashes — tracebacks whose innermost frame is .validator.py and that don't mention solution.py. Classified as crashes: SyntaxError, IndentationError, ImportError, ModuleNotFoundError, NameError, ValueError, TypeError, KeyError, IndexError, AttributeError. Matched crashes abort the cycle after attempt 1 instead of burning the remaining attempts on an unwinnable challenge. See dream_cycle for the upstream gates that catch most of these earlier.
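A detector with these rules could look roughly like the sketch below. It assumes standard CPython traceback text and is not the code in agent.py; is_validator_crash and the sample paths are hypothetical.

```python
import os
import re

CRASH_TYPES = {
    "SyntaxError", "IndentationError", "ImportError", "ModuleNotFoundError",
    "NameError", "ValueError", "TypeError", "KeyError", "IndexError",
    "AttributeError",
}
# Matches the 'File "<path>", line N' header of each traceback frame.
FRAME_RE = re.compile(r'^\s*File "([^"]+)", line \d+', re.MULTILINE)


def is_validator_crash(traceback_text: str) -> bool:
    """True when the validator itself crashed, not the solution."""
    if "solution.py" in traceback_text:
        return False
    frames = FRAME_RE.findall(traceback_text)
    # The innermost frame is the last one printed.
    if not frames or os.path.basename(frames[-1]) != ".validator.py":
        return False
    last_line = traceback_text.strip().splitlines()[-1]
    # "pkg.SomeError: msg" -> "SomeError"
    exc_name = last_line.split(":", 1)[0].strip().rsplit(".", 1)[-1]
    return exc_name in CRASH_TYPES


sample = (
    "Traceback (most recent call last):\n"
    '  File "/tmp/run/.validator.py", line 3, in <module>\n'
    "    check(resutl)\n"
    "NameError: name 'resutl' is not defined\n"
)
print(is_validator_crash(sample))
```

An AssertionError raised by the validator is deliberately not in the crash set: it most likely means the validator ran and rejected the solution, so the remaining attempts are still worth spending.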

Cross-module dependencies