CLI Reference
Every flag accepted by python -m src.ghost_agent.main, sourced from main.py:61-138.
Networking & identity
| Flag | Type | Default | Effect |
|---|---|---|---|
--host | str | 0.0.0.0 | uvicorn bind address. |
--port | int | 8000 | uvicorn listen port. |
--upstream-url | str | http://127.0.0.1:8080 | OpenAI-compatible LLM backend (llama.cpp / Ollama / vLLM). |
--model | str | $GHOST_MODEL or qwen-3.6-35b-a3 | Model identifier returned on Ollama compatibility routes (/api/show, /api/tags). |
--api-key | str | $GHOST_API_KEY or ghost-secret-123 | Value the agent expects in the X-Ghost-Key header. |
--default-db | str | $GHOST_DEFAULT_DB or local Postgres DSN | Default DSN for the postgres_admin tool. |
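
The environment-variable fallbacks in the table can be sketched with argparse. This is a minimal illustration, not the contents of main.py: the build_parser helper and its env parameter are hypothetical, and --default-db is left out because its default DSN is not spelled out above.

```python
import argparse
import os


def build_parser(env=None):
    """Hypothetical parser covering only the networking and identity flags."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="ghost-agent")
    parser.add_argument("--host", type=str, default="0.0.0.0")
    parser.add_argument("--port", type=int, default=8000)
    parser.add_argument("--upstream-url", type=str, default="http://127.0.0.1:8080")
    # "$GHOST_MODEL or <literal>": an unset or empty variable falls back to the literal.
    parser.add_argument("--model", type=str,
                        default=env.get("GHOST_MODEL") or "qwen-3.6-35b-a3")
    parser.add_argument("--api-key", type=str,
                        default=env.get("GHOST_API_KEY") or "ghost-secret-123")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.host, args.port, args.upstream_url, args.model)
```

An explicit flag on the command line always wins over the environment variable, because argparse only consults default when the flag is absent.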
Logging & verbosity
| Flag | Type | Effect |
|---|---|---|
-d / --daemon | flag | Suppress stdout logging (file-only). |
--debug | flag | Set log level to DEBUG. |
-v / --verbose | flag | Disable per-line truncation; raises LOG_TRUNCATE_LIMIT from 60 chars to 1M. |
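
The effect of --verbose on truncation can be illustrated as follows. The 60 and 1M limits come from the table; the function names and the trailing "..." marker are assumptions made for this sketch, not the behaviour of logging.py.

```python
DEFAULT_LIMIT = 60          # per-line limit without --verbose
VERBOSE_LIMIT = 1_000_000   # limit once --verbose is passed


def truncate_limit(verbose: bool) -> int:
    """Pick the per-line limit from the --verbose flag."""
    return VERBOSE_LIMIT if verbose else DEFAULT_LIMIT


def truncate_line(line: str, limit: int) -> str:
    """Cut a log line at `limit` characters; the '...' suffix is an assumption."""
    if len(line) <= limit:
        return line
    return line[:limit] + "..."


if __name__ == "__main__":
    long_line = "x" * 100
    print(len(truncate_line(long_line, truncate_limit(verbose=False))))
    print(len(truncate_line(long_line, truncate_limit(verbose=True))))
```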
Memory behaviour
| Flag | Default | Effect |
|---|---|---|
--no-memory | off | Skip vector / graph / profile / episodic / adaptive-threshold / contradiction-log initialisation. Useful for regression tests. |
--smart-memory | 0.0 | Initial value for the AdaptiveThreshold recall gate. The store self-tunes from observations once recording begins (window=100, MIN_OBSERVATIONS=20). |
--max-context | 65536 | Maximum tokens before ContextManager escalates compression (L0..L4). |
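
One way to picture the --smart-memory behaviour is a gate that holds its initial value until enough observations arrive. The initial value, window=100, and MIN_OBSERVATIONS=20 come from the table. The class name, its methods, and the update rule (median of the recent scores) are purely assumptions for illustration; the actual AdaptiveThreshold rule is not described here.

```python
from collections import deque
from statistics import median

WINDOW = 100            # from the table: rolling window size
MIN_OBSERVATIONS = 20   # from the table: observations needed before tuning


class AdaptiveThresholdSketch:
    """Hypothetical stand-in for AdaptiveThreshold; not the real class."""

    def __init__(self, initial: float = 0.0):
        self.initial = initial
        self.scores = deque(maxlen=WINDOW)  # oldest scores fall off automatically

    def record(self, score: float) -> None:
        self.scores.append(score)

    def value(self) -> float:
        # Until enough data exists, keep the value passed via --smart-memory.
        if len(self.scores) < MIN_OBSERVATIONS:
            return self.initial
        # Assumed update rule: median of the recent window.
        return median(self.scores)


if __name__ == "__main__":
    gate = AdaptiveThresholdSketch(initial=0.0)
    for _ in range(25):
        gate.record(0.5)
    print(gate.value())
```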
Tool surface
| Flag | Default | Effect |
|---|---|---|
--native-tools / --no-native-tools | on | Whether to advertise an OpenAI-style tools array on outbound LLM calls. When on, the equivalent XML <tool_def> schema is suppressed from the prompt to avoid double-shipping the same definitions (~7,800 token saving per turn). The XML format scaffolding stays so the parser still accepts the legacy <tool_call> shape as a fallback. Off forces XML-only tool dispatch (no native channel; full schema in prompt). See Context compaction in core.agent. |
--anonymous | on | Route web search through Tor / DuckDuckGo (with identity rotation on 401/403/503). |
--deep-reason | off | Initialise MCTSReasoner (max_candidates=3, max_depth=2) and HypothesisTester. |
--perfect-it | off | Append a proactive optimisation pass after a session completes. |
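
A paired on/off flag such as --native-tools / --no-native-tools can be declared with argparse.BooleanOptionalAction (Python 3.9+). The sketch below assumes that mechanism; main.py may declare the pair differently. The tool_channels helper is hypothetical and only restates the table row as data.

```python
import argparse

parser = argparse.ArgumentParser(prog="ghost-agent")
# BooleanOptionalAction generates both --native-tools and --no-native-tools.
parser.add_argument("--native-tools", action=argparse.BooleanOptionalAction, default=True)
parser.add_argument("--deep-reason", action="store_true")
parser.add_argument("--perfect-it", action="store_true")


def tool_channels(native_tools: bool) -> dict:
    """Hypothetical summary of what each mode ships to the model."""
    return {
        "tools_array": native_tools,               # native channel advertised
        "xml_schema_in_prompt": not native_tools,  # full <tool_def> schema only when native is off
        "xml_call_parser": True,                   # legacy <tool_call> parsing stays available
    }


if __name__ == "__main__":
    args = parser.parse_args(["--no-native-tools", "--deep-reason"])
    print(args.native_tools, args.deep_reason, tool_channels(args.native_tools))
```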
Stage-1 self-improvement pipeline
Local-only trajectory logging, self-critique reflection, and complexity-routed dispatch. All three default to on (opt-out) unless --no-memory is set. The pipeline is fully local: no external teacher, no hosted embedder. See self-improvement pipeline for the architecture.
| Flag | Type | Default | Effect |
|---|---|---|---|
--no-trajectories | flag | off | Disable the JSONL trajectory log at $GHOST_HOME/system/trajectories/. Also implicitly disables reflection (which reads from the log). |
--no-reflection | flag | off | Disable the reflection biological phase (2.5) even if trajectory logging is on. Trajectories still write to disk but the Reflector never fires. |
--router-model | path | unset | Path to a persisted ComplexityClassifier JSON. When unset, the dispatcher acts as a pass-through that always escalates to the full swarm pool list: never less capable, just never cheaper. Train a classifier from trajectories collected with trajectory logging enabled (that is, without --no-trajectories). |
--router-confidence-threshold | float | 0.3 | Minimum router confidence required to route a request to the cheap path. Below this, the dispatcher escalates to the full swarm (fail-safe). |
--prm-model | path | unset | Path to a persisted PRM (Process Reward Model) JSON checkpoint. When set, the scorer loads on startup and plugs into the MCTS reasoner so plan candidates are scored in microseconds instead of paying a worker-LLM simulation per candidate. Unset → no-op scorer returning 0.5 for every candidate (call sites stay branch-free). See PRM algorithms doc. |
--prm-train-cooldown | int (seconds) | 10800 | Cooldown for the idle-time PRM retrain pass (biological phase 2.7). Default 3 hours. No effect when --prm-model is unset. |
--frontier-selfplay / --no-frontier-selfplay | flag | on | Biological-watchdog phase-3 self-play picks the next cluster by (PRM uncertainty × trajectory rarity) instead of only the brittle-pool score. Surfaces clusters the agent has barely tried — which the outcomes-only signal misses, because "never tried" looks the same as "solved instantly". Degrades gracefully when the PRM is untrained or trajectory store is empty (strict isinstance gate; transparent fallback to pick_seed). See core / frontier_selection. |
--frontier-uniform-sample-prob | float | 0.2 | Probability per self-play tick that frontier-aware selection is bypassed in favour of legacy pick_seed — sanity floor so a systematically-wrong PRM can't lock self-play onto one cluster forever (the PRM is itself learned from trajectories self-play produces). |
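
Two of the gates described in the table can be sketched in a few lines: the router confidence threshold and the frontier-selection bypass. Every name here (route, pick_cluster, the "cheap" label, the dict inputs) is hypothetical; only the 0.3 and 0.2 defaults and the uncertainty × rarity score come from the table.

```python
import random


def route(label: str, confidence: float, threshold: float = 0.3) -> str:
    """Take the cheap path only when the router is confident enough."""
    if label == "cheap" and confidence >= threshold:
        return "cheap"
    return "full_swarm"  # fail-safe: escalate on low confidence


def pick_cluster(clusters, uncertainty, rarity, legacy_pick,
                 uniform_prob: float = 0.2, rng=random):
    """Pick the next self-play cluster.

    With probability `uniform_prob` the frontier score is bypassed and the
    legacy picker decides, so a badly calibrated PRM cannot pin self-play
    to one cluster forever.
    """
    if rng.random() < uniform_prob:
        return legacy_pick(clusters)
    # Frontier score: PRM uncertainty multiplied by trajectory rarity.
    return max(clusters, key=lambda c: uncertainty[c] * rarity[c])


if __name__ == "__main__":
    clusters = ["parsing", "sql", "regex"]
    uncertainty = {"parsing": 0.2, "sql": 0.9, "regex": 0.5}
    rarity = {"parsing": 0.1, "sql": 0.4, "regex": 0.9}
    print(route("cheap", 0.45))
    print(pick_cluster(clusters, uncertainty, rarity, legacy_pick=lambda cs: cs[0]))
```

Passing rng as a parameter keeps the bypass testable: a stub with a fixed random() value forces either branch.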
Helper scripts
Python entry points under scripts/:
| Script | Purpose |
|---|---|
scripts/eval_baseline.py freeze\|compare | Run the offline eval suite (--suite {default,post_learning}) via a stub runner or HTTP against a running agent. Freeze the result as a baseline or diff a subsequent run. Flags: --runner {stub,http}, --base-url, --api-key, --model, --timeout N (default 300s; template tasks on a local Qwen-scale model commonly run 80–250s). |
scripts/run_gepa.py --signature <name> | Run DSPy / GEPA prompt optimisation on one of the allow-listed signatures (planning.decompose, tool_selection.pick, reflection.critique). Reads trajectories from $GHOST_HOME/trajectories, uses Ghost's own upstream as the optimiser LM (no external teacher), writes the tuned instruction JSON to $GHOST_HOME/system/optim/. |
scripts/build_sandbox_image.sh | Build ghost-agent-base:latest from sandbox/Dockerfile — bakes apt deps, Python stack, and Playwright Chromium (with --with-deps) into the image. Runs a Chromium smoke test at the end. One-shot per Ghost version; the runtime sandbox wrapper picks up the freshly-built image on next ensure_running. |
Swarm topology
Each value is a comma-separated list of url|model pairs. Pools are independent and routed by LLMClient.
| Flag | Used for | Selector |
|---|---|---|
--swarm-nodes | Parallel inference / planning fan-out | chat_completion(use_swarm=True) |
--worker-nodes | Cheap classifier / verifier sub-tasks | route(task=...) dispatch |
--visual-nodes | Multimodal vision (PDF + image) | chat_completion(use_vision=True) |
--coding-nodes | Code generation specialists | chat_completion(use_coding=True) |
--image-gen-nodes | SDXL image generation | generate_image() |
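
The url|model list format accepted by the pool flags can be parsed along these lines. The function name and the error handling are assumptions for illustration; LLMClient may validate the values differently.

```python
def parse_node_pool(value: str) -> list[tuple[str, str]]:
    """Split 'url|model,url|model' into (url, model) pairs."""
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty values and trailing commas
        url, sep, model = item.partition("|")
        url, model = url.strip(), model.strip()
        if not sep or not url or not model:
            raise ValueError(f"expected url|model, got {item!r}")
        pairs.append((url, model))
    return pairs


if __name__ == "__main__":
    # Placeholder hosts and model names, not real deployment values.
    print(parse_node_pool("http://10.0.0.1:8080|model-a, http://10.0.0.2:8080|model-b"))
```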
Utility modules
The src/ghost_agent/utils/ package centralises cross-cutting helpers:
| Module | Highlights |
|---|---|
helpers.py | request_new_tor_identity() rotates Tor circuits; helper_fetch_url_content() wraps curl_cffi/httpx with SSRF guards, 5 MB body cap, 20 s timeout, and 3 retry attempts; recursive_split_text + semantic_split_text chunk text for ingestion; get_utc_timestamp / parse_utc_timestamp for ISO timestamps. |
logging.py | setup_logging configures rotating file + stdout sinks; pretty_log() emits structured icons-and-tags log lines with per-request tagging via the request_id_context ContextVar; truncation defaults to 60 chars unless --verbose. |
sanitizer.py | extract_code_from_markdown, fix_python_syntax (AST-driven repair loop, max 20 retries), and a sanitize_code chain that scrubs control characters and heals partial Python before exec. |
token_counter.py | load_tokenizer caches the Qwen3 35B tokenizer locally, with a fallback when the download exceeds its 15 s timeout; estimate_tokens uses a bounded LRU cache; check_budget reports per-message token usage. |
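
As an illustration of the ISO timestamp helpers listed for helpers.py, a round trip might look like the sketch below. The function names match the table, but the signatures and bodies are assumptions, not the code in helpers.py.

```python
from datetime import datetime, timezone


def get_utc_timestamp() -> str:
    """Current time as a timezone-aware ISO 8601 string (assumed format)."""
    return datetime.now(timezone.utc).isoformat()


def parse_utc_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 string; naive inputs are assumed to already be UTC."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


if __name__ == "__main__":
    stamp = get_utc_timestamp()
    print(stamp, parse_utc_timestamp(stamp).isoformat() == stamp)
```

Normalising every parsed value to an aware UTC datetime avoids comparing naive and aware timestamps, which raises TypeError in Python.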