CLI Reference

Every flag accepted by `python -m src.ghost_agent.main`, sourced from `main.py:61-138`.

Networking & identity

Flag	Type	Default	Effect
`--host`	str	`0.0.0.0`	uvicorn bind address.
`--port`	int	`8000`	uvicorn listen port.
`--upstream-url`	str	`http://127.0.0.1:8080`	OpenAI-compatible LLM backend (llama.cpp / Ollama / vLLM).
`--model`	str	`$GHOST_MODEL` or `qwen-3.6-35b-a3`	Model identifier returned on Ollama compatibility routes (`/api/show`, `/api/tags`).
`--api-key`	str	`$GHOST_API_KEY` or `ghost-secret-123`	Value the agent expects in the `X-Ghost-Key` header.
`--default-db`	str	`$GHOST_DEFAULT_DB` or local Postgres DSN	Default DSN for the `postgres_admin` tool.
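The `$VAR` or literal defaults in the table follow a common argparse pattern, sketched below. This is a minimal illustration and not the code in `main.py`: the `build_parser` helper is hypothetical, only three of the flags are shown, and the precedence (explicit flag, then environment variable, then literal) is assumed from the table.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: defaults fall back from env var to a literal."""
    parser = argparse.ArgumentParser(prog="ghost_agent")
    parser.add_argument("--port", type=int, default=8000)
    parser.add_argument(
        "--model",
        # Read at parser-build time: $GHOST_MODEL if set, else the literal.
        default=os.environ.get("GHOST_MODEL", "qwen-3.6-35b-a3"),
    )
    parser.add_argument(
        "--api-key",
        default=os.environ.get("GHOST_API_KEY", "ghost-secret-123"),
    )
    return parser


# An explicit flag always wins over the environment and the literal default.
args = build_parser().parse_args(["--port", "9000"])
print(args.port, args.model, args.api_key)
```

Because the environment is read when the parser is built, changing `$GHOST_MODEL` after startup has no effect in this sketch.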

Logging & verbosity

Flag	Type	Effect
`-d / --daemon`	flag	Suppress stdout logging (file-only).
`--debug`	flag	Set log level to DEBUG.
`-v / --verbose`	flag	Effectively disable per-line truncation by raising `LOG_TRUNCATE_LIMIT` from 60 characters to 1M.
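The effect of `--verbose` on truncation can be pictured with the sketch below. Only `LOG_TRUNCATE_LIMIT` and the two limits (60 and 1M) come from the table; the helper names `truncate_line` and `effective_limit`, and the `...` cut marker, are assumptions made for illustration.

```python
LOG_TRUNCATE_LIMIT = 60  # default per-line limit, in characters


def effective_limit(verbose: bool) -> int:
    """Hypothetical helper: the limit in force for a given --verbose setting."""
    # 1M characters is large enough that truncation never triggers in practice.
    return 1_000_000 if verbose else LOG_TRUNCATE_LIMIT


def truncate_line(text: str, limit: int = LOG_TRUNCATE_LIMIT) -> str:
    """Hypothetical helper: cut a log line to `limit` characters and mark the cut."""
    if len(text) <= limit:
        return text
    return text[:limit] + "..."  # the marker is an assumption


line = "x" * 100
print(len(truncate_line(line, effective_limit(verbose=False))))
print(len(truncate_line(line, effective_limit(verbose=True))))
```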

Memory behaviour

Flag	Default	Effect
`--no-memory`	off	Skip vector / graph / profile / episodic / adaptive-threshold / contradiction-log initialisation. Useful for regression tests.
`--smart-memory`	`0.0`	Initial value for the `AdaptiveThreshold` recall gate. The store self-tunes from observations once recording begins (window=100, MIN_OBSERVATIONS=20).
`--max-context`	`65536`	Maximum tokens before `ContextManager` escalates compression (L0..L4).
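The warm-up behaviour described for `--smart-memory` can be sketched as follows. The window size and `MIN_OBSERVATIONS` come from the table; everything else is an assumption, in particular the class name `AdaptiveThresholdSketch` and the update rule (a plain mean over the window), which stands in for whatever tuning the real `AdaptiveThreshold` performs.

```python
from collections import deque


class AdaptiveThresholdSketch:
    """Illustrative only: warm-up plus rolling window. The update rule is assumed."""

    WINDOW = 100           # from the table above
    MIN_OBSERVATIONS = 20  # from the table above

    def __init__(self, initial: float = 0.0) -> None:
        self.initial = initial                    # value passed via --smart-memory
        self.scores = deque(maxlen=self.WINDOW)   # oldest observations fall out

    def record(self, score: float) -> None:
        self.scores.append(score)

    def value(self) -> float:
        # Until enough observations arrive, the CLI-supplied value is used as-is.
        if len(self.scores) < self.MIN_OBSERVATIONS:
            return self.initial
        # Assumed rule: mean of the window. The real store may tune differently.
        return sum(self.scores) / len(self.scores)


gate = AdaptiveThresholdSketch(initial=0.3)
for _ in range(19):
    gate.record(1.0)
print(gate.value())  # still the initial value: only 19 observations so far
```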

Tool surface

Flag	Default	Effect
`--native-tools` / `--no-native-tools`	on	Whether to advertise an OpenAI-style `tools` array on outbound LLM calls. When on, the equivalent XML `<tool_def>` schema is suppressed from the prompt so the same definitions are not shipped twice (a saving of about 7,800 tokens per turn). The XML format scaffolding stays, so the parser still accepts the legacy `<tool_call>` shape as a fallback. When off, tool dispatch is XML-only (no native channel; full schema in the prompt). See Context compaction in `core.agent`.
`--anonymous`	on	Route web search through Tor / DuckDuckGo (with identity rotation on 401/403/503).
`--deep-reason`	off	Initialise `MCTSReasoner` (max_candidates=3, max_depth=2) and `HypothesisTester`.
`--perfect-it`	off	Append a proactive optimisation pass after a session completes.

Stage-1 self-improvement pipeline

Local-only trajectory logging, self-critique reflection, and complexity-routed dispatch. All three are opt-out: they default to on unless `--no-memory` is set. The pipeline is fully local, with no external teacher and no hosted embedder. See self-improvement pipeline for the architecture.

Flag	Type	Default	Effect
`--no-trajectories`	flag	off	Disable the JSONL trajectory log at `$GHOST_HOME/system/trajectories/`. Also implicitly disables reflection (which reads from the log).
`--no-reflection`	flag	off	Disable the reflection biological phase (2.5) even if trajectory logging is on. Trajectories still write to disk but the Reflector never fires.
`--router-model`	path	unset	Path to a persisted `ComplexityClassifier` JSON. When unset, the dispatcher acts as a pass-through that always escalates to the full swarm pool list: never less capable, just never cheaper. Train a classifier from trajectories collected with trajectory logging enabled (that is, without `--no-trajectories`).
`--router-confidence-threshold`	float	`0.3`	Minimum router confidence required to route a request to the cheap path. Below this, the dispatcher escalates to the full swarm (fail-safe).
`--prm-model`	path	unset	Path to a persisted PRM (Process Reward Model) JSON checkpoint. When set, the scorer loads on startup and plugs into the MCTS reasoner so plan candidates are scored in microseconds instead of paying a worker-LLM simulation per candidate. Unset → no-op scorer returning 0.5 for every candidate (call sites stay branch-free). See PRM algorithms doc.
`--prm-train-cooldown`	int (seconds)	`10800`	Cooldown for the idle-time PRM retrain pass (biological phase 2.7). Default 3 hours. No effect when `--prm-model` is unset.
`--frontier-selfplay` / `--no-frontier-selfplay`	flag	on	Biological-watchdog phase-3 self-play picks the next cluster by `(PRM uncertainty × trajectory rarity)` instead of only the brittle-pool score. Surfaces clusters the agent has barely tried — which the outcomes-only signal misses, because "never tried" looks the same as "solved instantly". Degrades gracefully when the PRM is untrained or trajectory store is empty (strict `isinstance` gate; transparent fallback to `pick_seed`). See core / frontier_selection.
`--frontier-uniform-sample-prob`	float	`0.2`	Probability per self-play tick that frontier-aware selection is bypassed in favour of legacy `pick_seed` — sanity floor so a systematically-wrong PRM can't lock self-play onto one cluster forever (the PRM is itself learned from trajectories self-play produces).
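Two of the fail-safes in the table can be sketched in a few lines: router escalation (`--router-model`, `--router-confidence-threshold`) and the uniform-sample floor for frontier self-play (`--frontier-uniform-sample-prob`). This is a hedged illustration rather than the agent's code. The function names `route` and `pick_cluster`, the `"simple"` label, and the callable parameters are hypothetical; only the thresholds, the `uncertainty × rarity` score, and the `pick_seed` fallback come from the table.

```python
import random
from typing import Callable, Optional, Sequence, Tuple


def route(prediction: Optional[Tuple[str, float]], threshold: float = 0.3) -> str:
    """Return "cheap" or "swarm"; `prediction` is (label, confidence) or None."""
    if prediction is None:        # --router-model unset: pass-through to the swarm
        return "swarm"
    label, confidence = prediction
    if confidence < threshold:    # below the minimum confidence: fail-safe escalation
        return "swarm"
    return "cheap" if label == "simple" else "swarm"  # "simple" is an assumed label


def pick_cluster(
    clusters: Sequence[str],
    uncertainty: Callable[[str], float],
    rarity: Callable[[str], float],
    pick_seed: Callable[[], str],
    uniform_prob: float = 0.2,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the next self-play cluster, with a sanity floor on legacy selection."""
    rng = rng or random.Random()
    # With probability `uniform_prob`, bypass the PRM-driven choice entirely,
    # so a systematically wrong PRM cannot lock self-play onto one cluster.
    if not clusters or rng.random() < uniform_prob:
        return pick_seed()
    return max(clusters, key=lambda c: uncertainty(c) * rarity(c))


print(route(None), route(("simple", 0.2)), route(("simple", 0.9)))
```

Passing `rng` explicitly keeps the selection reproducible in tests; in normal use it can be left unset.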

Helper scripts

Entry points under `scripts/` (Python unless noted by the file extension):

Script	Purpose
`scripts/eval_baseline.py freeze|compare`	Run the offline eval suite (`--suite {default,post_learning}`) via a stub runner or HTTP against a running agent. Freeze the result as a baseline or diff a subsequent run. Flags: `--runner {stub,http}`, `--base-url`, `--api-key`, `--model`, `--timeout N` (default 300 s; template tasks on a local Qwen-scale model commonly run 80–250 s).
`scripts/run_gepa.py --signature <name>`	Run DSPy / GEPA prompt optimisation on one of the allow-listed signatures (`planning.decompose`, `tool_selection.pick`, `reflection.critique`). Reads trajectories from `$GHOST_HOME/trajectories`, uses Ghost's own upstream as the optimiser LM (no external teacher), writes the tuned instruction JSON to `$GHOST_HOME/system/optim/`.
`scripts/build_sandbox_image.sh`	Build `ghost-agent-base:latest` from `sandbox/Dockerfile` — bakes apt deps, Python stack, and Playwright Chromium (with `--with-deps`) into the image. Runs a Chromium smoke test at the end. One-shot per Ghost version; the runtime sandbox wrapper picks up the freshly-built image on next `ensure_running`.

Swarm topology

Each value is a comma-separated list of `url|model` pairs. Pools are independent and routed by `LLMClient`.

Flag	Used for	Selector
`--swarm-nodes`	Parallel inference / planning fan-out	`chat_completion(use_swarm=True)`
`--worker-nodes`	Cheap classifier / verifier sub-tasks	`route(task=...)` dispatch
`--visual-nodes`	Multimodal vision (PDF + image)	`chat_completion(use_vision=True)`
`--coding-nodes`	Code generation specialists	`chat_completion(use_coding=True)`
`--image-gen-nodes`	SDXL image generation	`generate_image()`
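The `url|model` list format accepted by the flags above could be parsed along these lines. This is a minimal sketch: `Node` and `parse_nodes` are hypothetical names, the example URLs and model names are made up, and the real parser may treat whitespace or malformed entries differently.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    url: str
    model: str


def parse_nodes(value: str) -> List[Node]:
    """Hypothetical helper: parse 'url|model,url|model' into Node tuples."""
    nodes = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas and empty values
        url, sep, model = item.partition("|")
        url, model = url.strip(), model.strip()
        if not sep or not url or not model:
            raise ValueError(f"expected url|model, got {item!r}")
        nodes.append(Node(url, model))
    return nodes


# Example value for a flag such as --swarm-nodes (addresses are illustrative).
print(parse_nodes("http://10.0.0.2:8080|model-a, http://10.0.0.3:8080|model-b"))
```

When passing such a value on a shell command line, quote it so the shell does not interpret `|` as a pipe.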

Utility modules

The `src/ghost_agent/utils/` package centralises cross-cutting helpers:

Module	Highlights
`helpers.py`	`request_new_tor_identity()` rotates Tor circuits; `helper_fetch_url_content()` wraps curl_cffi/httpx with SSRF guards, 5 MB body cap, 20 s timeout, and 3 retry attempts; `recursive_split_text` + `semantic_split_text` chunk text for ingestion; `get_utc_timestamp` / `parse_utc_timestamp` for ISO timestamps.
`logging.py`	`setup_logging` configures rotating file + stdout sinks; `pretty_log()` emits structured icons-and-tags log lines with per-request tagging via the `request_id_context` ContextVar; truncation defaults to 60 chars unless `--verbose`.
`sanitizer.py`	`extract_code_from_markdown`, `fix_python_syntax` (AST-driven repair loop, max 20 retries), and a `sanitize_code` chain that scrubs control characters and heals partial Python before exec.
`token_counter.py`	`load_tokenizer` caches the Qwen3 35B tokenizer locally, with a fallback after a 15 s download timeout; `estimate_tokens` uses a bounded LRU cache; `check_budget` reports per-message token usage.