
Ghost Agent

An autonomous FastAPI service that wraps an OpenAI-compatible LLM with multi-tier memory, Docker-isolated tool execution, swarm inference, and biological-rhythm self-play.

Qwen 3.6 35B-A3 · Tor-only egress · Python 3.10+ · FastAPI · Docker · Reference

Runtime stance. Ghost Agent is designed and tuned around an uncensored Qwen 3.6 35B-A3 upstream model, and all of the agent's internet-bound traffic is required to go through Tor. Anonymity is not an add-on: it is enforced at the HTTP layer for every tool that reaches the open internet, with automatic circuit rotation when a request is refused. See Anonymity & Tor routing for the mechanics.

Ghost Agent wraps an upstream OpenAI-compatible LLM endpoint (llama.cpp / Ollama / vLLM) with a full agentic stack: a hierarchical task planner, a six-tier memory subsystem backed by ChromaDB and SQLite, a Docker-API container sandbox for code execution, a swarm router for fanning work out across specialised LLM pools, and three separate user-facing interfaces (web UI, Slack bot, desktop "Clockwork Ghost").

This documentation set was reverse-engineered directly from the source under src/ghost_agent/ and interface/. Follow the quick links below for a guided tour, or use the source map to drill into individual modules.

Quick links

System Architecture

End-to-end diagram of how interfaces, API, core, memory and sandbox interact.

Request lifecycle

What happens when a user message hits /api/chat.

Install & Run

Environment variables, dependencies, container prerequisites.

CLI flags

Every python -m src.ghost_agent.main argument explained.

The reasoning loop

agent.py — streaming, tool dispatch, sampling profiles.

Vector memory

ChromaDB-backed semantic store with spaced-repetition.

Tool registry

How tools are advertised, dispatched, and validated.

Docker sandbox

Container lifecycle, mounts, Tor networking, resource limits.

Anonymity & Tor

Identity-rotation policy, what's routed through Tor, and the fetch pipeline.

Selfhood (unified self)

First-person autobiographical memory, self-state thread, recognition / wake-up prefix, periodic narrative consolidation. Five components that stitch episodic instances into one continuous self.

Source map

Path	Purpose	Doc section
`src/ghost_agent/main.py`	CLI entrypoint & FastAPI lifespan	CLI reference
`src/ghost_agent/core/`	Reasoning loop, planning, dream, MCTS, swarm router	Core
`src/ghost_agent/memory/`	Vector + graph + profile + skill + journal + episodic stores	Memory
`src/ghost_agent/tools/`	Tool registry & per-tool implementations	Tools
`src/ghost_agent/sandbox/`	Docker container manager	Sandbox
`src/ghost_agent/api/`	FastAPI routes	API
`src/ghost_agent/utils/`	Logging, sanitiser, token counter, helpers	Utilities
`interface/`	Web UI, Slack bot, voice/image servers, desktop client	Interfaces

Conceptual model

Ghost Agent is best understood as five concentric layers:

Interface layer — web/Slack/desktop clients that talk HTTP/SSE to the FastAPI core.
API layer — Ollama-compatible HTTP routes that authenticate (X-Ghost-Key) and stream agent responses.
Reasoning core — GhostAgent drives a streaming chat loop, parses tool calls from <tool_call> XML, dispatches tools, and re-injects results.
Memory + planning — six memory tiers fused via Reciprocal Rank Fusion, plus a hierarchical TaskTree with postcondition gating, MCTS lookahead, and uncertainty tracking.
Execution substrate — Docker containers for tool calls, swarm/worker LLM pools for parallel inference, and Tor for anonymous outbound traffic.

The Architecture page contains the full diagram.
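Two techniques named in the layer list are compact enough to sketch: extracting <tool_call> blocks from streamed model output, and Reciprocal Rank Fusion across memory tiers. The Python below is illustrative only. The function names are hypothetical, it assumes each <tool_call> block wraps a single JSON object of the form {"name": ..., "arguments": {...}} (the format used in Qwen's chat templates), and it uses k = 60, the constant suggested in the original RRF paper, which Ghost Agent may or may not use.

```python
import json
import re
from collections import defaultdict

# Hypothetical helpers; only the techniques (<tool_call> blocks, RRF) come from the docs.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Extract JSON tool calls wrapped in <tool_call>...</tool_call> tags.

    Assumes one JSON object per block, e.g. {"name": "web_search", "arguments": {...}}.
    Malformed blocks are skipped rather than raising.
    """
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(body)
        except json.JSONDecodeError:
            continue  # a real loop would probably report the parse error back to the model
        if isinstance(call, dict) and "name" in call:
            call.setdefault("arguments", {})
            calls.append(call)
    return calls


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Fusing ranks rather than raw scores avoids having to calibrate tiers whose native scores are not directly comparable; an item that appears near the top of several tiers outranks one that tops only a single tier.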

Anonymity & Tor routing

Ghost Agent treats Tor as the default transport for outbound internet traffic, not as an optional flag. The proxy endpoint is declared once via the TOR_PROXY environment variable (canonical value: socks5h://127.0.0.1:9050) and is honoured by every HTTP-touching tool in the codebase. The --anonymous CLI switch, enabled by default, additionally routes web_search through DuckDuckGo, which states that it does not log search queries.
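A tool that honours TOR_PROXY might validate it once before use. The helper below is hypothetical, not the project's code. It rejects a plain socks5:// URL because, in curl and in requests, that scheme means the client resolves hostnames locally, which would leak every target hostname to the operator's DNS server; socks5h:// hands resolution to the proxy.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit


def tor_proxy_from_env(env: Optional[Mapping[str, str]] = None) -> tuple[str, int]:
    """Read TOR_PROXY and insist on socks5h:// so DNS is resolved over Tor.

    Hypothetical helper. Returns (host, port); raises RuntimeError rather than
    falling back to a direct (clearnet) connection.
    """
    env = os.environ if env is None else env
    raw = env.get("TOR_PROXY", "").strip()
    if not raw:
        raise RuntimeError("TOR_PROXY is unset; refusing to send traffic over clearnet")
    parts = urlsplit(raw)
    if parts.scheme != "socks5h":
        # socks5:// would make the client resolve hostnames locally and leak them.
        raise RuntimeError(f"TOR_PROXY must use socks5h://, got {parts.scheme or 'no scheme'}://")
    if not parts.hostname or parts.port is None:
        raise RuntimeError(f"TOR_PROXY must include a host and port: {raw!r}")
    return parts.hostname, parts.port
```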

The fetch pipeline

Every outbound web request funnels through helper_fetch_url_content() in utils/helpers.py, which wraps curl_cffi and httpx with:

SOCKS5h routing — DNS is resolved over Tor, not by the local resolver, so the operator's DNS server never sees the target hostname.
SSRF guards — private, link-local, and loopback ranges are refused before the SOCKS layer is touched. A tool call can't become a host-network scanner.
5 MB body cap and 20 s timeout — bounds the damage a hostile exit node or target server can do by serving slow or unbounded bodies.
3-retry budget — transient failures are retried with a fresh Tor identity between attempts.
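The SSRF guard and the body cap are simple to sketch, with one subtlety: because DNS is resolved over Tor, a guard that resolved hostnames locally to inspect their IPs would itself leak them. This hypothetical sketch therefore rejects only literal IP addresses in forbidden ranges and obviously local names, which is one plausible reading of "refused before the SOCKS layer is touched". The 5 MB cap is taken as 5 MiB, which is an assumption.

```python
import ipaddress
from typing import Iterable
from urllib.parse import urlsplit

MAX_BODY_BYTES = 5 * 1024 * 1024  # "5 MB" read as 5 MiB (assumption)
LOCAL_NAMES = {"localhost", "localhost.localdomain"}


def is_forbidden_target(url: str) -> bool:
    """True if the URL points at a loopback, private, link-local or unspecified host.

    Only literal IPs and well-known local names are checked; hostnames are not
    resolved locally, since that would bypass socks5h:// and leak them.
    """
    host = (urlsplit(url).hostname or "").rstrip(".").lower()
    if not host:
        return True  # no host at all: refuse rather than guess
    if host in LOCAL_NAMES or host.endswith(".localhost"):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # ordinary hostname; the Tor exit resolves it
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_unspecified


def read_capped(chunks: Iterable[bytes], limit: int = MAX_BODY_BYTES) -> bytes:
    """Accumulate streamed chunks, aborting as soon as the body exceeds `limit` bytes."""
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
        if len(buf) > limit:
            raise ValueError(f"response body exceeds {limit} bytes")
    return bytes(buf)
```

Reading the body as a stream and checking the running total means a hostile server is cut off at the cap instead of after it has delivered an arbitrarily large payload.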

Identity rotation (NEWNYM)

request_new_tor_identity() in utils/helpers.py is the agent's circuit-burn primitive. It speaks the Tor control protocol to the local daemon and issues SIGNAL NEWNYM. Tor then marks its existing circuits dirty, so every subsequent stream opens on a fresh circuit instead of reusing the old one. The entry guard is normally kept (Tor pins guards deliberately), while the middle and exit relays are chosen anew, so subsequent requests usually, though not necessarily, leave through a different exit node.
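The control-protocol exchange is short enough to show in full. This sketch is a hypothetical stand-in for request_new_tor_identity(), not its actual code. It assumes the control port is on the conventional 9051 and that Tor accepts password authentication (or none); cookie authentication is not handled. Tor also rate-limits NEWNYM, so a signal sent shortly after the previous one may be delayed rather than applied immediately.

```python
import socket


def send_newnym(host: str = "127.0.0.1", port: int = 9051,
                password: str = "", timeout: float = 5.0) -> None:
    """Authenticate to Tor's control port and send SIGNAL NEWNYM.

    Hypothetical helper. Raises RuntimeError if Tor answers anything but a 250 reply.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="ascii", newline="\r\n") as reader:

            def command(line: str) -> str:
                # Control-protocol commands are CRLF-terminated; success replies start with 250.
                sock.sendall(line.encode("ascii") + b"\r\n")
                reply = reader.readline().rstrip("\r\n")
                if not reply.startswith("250"):
                    raise RuntimeError(f"Tor control error on {line.split()[0]}: {reply}")
                return reply

            # The password is sent as a quoted string, so escape backslashes and quotes.
            quoted = password.replace("\\", "\\\\").replace('"', '\\"')
            command(f'AUTHENTICATE "{quoted}"')
            command("SIGNAL NEWNYM")
```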

Rotation fires:

Automatically on HTTP 401 / 403 / 503 from web_search — a gateway or exit refusal usually clears on a different chain.
Between retries in check_weather (3 attempts with identity refresh) and any tool that elects to re-enter the pool rather than hammer the same circuit.
Between parallel reformulations in deep_research, so the N reformulated queries are less likely to all emerge from the same exit.
On explicit agent intent — the agent can invoke rotation as a tool step when a task's anonymity posture demands it (e.g. before re-visiting a site that just saw traffic).
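The refusal-driven rotation and the 3-retry budget combine into a small loop. In this hypothetical sketch, fetch and rotate are injected callables (rotate standing in for request_new_tor_identity()), so the policy can be exercised without a network. Handling of transport exceptions, which the fetch pipeline also retries, is omitted for brevity.

```python
import time
from typing import Callable

RETRYABLE_STATUSES = {401, 403, 503}  # the refusal codes listed for web_search


def fetch_with_rotation(fetch: Callable[[], int], rotate: Callable[[], None],
                        attempts: int = 3, backoff_s: float = 0.0) -> int:
    """Call `fetch` up to `attempts` times, burning the Tor circuit between tries.

    `fetch` returns an HTTP status code; the last status is returned once the
    budget is exhausted.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    status = fetch()
    for _ in range(attempts - 1):
        if status not in RETRYABLE_STATUSES:
            break
        rotate()  # fresh circuit rather than hammering the one that was refused
        time.sleep(backoff_s)
        status = fetch()
    return status
```

Rotating before each retry, rather than retrying on the same circuit, reflects the observation above that a gateway or exit refusal usually clears on a different chain.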

What is and isn't routed through Tor

Traffic	Transport	Notes
Web search (`ddgs` → DuckDuckGo)	Tor	Every request; identity rotated on refusal or rate-limit.
Page / document fetch (`fetch_url`)	Tor	SOCKS5h — DNS also leaves via Tor.
Weather & geolocation	Tor	Open-Meteo primary, `wttr.in` fallback; both honour `TOR_PROXY`.
Sandbox container egress	Tor	When the container has network access, it inherits the Tor SOCKS namespace.
Upstream LLM inference	Local	llama.cpp / vLLM runs on `127.0.0.1` and never crosses a network boundary, Tor or otherwise.
Slack / voice / image-gen servers	LAN	Operator-owned companion services on the private network. Reached directly on purpose: Tor exits reject private-network destinations by default, so these services would be unreachable through Tor.

Sanity check & failure mode

The check_health tool probes the configured circuit on demand and reports whether check.torproject.org sees the request as Tor traffic, plus the current exit-node country. The probe is part of the default health readout — if the circuit is down or leaking to clearnet, the tool surfaces it loudly rather than silently degrading.
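If the probe uses the public JSON endpoint https://check.torproject.org/api/ip, which returns an object such as {"IsTor": true, "IP": "..."}, interpreting the answer takes a few lines. The helpers below are hypothetical, and the exit-node country lookup is not shown because it needs a separate GeoIP source. A failed check raises rather than returning a soft "unknown", matching the fail-loudly intent.

```python
import json


def interpret_tor_check(body: str) -> tuple[bool, str]:
    """Parse a check.torproject.org/api/ip response into (is_tor, exit_ip).

    Assumes the {"IsTor": bool, "IP": str} shape; anything else is an error.
    """
    data = json.loads(body)
    if not isinstance(data, dict) or "IsTor" not in data:
        raise ValueError(f"unexpected Tor-check payload: {body[:200]!r}")
    return data["IsTor"] is True, str(data.get("IP", ""))


def assert_on_tor(body: str) -> str:
    """Return the exit IP, or raise loudly if the request did not arrive via Tor."""
    is_tor, exit_ip = interpret_tor_check(body)
    if not is_tor:
        raise RuntimeError(f"traffic is NOT leaving via Tor (seen as {exit_ip or 'unknown IP'})")
    return exit_ip
```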

Fail-closed, not fail-open. If TOR_PROXY is unset or the Tor daemon is unreachable at startup, outbound tools refuse to fire rather than fall back to clearnet. This is intentional: a silently-cleartext agent is worse than a stalled one.
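A fail-closed gate of this kind could run in the FastAPI lifespan before any outbound tool is registered. The sketch below is an assumption about how such a check might look, with a hypothetical TorUnavailable exception: it refuses to proceed when the proxy is unset or its port does not accept a TCP connection. A successful connect only shows that something is listening; the check_health probe is what confirms that traffic actually exits via Tor.

```python
import socket
from typing import Optional


class TorUnavailable(RuntimeError):
    """Hypothetical exception: outbound tools must stay disabled."""


def require_tor(proxy_host: Optional[str], proxy_port: Optional[int], timeout: float = 3.0) -> None:
    """Raise TorUnavailable unless the Tor SOCKS port is configured and reachable."""
    if not proxy_host or not proxy_port:
        raise TorUnavailable("TOR_PROXY is unset; outbound tools stay disabled")
    try:
        with socket.create_connection((proxy_host, proxy_port), timeout=timeout):
            pass  # TCP connect only; it does not prove a working Tor circuit
    except OSError as exc:
        raise TorUnavailable(f"Tor SOCKS port {proxy_host}:{proxy_port} unreachable: {exc}") from exc
```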