Ghost Agent
An autonomous FastAPI service that wraps an OpenAI-compatible LLM with multi-tier memory, Docker-isolated tool execution, swarm inference, and biological-rhythm self-play.
Ghost Agent wraps an upstream OpenAI-compatible LLM endpoint (llama.cpp / Ollama / vLLM) with a full agentic stack: a hierarchical task planner, a six-tier memory subsystem backed by ChromaDB and SQLite, a Docker-API container sandbox for code execution, a swarm router for fanning work out across specialised LLM pools, and three separate user-facing interfaces (web UI, Slack bot, desktop "Clockwork Ghost").
This documentation set was reverse-engineered directly from the source under src/ghost_agent/ and interface/. Use the navigation on the left to drill into individual modules, or follow the links below for a guided tour.
Quick links
System Architecture
End-to-end diagram of how interfaces, API, core, memory and sandbox interact.
Request lifecycle
What happens when a user message hits /api/chat.
Install & Run
Environment variables, dependencies, container prerequisites.
CLI flags
Every python -m src.ghost_agent.main argument explained.
The reasoning loop
agent.py — streaming, tool dispatch, sampling profiles.
Vector memory
ChromaDB-backed semantic store with spaced-repetition.
Tool registry
How tools are advertised, dispatched, and validated.
Docker sandbox
Container lifecycle, mounts, Tor networking, resource limits.
Anonymity & Tor
Identity-rotation policy, what's routed through Tor, and the fetch pipeline.
Selfhood (unified self)
First-person autobiographical memory, self-state thread, recognition / wake-up prefix, periodic narrative consolidation. Five components that stitch episodic instances into one continuous self.
Source map
| Path | Purpose | Doc section |
|---|---|---|
| src/ghost_agent/main.py | CLI entrypoint & FastAPI lifespan | CLI reference |
| src/ghost_agent/core/ | Reasoning loop, planning, dream, MCTS, swarm router | Core |
| src/ghost_agent/memory/ | Vector + graph + profile + skill + journal + episodic stores | Memory |
| src/ghost_agent/tools/ | Tool registry & per-tool implementations | Tools |
| src/ghost_agent/sandbox/ | Docker container manager | Sandbox |
| src/ghost_agent/api/ | FastAPI routes | API |
| src/ghost_agent/utils/ | Logging, sanitiser, token counter, helpers | Utilities |
| interface/ | Web UI, Slack bot, voice/image servers, desktop client | Interfaces |
Conceptual model
Ghost Agent is best understood as five concentric layers:
- Interface layer: web/Slack/desktop clients that talk HTTP/SSE to the FastAPI core.
- API layer: Ollama-compatible HTTP routes that authenticate (`X-Ghost-Key`) and stream agent responses.
- Reasoning core: `GhostAgent` drives a streaming chat loop, parses tool calls from `<tool_call>` XML, dispatches tools, and re-injects results.
- Memory + planning: six memory tiers fused via Reciprocal Rank Fusion, plus a hierarchical `TaskTree` with postcondition gating, MCTS lookahead, and uncertainty tracking.
- Execution substrate: Docker containers for tool calls, swarm/worker LLM pools for parallel inference, and Tor for anonymous outbound traffic.
The Architecture page contains the full diagram.
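The fusion step named in the memory layer can be illustrated with a minimal Reciprocal Rank Fusion sketch. The tier names, memory ids, and the constant `k = 60` (a value commonly used for RRF) are assumptions for illustration, not values taken from the Ghost Agent source.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first rankings into one list of (doc_id, score).

    Each document scores sum(1 / (k + rank)) over the rankings that
    contain it, with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc_id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from three memory tiers, best match first.
vector_hits = ["m3", "m1", "m7"]
graph_hits = ["m1", "m3"]
journal_hits = ["m1", "m9"]

fused = reciprocal_rank_fusion([vector_hits, graph_hits, journal_hits])
```

An item that appears near the top of several tiers (here `m1`) outranks an item that tops only one tier, which is the property that makes rank fusion useful when tiers produce scores on incomparable scales.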
Anonymity & Tor routing
Ghost Agent treats Tor as the default transport for outbound traffic — not as an optional flag. The proxy endpoint is declared once via the TOR_PROXY env var (canonical value: socks5h://127.0.0.1:9050) and is honoured by every HTTP-touching tool in the codebase. The --anonymous CLI switch is enabled by default and additionally routes web_search through DuckDuckGo, which keeps no query logs.
The fetch pipeline
Every outbound call funnels through utils/helpers.helper_fetch_url_content(), which wraps curl_cffi and httpx with:
- SOCKS5h routing — DNS is resolved over Tor, not by the local resolver, so the operator's DNS server never sees the target hostname.
- SSRF guards — private, link-local, and loopback ranges are refused before the SOCKS layer is touched. A tool call can't become a host-network scanner.
- 5 MB body cap and 20 s timeout — bounds the damage a hostile exit node or target server can do by serving slow or unbounded bodies.
- 3-retry budget — transient failures are retried with a fresh Tor identity between attempts.
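The SSRF guard in the list above can be sketched with the standard `ipaddress` module. This is an illustration under stated assumptions, not the code in `helper_fetch_url_content()`: the function names `is_forbidden_ip` and `check_url` are hypothetical, and the sketch only inspects IP literals and `localhost`, because with SOCKS5h a hostname is resolved on the Tor side and cannot be checked locally without leaking a DNS query.

```python
import ipaddress
from urllib.parse import urlsplit


def is_forbidden_ip(ip_text):
    """True for loopback, private, link-local, and other non-public IPs."""
    ip = ipaddress.ip_address(ip_text)
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def check_url(url):
    """Raise ValueError if the URL must not be fetched; return None otherwise."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    host = parts.hostname
    if not host:
        raise ValueError("URL has no host")
    if host == "localhost" or host.endswith(".localhost"):
        raise ValueError("refusing localhost target")
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: leave resolution to the SOCKS5h proxy.
        return
    if is_forbidden_ip(host):
        raise ValueError(f"refusing non-public address: {host}")
```

Running the check before any proxy connection is opened keeps a refused target from generating traffic at all, which matches the "before the SOCKS layer is touched" ordering described above.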
Identity rotation (NEWNYM)
request_new_tor_identity() in utils/helpers.py is the agent's circuit-burn primitive. Mechanically, it speaks the Tor control protocol to the local daemon and issues a SIGNAL NEWNYM: Tor marks its existing circuits dirty, so every subsequent stream opens on a freshly built circuit and shares no circuit with earlier requests. In practice this typically means a different middle relay and exit; Tor normally keeps its long-lived entry guards across a NEWNYM, and it may rate-limit repeated signals.
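The control-port exchange behind a NEWNYM can be sketched as follows. This is a minimal illustration, not the body of `request_new_tor_identity()`: the helper names are hypothetical, and it assumes password authentication and the conventional control port 9051, neither of which is stated in this documentation.

```python
import socket


def newnym_commands(password=""):
    """Return the control-protocol lines that authenticate and rotate."""
    # Quoted strings in the control protocol escape backslash and quote.
    escaped = password.replace("\\", "\\\\").replace('"', '\\"')
    return [f'AUTHENTICATE "{escaped}"\r\n', "SIGNAL NEWNYM\r\n"]


def reply_ok(reply):
    """A control-port reply signals success when its status code is 250."""
    return reply.startswith("250")


def rotate_identity(host="127.0.0.1", port=9051, password="", timeout=10.0):
    """Send SIGNAL NEWNYM to a Tor control port; return True on success."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rw", encoding="ascii", newline="") as stream:
            for command in newnym_commands(password):
                stream.write(command)
                stream.flush()
                # Each command is answered by one status line.
                if not reply_ok(stream.readline()):
                    return False
    return True
```

Calling `rotate_identity(password="...")` against a running daemon returns `True` only if both the authentication and the signal are acknowledged with `250 OK`.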
Rotation fires:
- Automatically on HTTP 401 / 403 / 503 from `web_search`: a gateway or exit refusal usually clears on a different chain.
- Between retries in `check_weather` (3 attempts with identity refresh) and any tool that elects to re-enter the pool rather than hammer the same circuit.
- Between parallel reformulations in `deep_research`, so the N parallel queries do not all emerge from the same exit.
- On explicit agent intent: the agent can invoke rotation as a tool step when a task's anonymity posture demands it (e.g. before re-visiting a site that just saw traffic).
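The rotate-on-refusal behaviour described above can be sketched as a small retry loop. The status codes and the three-attempt budget come from this page; the function name `fetch_with_rotation` and the `fetch` / `rotate` callables are hypothetical stand-ins for the real tool and for `request_new_tor_identity()`.

```python
ROTATE_ON_STATUS = {401, 403, 503}


def fetch_with_rotation(fetch, rotate, max_attempts=3):
    """Call fetch() up to max_attempts times, rotating after each refusal.

    fetch  : callable returning (status_code, body)
    rotate : callable that requests a new Tor identity
    """
    status, body = None, None
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        if status not in ROTATE_ON_STATUS:
            return status, body
        # Rotate only if another attempt will actually follow.
        if attempt < max_attempts:
            rotate()
    return status, body


# Stubbed demonstration: two refusals, then success.
responses = iter([(403, ""), (503, ""), (200, "ok")])
rotations = []
result = fetch_with_rotation(
    lambda: next(responses),
    lambda: rotations.append("newnym"),
)
```

In the stubbed run the loop rotates twice and returns the third response; if every attempt is refused, the last refusal is returned to the caller instead of being retried indefinitely.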
What is and isn't routed through Tor
| Traffic | Transport | Notes |
|---|---|---|
| Web search (ddgs → DuckDuckGo) | Tor | Every request; identity rotated on refusal or rate-limit. |
| Page / document fetch (fetch_url) | Tor | SOCKS5h, so DNS also leaves via Tor. |
| Weather & geolocation | Tor | Open-Meteo primary, wttr.in fallback; both honour TOR_PROXY. |
| Sandbox container egress | Tor | When the container has network access, it inherits the Tor SOCKS namespace. |
| Upstream LLM inference | Local | llama.cpp / vLLM runs on 127.0.0.1. Never traverses a network boundary, Tor or otherwise. |
| Slack / voice / image-gen servers | LAN | Operator-owned companion services on the private network; intentionally direct so the agent can reach them behind Tor's default deny. |
Sanity check & failure mode
The check_health tool probes the configured circuit on demand and reports whether check.torproject.org sees the request as Tor traffic, plus the current exit-node country. The probe is part of the default health readout — if the circuit is down or leaking to clearnet, the tool surfaces it loudly rather than silently degrading.
If TOR_PROXY is unset or the Tor daemon is unreachable at startup, outbound tools refuse to fire rather than fall back to clearnet. This is intentional: a silently-cleartext agent is worse than a stalled one.
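A fail-closed startup check of this kind could look like the sketch below. The names `TorUnavailable`, `parse_tor_proxy`, and `require_tor_proxy` are hypothetical, and accepting only the `socks5h` scheme is an assumption based on the canonical `TOR_PROXY` value shown earlier; the real startup code may validate differently.

```python
import os
import socket
from urllib.parse import urlsplit


class TorUnavailable(RuntimeError):
    """Raised when outbound traffic must be refused."""


def parse_tor_proxy(env=None):
    """Return (host, port) from TOR_PROXY, or raise TorUnavailable."""
    env = os.environ if env is None else env
    proxy = env.get("TOR_PROXY", "").strip()
    if not proxy:
        raise TorUnavailable("TOR_PROXY is not set; refusing outbound traffic")
    parts = urlsplit(proxy)
    if parts.scheme != "socks5h" or not parts.hostname or not parts.port:
        raise TorUnavailable(f"TOR_PROXY is not a socks5h://host:port URL: {proxy!r}")
    return parts.hostname, parts.port


def require_tor_proxy(env=None, timeout=3.0):
    """Raise TorUnavailable unless the configured SOCKS port accepts connections."""
    host, port = parse_tor_proxy(env)
    try:
        # A plain TCP connect only proves something is listening,
        # not that the listener is Tor or that a circuit is up.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        raise TorUnavailable(f"Tor proxy unreachable at {host}:{port}") from exc
    return host, port
```

Raising instead of returning a fallback transport is the point of the design: callers have no code path that reaches the network without a validated proxy.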