Documentation

Ghost Agent

An autonomous FastAPI service that wraps an OpenAI-compatible LLM with multi-tier memory, Docker-isolated tool execution, swarm inference, and biological-rhythm self-play.

Qwen 3.6 35B-A3 | Tor-only egress | Python 3.10+ | FastAPI | Docker | Reference

Runtime stance. Ghost Agent is designed and tuned around an uncensored Qwen 3.6 35B-A3 upstream model, and every request the agent sends to the open internet is routed through Tor. Anonymity is not an add-on: it is enforced at the HTTP layer for every tool that reaches the open internet, with automatic circuit rotation on refusal. See Anonymity & Tor routing for the mechanics.

Ghost Agent wraps an upstream OpenAI-compatible LLM endpoint (llama.cpp / Ollama / vLLM) with a full agentic stack: a hierarchical task planner, a six-tier memory subsystem backed by ChromaDB and SQLite, a Docker-API container sandbox for code execution, a swarm router for fanning work out across specialised LLM pools, and three separate user-facing interfaces (web UI, Slack bot, desktop "Clockwork Ghost").

This documentation set was reverse-engineered directly from the source under src/ghost_agent/ and interface/. Use the navigation on the left to drill into individual modules, or follow the links below for a guided tour.

Quick links

System Architecture

End-to-end diagram of how interfaces, API, core, memory and sandbox interact.

Request lifecycle

What happens when a user message hits /api/chat.

Install & Run

Environment variables, dependencies, container prerequisites.

CLI flags

Every python -m src.ghost_agent.main argument explained.

The reasoning loop

agent.py — streaming, tool dispatch, sampling profiles.

Vector memory

ChromaDB-backed semantic store with spaced-repetition.

Tool registry

How tools are advertised, dispatched, and validated.

Docker sandbox

Container lifecycle, mounts, Tor networking, resource limits.

Anonymity & Tor

Identity-rotation policy, what's routed through Tor, and the fetch pipeline.

Selfhood (unified self)

First-person autobiographical memory, self-state thread, recognition / wake-up prefix, periodic narrative consolidation. Five components that stitch episodic instances into one continuous self.

Source map

| Path | Purpose | Doc section |
|---|---|---|
| src/ghost_agent/main.py | CLI entrypoint & FastAPI lifespan | CLI reference |
| src/ghost_agent/core/ | Reasoning loop, planning, dream, MCTS, swarm router | Core |
| src/ghost_agent/memory/ | Vector + graph + profile + skill + journal + episodic stores | Memory |
| src/ghost_agent/tools/ | Tool registry & per-tool implementations | Tools |
| src/ghost_agent/sandbox/ | Docker container manager | Sandbox |
| src/ghost_agent/api/ | FastAPI routes | API |
| src/ghost_agent/utils/ | Logging, sanitiser, token counter, helpers | Utilities |
| interface/ | Web UI, Slack bot, voice/image servers, desktop client | Interfaces |

Conceptual model

Ghost Agent is best understood as five concentric layers:

  1. Interface layer: web/Slack/desktop clients that talk HTTP/SSE to the FastAPI core.
  2. API layer: Ollama-compatible HTTP routes that authenticate (X-Ghost-Key) and stream agent responses.
  3. Reasoning core: GhostAgent drives a streaming chat loop, parses tool calls from <tool_call> XML, dispatches tools, and re-injects results.
  4. Memory + planning: six memory tiers fused via Reciprocal Rank Fusion, plus a hierarchical TaskTree with postcondition gating, MCTS lookahead, and uncertainty tracking.
  5. Execution substrate: Docker containers for tool calls, swarm/worker LLM pools for parallel inference, and Tor for anonymous outbound traffic.
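
To make the reasoning-core step concrete, here is a minimal sketch of pulling tool calls out of a model reply. It assumes each <tool_call> block wraps a JSON object with "name" and "arguments" keys; that payload shape and the extract_tool_calls helper are illustrative assumptions, not taken from agent.py.

```python
import json
import re

# Matches the body of each <tool_call>...</tool_call> block, across newlines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Return the tool calls found in a model reply (hypothetical helper).

    Assumption: the payload inside the tags is a JSON object with a
    "name" key and an optional "arguments" key.
    """
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of aborting the turn
        if isinstance(payload, dict) and "name" in payload:
            calls.append(
                {
                    "name": payload["name"],
                    "arguments": payload.get("arguments", {}),
                }
            )
    return calls
```

Malformed blocks are skipped rather than raised here so that one bad call does not discard the rest of the reply; a real loop might instead report the parse error back to the model.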

The Architecture page contains the full diagram.
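
The memory layer above names Reciprocal Rank Fusion as the way the tiers are merged. The sketch below shows the textbook form of RRF, where each document scores the sum of 1 / (k + rank) over the lists it appears in. The function name, the k = 60 default, and the alphabetical tie-break are assumptions for illustration; the project's own fusion code may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each input list is ordered best-first. A document that is absent
    from a list simply contributes nothing for that list.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For example, fusing [["a", "b", "c"], ["b", "a", "d"], ["b", "c"]] ranks "b" first, because it sits near the top of all three lists.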

Anonymity & Tor routing

Ghost Agent treats Tor as the default transport for outbound traffic, not as an optional flag. The proxy endpoint is declared once via the TOR_PROXY env var (canonical value: socks5h://127.0.0.1:9050) and is honoured by every HTTP-touching tool in the codebase. The --anonymous CLI switch is enabled by default and additionally routes web_search through DuckDuckGo, which states that it does not log queries.
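
As an illustration of the "declared once" idea, the sketch below reads TOR_PROXY and turns it into a proxies mapping in the shape the requests library accepts. The tor_proxies helper is hypothetical. Rejecting any scheme other than socks5h is an assumption made here so that DNS resolution also goes through the proxy; the real helpers may be more permissive.

```python
import os
from urllib.parse import urlsplit


def tor_proxies(env=None) -> dict[str, str]:
    """Build a requests-style proxies mapping from TOR_PROXY (hypothetical helper)."""
    env = os.environ if env is None else env
    proxy = env.get("TOR_PROXY", "").strip()
    if not proxy:
        raise RuntimeError("TOR_PROXY is not set")

    parts = urlsplit(proxy)
    # Assumption: only socks5h is accepted, because the trailing "h" makes
    # the proxy (Tor) resolve hostnames instead of the local resolver.
    if parts.scheme != "socks5h":
        raise ValueError(f"expected a socks5h:// proxy, got {parts.scheme!r}")
    if not parts.hostname or parts.port is None:
        raise ValueError("TOR_PROXY must include a host and a port")

    return {"http": proxy, "https": proxy}
```

With the canonical value, the result could be passed as the proxies argument of a requests call, provided SOCKS support (the requests[socks] extra) is installed.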

The fetch pipeline

Every outbound call funnels through utils/helpers.helper_fetch_url_content(), which wraps curl_cffi and httpx with:

Identity rotation (NEWNYM)

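request_new_tor_identity() in utils/helpers.py is the agent's circuit-burn primitive. Mechanically, it speaks the Tor control protocol to the local daemon and issues SIGNAL NEWNYM: Tor marks its current circuits dirty, so every subsequent stream opens on a freshly built circuit and shares no circuit with earlier requests. In practice this usually changes the middle and exit relays, while the entry guard is generally kept, because Tor keeps guards stable across circuits. Tor may also rate-limit NEWNYM, so back-to-back rotations are not guaranteed to take effect immediately.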
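
The control-protocol exchange described above can be sketched with a plain socket. This is not the body of request_new_tor_identity(); the function names, the password-based authentication, and the control port 9051 (Tor's conventional default) are assumptions for illustration.

```python
import socket


def newnym_commands(password: str = "") -> list[bytes]:
    """Control-protocol lines for one rotation: authenticate, signal, quit."""
    # Quoted strings in the control protocol escape backslashes and quotes.
    escaped = password.replace("\\", "\\\\").replace('"', '\\"')
    return [
        f'AUTHENTICATE "{escaped}"\r\n'.encode(),
        b"SIGNAL NEWNYM\r\n",
        b"QUIT\r\n",
    ]


def request_new_identity(
    host: str = "127.0.0.1",
    port: int = 9051,  # assumed control port, not confirmed by the source
    password: str = "",
    timeout: float = 10.0,
) -> bool:
    """Ask the local Tor daemon for fresh circuits; True if Tor accepted."""
    auth, signal, quit_cmd = newnym_commands(password)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        reader = sock.makefile("rb")
        for command in (auth, signal):
            sock.sendall(command)
            # Tor answers each accepted command with a "250 ..." line.
            if not reader.readline().startswith(b"250"):
                return False
        sock.sendall(quit_cmd)
    return True
```

A caller would typically wait briefly after a successful rotation before retrying the refused request, since new circuits take a moment to build.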

Rotation fires:

What is and isn't routed through Tor

| Traffic | Transport | Notes |
|---|---|---|
| Web search (ddgs → DuckDuckGo) | Tor | Every request; identity rotated on refusal or rate-limit. |
| Page / document fetch (fetch_url) | Tor | SOCKS5h: DNS also leaves via Tor. |
| Weather & geolocation | Tor | Open-Meteo primary, wttr.in fallback; both honour TOR_PROXY. |
| Sandbox container egress | Tor | When the container has network access, it inherits the Tor SOCKS namespace. |
| Upstream LLM inference | Local | llama.cpp / vLLM runs on 127.0.0.1. Never traverses a network boundary, Tor or otherwise. |
| Slack / voice / image-gen servers | LAN | Operator-owned companion services on the private network; intentionally direct so the agent can reach them behind Tor's default deny. |
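
One way to picture the split in the table is a small classifier that picks a transport from the destination address. The transport_for helper and its rule (loopback is local, private IP ranges are LAN, everything else goes through Tor) are assumptions sketched for illustration; the source does not say how the agent makes this decision.

```python
import ipaddress
from urllib.parse import urlsplit


def transport_for(url: str) -> str:
    """Classify a destination as "local", "lan", or "tor" (hypothetical rule)."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    if host == "localhost":
        return "local"
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A hostname, not an IP literal: send it through Tor so that the
        # name is resolved by the proxy rather than the local resolver.
        return "tor"
    if address.is_loopback:  # checked first: loopback also counts as private
        return "local"
    if address.is_private:
        return "lan"
    return "tor"
```

Under this rule a companion service addressed by hostname would be sent through Tor, so a real deployment would presumably keep an explicit allowlist of direct destinations.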

Sanity check & failure mode

The check_health tool probes the configured circuit on demand and reports whether check.torproject.org sees the request as Tor traffic, plus the current exit-node country. The probe is part of the default health readout — if the circuit is down or leaking to clearnet, the tool surfaces it loudly rather than silently degrading.
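
The probe's verdict can be sketched as a small parser. It assumes the JSON endpoint https://check.torproject.org/api/ip and a response of the form {"IsTor": true, "IP": "..."}; both the endpoint and the field names are assumptions to verify against the live service, and parse_tor_check is a hypothetical helper rather than the check_health implementation. The exit-node country lookup is not covered here.

```python
import json


def parse_tor_check(body: str) -> dict:
    """Interpret a Tor-check response body (hypothetical helper).

    Anything that cannot be parsed is reported as "not Tor", so that a
    broken probe is surfaced as a failure instead of a silent pass.
    """
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return {"is_tor": False, "exit_ip": None}
    if not isinstance(data, dict):
        return {"is_tor": False, "exit_ip": None}
    return {
        "is_tor": data.get("IsTor") is True,  # assumed field name
        "exit_ip": data.get("IP"),            # assumed field name
    }
```

The body itself would be fetched through the configured proxy, so that the check exercises the same path as the other outbound tools.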

Fail-closed, not fail-open. If TOR_PROXY is unset or the Tor daemon is unreachable at startup, outbound tools refuse to fire rather than fall back to clearnet. This is intentional: a silently-cleartext agent is worse than a stalled one.
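
A fail-closed guard along these lines could look like the sketch below: it refuses to return a proxy unless TOR_PROXY is set and the SOCKS port accepts a TCP connection. The require_tor function and TorUnavailableError class are hypothetical names, and a successful TCP connect only shows that something is listening, not that circuits are healthy.

```python
import os
import socket
from urllib.parse import urlsplit


class TorUnavailableError(RuntimeError):
    """Raised instead of falling back to a direct (clearnet) connection."""


def require_tor(env=None, timeout: float = 3.0) -> str:
    """Return the proxy URL, or raise if Tor cannot be used (hypothetical)."""
    env = os.environ if env is None else env
    proxy = env.get("TOR_PROXY", "").strip()
    if not proxy:
        raise TorUnavailableError("TOR_PROXY is unset; refusing outbound traffic")

    try:
        parts = urlsplit(proxy)
        host, port = parts.hostname, parts.port
    except ValueError as exc:  # e.g. a port outside 0-65535
        raise TorUnavailableError(f"malformed TOR_PROXY: {proxy!r}") from exc
    if not host or port is None:
        raise TorUnavailableError(f"malformed TOR_PROXY: {proxy!r}")

    try:
        # Only checks that the SOCKS port is listening.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        raise TorUnavailableError(f"Tor proxy unreachable at {host}:{port}") from exc
    return proxy
```

An outbound tool would call the guard before every request and let the exception propagate, which gives the stalled-rather-than-cleartext behaviour described above.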