What Ghost Agent Can Do
Practical capabilities grouped by what a user might ask for. Each entry shows an example prompt, what happens under the hood, and which subsystems it exercises.
At a glance
Research & synthesis
Multi-source web search, PDF & page ingestion, cross-referenced summaries.
Code & data analysis
Run Python / shell inside a Docker sandbox on files you upload.
Multi-step projects
Scaffold, implement, test, and iterate on deliverables over many turns.
Image generation
SDXL-backed visuals through a dedicated GPU swarm pool.
Vision & OCR
Describe screenshots, extract tables from PDFs, caption photos.
Long-term memory
Remembers preferences, facts, and hard-won procedures across sessions.
Scheduling & autonomy
Cron-style recurring tasks that run the agent against a stored prompt.
Database admin
Query and introspect Postgres instances with safety rails.
Voice I/O
Talk to the agent through a TTS/STT server on a companion box.
Research & synthesis
Ghost Agent issues web searches through DuckDuckGo (over Tor when --anonymous is on), fetches the top results with SSRF-guarded HTTP, and folds the content into a single synthesised answer. PDFs are parsed with PyMuPDF; image-heavy pages are routed to the vision pool for OCR.
> Find the three most-cited 2025 papers on mixture-of-experts routing
and give me a one-paragraph summary of each, with DOIs.
Under the hood: web_search → fetch_url (retry + identity rotation on 401/403/503) → PyMuPDF parsing for PDFs, with vision OCR for image-heavy pages → vector memory for de-duplication → synthesis turn.
Tools: search, vision. See also: Memory hydration (RRF).
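The SSRF guard mentioned above can be pictured as a check that runs before any fetch. This is a minimal sketch using only the standard library, not Ghost Agent's actual code: the function names and the `resolve` hook are hypothetical, and the policy (http/https only, every resolved address must be public) is an assumption.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_address(ip_text: str) -> bool:
    """Return True only for addresses outside private and special-use ranges."""
    ip = ipaddress.ip_address(ip_text)
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def ssrf_check(url: str, resolve=None) -> bool:
    """Allow http(s) URLs whose host resolves only to public addresses.

    `resolve` is a hypothetical hook (hostname -> list of IP strings) so the
    check can be exercised without DNS; by default it uses socket.getaddrinfo.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    if resolve is None:
        def resolve(host):
            return [info[4][0] for info in socket.getaddrinfo(host, None)]
    try:
        addresses = resolve(parts.hostname)
    except OSError:
        return False
    # One private or loopback record is enough to refuse the whole request.
    return bool(addresses) and all(is_public_address(a) for a in addresses)
```

A check like this only helps if the connection is then made to the same address that was validated; otherwise a second DNS lookup could return something different.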
Code execution & data analysis
Every emitted snippet runs inside an ephemeral Alpine/Python container bind-mounted to $GHOST_HOME/sandbox. The host filesystem outside that directory is never touched; network egress is optional and goes through Tor when enabled. The agent reads back stdout, stderr, exit codes, and any files the snippet wrote.
> I uploaded sales_2025.csv. Compute monthly retention cohorts and
plot them as a heatmap. Save the figure as retention.png.
Under the hood: file_system.read → execute (Python in sandbox, pandas + matplotlib) → file_system.write → returns the PNG path plus a prose reading of the chart.
Tools: execute, file_system. See also: Docker sandbox.
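To make the sandbox description concrete, here is a rough sketch of how such a container invocation could be assembled. The image tag, the /workspace mount point, and both function names are assumptions for illustration; the Tor routing described above is not modelled, so network access is simply on or off.

```python
import subprocess
from typing import List


def sandbox_command(code: str, sandbox_dir: str, allow_network: bool = False,
                    image: str = "python:3.12-alpine") -> List[str]:
    """Build a `docker run` argv for one throwaway Python snippet."""
    argv = ["docker", "run", "--rm"]          # --rm: container is discarded afterwards
    if not allow_network:
        argv += ["--network", "none"]         # no egress unless explicitly enabled
    argv += [
        "-v", f"{sandbox_dir}:/workspace",    # only the sandbox directory is visible
        "-w", "/workspace",
        image, "python", "-c", code,
    ]
    return argv


def run_in_sandbox(code: str, sandbox_dir: str) -> subprocess.CompletedProcess:
    """Run the snippet and capture stdout, stderr, and the exit code."""
    return subprocess.run(
        sandbox_command(code, sandbox_dir),
        capture_output=True, text=True, timeout=120,
    )
```

The returned `CompletedProcess` carries `stdout`, `stderr`, and `returncode`, which matches the three things the agent is said to read back; files written by the snippet would appear under the bind-mounted directory.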
Multi-step projects
For anything that takes more than one turn, the planner builds a hierarchical TaskTree with postcondition gates. The agent picks up the plan across sessions via the project store, checkpointing progress after each gated step.
> Build a Flask app with JWT auth and SQLite persistence. Put it in
/workspace/auth-app, add pytest coverage, and make sure the tests
pass before you tell me you're done.
Under the hood: planning decomposes the goal → projects tool scaffolds state → coding-pool LLM writes files → execute runs pytest → verifier confirms each postcondition before advancing.
Tools: projects, execute, file_system. See also: Project store.
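A postcondition-gated task tree can be sketched in a few lines. The `TaskNode` class and `run` function below are hypothetical stand-ins, not the real TaskTree API; the sketch assumes a depth-first walk, a `done` flag as the checkpoint, and a small retry budget per node.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TaskNode:
    name: str
    action: Callable[[], None] = lambda: None        # the work for this step
    postcondition: Callable[[], bool] = lambda: True  # gate that must hold afterwards
    children: List["TaskNode"] = field(default_factory=list)
    done: bool = False                                # checkpoint flag


def run(node: TaskNode, max_attempts: int = 2) -> bool:
    """Depth-first execution; a node advances only once its gate passes."""
    if node.done:
        return True  # already checkpointed, e.g. when resuming a session
    for child in node.children:
        if not run(child, max_attempts):
            return False
    for _ in range(max_attempts):
        node.action()
        if node.postcondition():
            node.done = True
            return True
    return False


# Toy plan mirroring the Flask example: write files, then make tests pass.
state = {"files": False, "tests": False}
write = TaskNode("write files",
                 action=lambda: state.update(files=True),
                 postcondition=lambda: state["files"])
test = TaskNode("run pytest",
                action=lambda: state.update(tests=state["files"]),
                postcondition=lambda: state["tests"])
root = TaskNode("auth-app", children=[write, test],
                postcondition=lambda: state["files"] and state["tests"])
finished = run(root)
```

Because finished nodes are skipped, calling `run(root)` again after an interruption would resume at the first step whose gate has not yet passed.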
Image generation
Image requests are routed to a dedicated SDXL swarm pool (--image-gen-nodes). The image is saved into the sandbox and returned as a path; the agent can then re-read it through the vision pool for critique-and-iterate loops.
> Draft a landing-page hero banner: retro-futurist, teal-and-gold,
1600×600, no text overlay. Iterate if the composition feels
bottom-heavy.
Under the hood: image_gen → SDXL server → file_system.read → vision critique → optional second generation pass.
Tools: image_gen, vision, swarm.
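The critique-and-iterate loop can be sketched with the generator and the critic passed in as plain callables. Everything here is illustrative: `generate_with_critique`, its parameters, and the way feedback is appended to the prompt are assumptions, and the two-pass default simply mirrors the "optional second generation pass" above.

```python
from typing import Callable, Tuple


def generate_with_critique(
    prompt: str,
    generate: Callable[[str], str],                # prompt -> saved image path
    critique: Callable[[str], Tuple[bool, str]],   # path -> (acceptable?, feedback)
    max_passes: int = 2,
) -> Tuple[str, int]:
    """Generate an image, re-generating with feedback until accepted or out of passes."""
    path = generate(prompt)
    for passes in range(1, max_passes):
        acceptable, feedback = critique(path)
        if acceptable:
            return path, passes
        # Fold the critic's note into the next prompt.
        prompt = f"{prompt}. Revision note: {feedback}"
        path = generate(prompt)
    return path, max_passes
```

Keeping the two callables injectable means the same loop can be exercised with stubs, without an SDXL server or a vision model behind it.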
Vision & OCR
Drop a screenshot, a PDF page, or a photo into the conversation and the agent routes it through the visual pool for description, table extraction, or debugging help.
> Here's a screenshot of a failing Grafana panel — what's wrong
with the query, and what would fix the missing-series gap?
Under the hood: file upload → sandbox storage → vision tool with image attachment → visual-pool LLM returns structured analysis.
Tools: vision.
Long-term memory
Six independent stores keep different kinds of knowledge: semantic (vector), factual (graph), user-specific (profile), procedural (skills), chronological (episodes), and short-term (journal + scratchpad). An adaptive threshold decides what is worth committing.
> Remember that I prefer ISO-8601 dates everywhere, that my primary
Postgres is at 10.0.1.4, and that I'm working from Athens (UTC+3).
Under the hood: facts land in profile; every subsequent system prompt includes them automatically. Observations from conversation also drop into the journal and get consolidated during the dream cycle.
See also: Memory hydration, Memory bus.
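This page labels memory hydration with "RRF", which most likely refers to reciprocal rank fusion: each store returns its own ranked list, and an item's fused score is the sum of 1 / (k + rank) over the lists it appears in. The sketch below shows that formula only; the store names and the function name are hypothetical, and k = 60 is the commonly used constant rather than a documented Ghost Agent setting.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: Dict[str, List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of item ids into one ordering."""
    scores = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, item_id in enumerate(ranked_ids, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable result.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


hits = {
    "semantic": ["a", "b", "c"],
    "graph": ["b", "d"],
    "episodes": ["b", "a"],
}
fused = reciprocal_rank_fusion(hits)  # ["b", "a", "d", "c"]
```

Item "b" wins because it ranks near the top of all three lists, which is the property that makes RRF useful when stores score results on incomparable scales.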
Scheduling & autonomy
APScheduler is wired into the agent at startup. Any prompt can be parked on a cron expression and will run unattended against a fresh session.
> Every weekday at 09:00 Athens time, fetch my Slack DMs from the
last 12 hours and post a three-bullet digest to the #me channel.
Under the hood: tasks tool persists the schedule → APScheduler fires → agent runs the stored prompt with its full tool surface → posts result back.
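For readers unfamiliar with cron expressions, the "every weekday at 09:00" schedule corresponds to the five-field string `0 9 * * 1-5`. The standard-library sketch below only illustrates what such an expression matches; it is not how APScheduler evaluates triggers. It assumes the datetime is already in the schedule's time zone, supports only `*`, single numbers, lists, and ranges, and ANDs all five fields together.

```python
from datetime import datetime


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field: '*', 'n', 'a-b', or a comma-separated mix."""
    for part in field.split(","):
        if part == "*":
            return True
        if "-" in part:
            low, high = part.split("-")
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """True if `when` falls on a minute selected by the expression."""
    minute, hour, day, month, weekday = expr.split()
    return (
        _field_matches(minute, when.minute)
        and _field_matches(hour, when.hour)
        and _field_matches(day, when.day)
        and _field_matches(month, when.month)
        # crontab counts Sunday as 0; isoweekday() counts Sunday as 7.
        and _field_matches(weekday, when.isoweekday() % 7)
    )


WEEKDAYS_9AM = "0 9 * * 1-5"
```

A production scheduler also handles step values, month and day names, and time zones, so this matcher is best read as a description of the schedule's meaning.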
Database administration
The postgres_admin tool wraps SQLAlchemy with a default DSN from $GHOST_DEFAULT_DB. Queries are read-only unless the user explicitly authorises a mutation.
> On the analytics DB, show me any table over 1 GB and the top three
largest indexes for each.
Under the hood: database → SQL built from pg_class / pg_stat_user_indexes → returned as a formatted table.
Tools: database.
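The read-only default can be approximated with a statement check in front of the database call. The sketch below is an assumption about how such a rail might look, not the tool's real logic; the names `is_read_only` and `guard` are hypothetical, and the SQL is one plausible way to list tables over 1 GB from pg_class.

```python
import re

TABLES_OVER_1GB = """
SELECT c.relname AS table_name,
       pg_total_relation_size(c.oid) AS total_bytes
FROM pg_class AS c
WHERE c.relkind = 'r'
  AND pg_total_relation_size(c.oid) > 1073741824
ORDER BY total_bytes DESC
"""

READ_ONLY_KEYWORDS = ("select", "show")


def is_read_only(sql: str) -> bool:
    """Conservative check: a single statement that starts with SELECT or SHOW."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        return False  # refuse anything that looks like multiple statements
    first_word = re.match(r"[A-Za-z]+", statement)
    return bool(first_word) and first_word.group(0).lower() in READ_ONLY_KEYWORDS


def guard(sql: str, user_authorised_mutation: bool = False) -> str:
    """Return the SQL if it may run, otherwise raise."""
    if is_read_only(sql) or user_authorised_mutation:
        return sql
    raise PermissionError("mutation requires explicit user authorisation")
```

A keyword check is only a first line of defence, since a SELECT can still call functions with side effects. Running queries in a read-only transaction or under a role without write privileges is a sturdier complement.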
Voice I/O
When the voice server is reachable ($PI_VOICE_URL) the agent can answer spoken prompts and stream TTS audio back. A Raspberry Pi with a USB mic on the desk works well as the companion box.
🎙 "Hey ghost, what's the weather in Athens and what's on my calendar?"
Under the hood: STT captures the prompt → normal agent turn → TTS streams the reply. All memory and tools are available as usual.
See also: Voice server.
Self-improvement
During idle time, the dream cycle drains the journal into long-term stores, extracts reusable tool sequences as new "skills", and ranks them by observed utility. Skills that repeatedly fail get auto-retired to acquired_skills/retired/.
> (internal) After the third time the agent parsed Slack HTML into
markdown, it saves the sequence as the "slack_html_to_md" skill.
Future runs call it as a single tool.
Under the hood: Skill acquisition pipeline → acquired_skills registry → registered as a first-class tool on the next turn.
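The "third time" rule in the example can be sketched as a counter over observed tool sequences. The `SkillMiner` class, the tool names, and the fixed threshold of three are illustrative assumptions; the real pipeline also ranks skills by utility and retires failing ones, which this sketch leaves out.

```python
from collections import Counter
from typing import Dict, List, Tuple


class SkillMiner:
    """Promote a tool sequence to a named skill once it has been seen often enough."""

    def __init__(self, promote_after: int = 3):
        self.promote_after = promote_after
        self.seen: Counter = Counter()
        self.skills: Dict[str, Tuple[str, ...]] = {}

    def observe(self, name: str, tool_sequence: List[str]) -> bool:
        """Record one occurrence; return True only on the call that promotes it."""
        key = tuple(tool_sequence)
        self.seen[key] += 1
        if self.seen[key] >= self.promote_after and name not in self.skills:
            self.skills[name] = key
            return True
        return False


miner = SkillMiner()
sequence = ["fetch_slack", "parse_html", "to_markdown"]  # hypothetical tool names
promotions = [miner.observe("slack_html_to_md", sequence) for _ in range(4)]
```

Here the third observation returns True and registers the skill, and later observations are no-ops, matching the idea that the sequence becomes a single callable tool from then on.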
What it doesn't do
- Arbitrary host access. Every tool that touches the filesystem or spawns a process goes through the Docker sandbox. The host's $HOME is unreachable.
- Unauthenticated mutations. Destructive Postgres and filesystem operations require explicit user confirmation in the turn.
- Training. Ghost Agent uses its upstream LLM as-is; it doesn't fine-tune weights. Skill acquisition is pure symbolic reuse.