What Ghost Agent Can Do

Practical capabilities grouped by what a user might ask for. Each entry shows an example prompt, what happens under the hood, and which subsystems it exercises.

At a glance

Research & synthesis

Multi-source web search, PDF & page ingestion, cross-referenced summaries.

Code & data analysis

Run Python / shell inside a Docker sandbox on files you upload.

Multi-step projects

Scaffold, implement, test, and iterate on deliverables over many turns.

Image generation

SDXL-backed visuals through a dedicated GPU swarm pool.

Vision & OCR

Describe screenshots, extract tables from PDFs, caption photos.

Long-term memory

Remembers preferences, facts, and hard-won procedures across sessions.

Scheduling & autonomy

Cron-style recurring tasks that run the agent against a stored prompt.

Database admin

Query and introspect Postgres instances with safety rails.

Voice I/O

Talk to the agent through a TTS/STT server on a companion box.

Research & synthesis

Ghost Agent issues web searches through DuckDuckGo (over Tor when --anonymous is on), fetches the top results with SSRF-guarded HTTP, and folds the content into a single synthesised answer. PDFs are parsed with PyMuPDF; image-heavy pages are routed to the vision pool for OCR.

> Find the three most-cited 2025 papers on mixture-of-experts routing
  and give me a one-paragraph summary of each, with DOIs.

Under the hood: web_search → fetch_url (retry + identity rotation on 401/403/503) → vision on any PDFs → vector memory for de-duplication → synthesis turn.

Tools: search, vision. See also: Memory hydration (RRF).
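
The SSRF guard mentioned above can be pictured as a check that runs before any fetch. This is a minimal sketch, not Ghost Agent's actual code: the names is_public_address and check_url are hypothetical, and it assumes the guard works by resolving the hostname and refusing any address that is not publicly routable.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_public_address(ip: str) -> bool:
    """Return True only for addresses that are safe to fetch from."""
    # Drop an IPv6 zone identifier such as "%eth0" before parsing.
    addr = ipaddress.ip_address(ip.split("%")[0])
    return not (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_multicast
        or addr.is_reserved
        or addr.is_unspecified
    )


def check_url(url: str, resolve=socket.getaddrinfo) -> bool:
    """Hypothetical pre-fetch check: scheme allowlist plus resolved-IP vetting.

    `resolve` is injectable so the check can be exercised without DNS.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = resolve(parsed.hostname, parsed.port or 443)
    except OSError:
        return False  # unresolvable hosts are rejected, not retried here
    # Every resolved address must be public; one internal record fails the URL.
    return bool(infos) and all(is_public_address(info[4][0]) for info in infos)
```

A check like this only vets the addresses seen at resolution time, so a real guard would also need to connect to the vetted address (and re-check after redirects) to resist DNS rebinding.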

Code execution & data analysis

Every emitted snippet runs inside an ephemeral Alpine/Python container bind-mounted to $GHOST_HOME/sandbox. Nothing on the host outside that directory is touched; network egress is optional and goes through Tor when enabled. The agent reads back stdout, stderr, exit codes, and any files it wrote.

> I uploaded sales_2025.csv. Compute monthly retention cohorts and
  plot them as a heatmap. Save the figure as retention.png.

Under the hood: file_system.read → execute (Python in sandbox, pandas + matplotlib) → file_system.write → returns the PNG path plus a prose reading of the chart.

Tools: execute, file_system. See also: Docker sandbox.
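
To illustrate what "reads back stdout, stderr, exit codes" means in practice, here is a minimal sketch of the capture step. It is an assumption-laden stand-in: it runs the snippet in a local Python subprocess rather than a Docker container, so it provides no isolation, and RunResult and run_snippet are hypothetical names.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class RunResult:
    stdout: str
    stderr: str
    exit_code: int


def run_snippet(code: str, timeout_s: float = 30.0) -> RunResult:
    """Run a Python snippet in a child process and capture its output.

    Stand-in for the sandboxed `execute` tool: same three outputs, no sandbox.
    Raises subprocess.TimeoutExpired if the snippet runs longer than timeout_s.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return RunResult(proc.stdout, proc.stderr, proc.returncode)
```

Returning all three values together lets the caller distinguish a snippet that printed nothing from one that crashed, which matters when the agent decides whether to retry.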

Multi-step projects

For anything that takes more than one turn, the planner builds a hierarchical TaskTree with postcondition gates. The agent picks up the plan across sessions via the project store, checkpointing progress after each gated step.

> Build a Flask app with JWT auth and SQLite persistence. Put it in
  /workspace/auth-app, add pytest coverage, and make sure the tests
  pass before you tell me you're done.

Under the hood: planning decomposes the goal → projects tool scaffolds state → coding-pool LLM writes files → execute runs pytest → verifier confirms each postcondition before advancing.

Tools: projects, execute, file_system. See also: Project store.
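
The idea of a task tree with postcondition gates can be sketched as below. This is an illustration under stated assumptions, not the planner's real data model: TaskNode, run_tree, and the checkpoint callback are hypothetical, and the sketch assumes a node is only marked done once its children are done and its own postcondition holds.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class TaskNode:
    name: str
    action: Optional[Callable[[], None]] = None
    postcondition: Optional[Callable[[], bool]] = None
    children: List["TaskNode"] = field(default_factory=list)
    done: bool = False


def run_tree(node: TaskNode, checkpoint: Callable[[str], None]) -> bool:
    """Depth-first execution that stops at the first failed gate.

    Nodes already marked done are skipped, which is what lets a later
    session resume from the last checkpoint instead of starting over.
    """
    if node.done:
        return True
    for child in node.children:
        if not run_tree(child, checkpoint):
            return False
    if node.action is not None:
        node.action()
    if node.postcondition is not None and not node.postcondition():
        return False  # gate failed: do not advance, do not checkpoint
    node.done = True
    checkpoint(node.name)  # hypothetical hook into the project store
    return True
```

In the Flask example above, the "tests pass" requirement would be a postcondition on the pytest step, so the root task cannot complete until that gate returns true.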

Image generation

Image requests are routed to a dedicated SDXL swarm pool (--image-gen-nodes). The image is saved into the sandbox and returned as a path; the agent can then re-read it through the vision pool for critique-and-iterate loops.

> Draft a landing-page hero banner: retro-futurist, teal-and-gold,
  1600×600, no text overlay. Iterate if the composition feels
  bottom-heavy.

Under the hood: image_gen → SDXL server → file_system.read → vision critique → optional second generation pass.

Tools: image_gen, vision, swarm.

Vision & OCR

Drop a screenshot, a PDF page, or a photo into the conversation and the agent routes it through the visual pool for description, table extraction, or debugging help.

> Here's a screenshot of a failing Grafana panel — what's wrong
  with the query, and what would fix the missing-series gap?

Under the hood: file upload → sandbox storage → vision tool with image attachment → visual-pool LLM returns structured analysis.

Tools: vision.

Long-term memory

Six independent stores keep different kinds of knowledge: semantic (vector), factual (graph), user-specific (profile), procedural (skills), chronological (episodes), and short-term (journal + scratchpad). An adaptive threshold decides what is worth committing.

> Remember that I prefer ISO-8601 dates everywhere, that my primary
  Postgres is at 10.0.1.4, and that I'm working from Athens (UTC+3).

Under the hood: facts land in profile; every subsequent system prompt includes them automatically. Observations from conversation also drop into the journal and get consolidated during the dream cycle.

See also: Memory hydration, Memory bus.
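
How profile facts might reach "every subsequent system prompt" can be sketched in a few lines. The function name, the key names, and the prompt wording are all hypothetical; the only thing taken from the text above is that stored profile facts are appended to the system prompt automatically.

```python
from typing import Mapping


def render_system_prompt(base: str, profile: Mapping[str, str]) -> str:
    """Append stored profile facts to a base system prompt.

    Keys are sorted so the rendered prompt is stable between turns.
    """
    if not profile:
        return base
    lines = [f"- {key}: {value}" for key, value in sorted(profile.items())]
    return base + "\n\nKnown user preferences and facts:\n" + "\n".join(lines)
```

With the example prompt above, the profile would hold entries such as a date format of ISO-8601 and a primary Postgres host of 10.0.1.4, and both would appear in the rendered prompt on the next turn.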

Scheduling & autonomy

APScheduler is wired into the agent at startup. Any prompt can be parked on a cron expression and will run unattended against a fresh session.

> Every weekday at 09:00 Athens time, fetch my Slack DMs from the
  last 12 hours and post a three-bullet digest to the #me channel.

Under the hood: tasks tool persists the schedule → APScheduler fires → agent runs the stored prompt with its full tool surface → posts result back.

Tools: tasks, Slack bot.
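
To make "every weekday at 09:00" concrete, the sketch below computes the next fire time for such a schedule using only the standard library. In the agent itself APScheduler evaluates the cron expression; this hypothetical helper, next_weekday_fire, only illustrates what that evaluation resolves to and is not part of any scheduler API.

```python
from datetime import datetime, time, timedelta


def next_weekday_fire(now: datetime, at: time = time(9, 0)) -> datetime:
    """Return the next Monday-to-Friday occurrence of `at` strictly after `now`.

    `now` may be naive or timezone-aware; the result keeps the same tzinfo,
    so passing an Athens-local datetime yields an Athens-local fire time.
    """
    candidate = datetime.combine(now.date(), at, tzinfo=now.tzinfo)
    # Step forward one day at a time until the slot is in the future
    # and falls on a weekday (Monday is 0, Saturday is 5, Sunday is 6).
    while candidate <= now or candidate.weekday() >= 5:
        candidate += timedelta(days=1)
    return candidate
```

A prompt parked on this schedule late on a Friday would therefore next run on Monday morning, skipping the weekend.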

Database administration

The postgres_admin tool wraps SQLAlchemy with a default DSN from $GHOST_DEFAULT_DB. Queries are read-only unless the user explicitly authorises a mutation.

> On the analytics DB, show me any table over 1 GB and the top three
  largest indexes for each.

Under the hood: database → SQL built from pg_class / pg_stat_user_indexes → returned as a formatted table.

Tools: database.
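
The "read-only unless explicitly authorised" rule can be illustrated with a deliberately naive statement check. This is a sketch, not the tool's actual safety rail: is_read_only and guard are hypothetical names, and the allowlist approach is an assumption about how such a rail could work.

```python
import re

# Statement keywords treated as read-only in this sketch.
_ALLOWED = {"SELECT", "SHOW"}


def is_read_only(sql: str) -> bool:
    """Naive first-pass check: one statement, starting with an allowed keyword."""
    stripped = sql.strip().rstrip(";").strip()
    # Reject empty input and anything that still contains a semicolon.
    # This is conservative: a semicolon inside a string literal is also rejected.
    if not stripped or ";" in stripped:
        return False
    first = re.match(r"[A-Za-z]+", stripped)
    return bool(first) and first.group(0).upper() in _ALLOWED


def guard(sql: str, allow_mutation: bool = False) -> str:
    """Return the SQL unchanged if permitted, otherwise raise PermissionError."""
    if not allow_mutation and not is_read_only(sql):
        raise PermissionError("mutation requires explicit user authorisation")
    return sql
```

Keyword checks alone are not sufficient, since a SELECT can call a function with side effects; a production guard would more plausibly pair this with a read-only transaction or a database role without write privileges.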

Voice I/O

When the voice server is reachable ($PI_VOICE_URL), the agent can answer spoken prompts and stream TTS audio back. This works well with a Raspberry Pi and a USB microphone on the desk.

🎙  "Hey ghost, what's the weather in Athens and what's on my calendar?"

Under the hood: STT captures the prompt → normal agent turn → TTS streams the reply. All memory and tools are available as usual.

See also: Voice server.

Self-improvement

During idle time, the dream cycle drains the journal into long-term stores, extracts reusable tool sequences as new "skills", and ranks them by observed utility. Skills that repeatedly fail get auto-retired to acquired_skills/retired/.

> (internal) After the third time the agent parses Slack HTML into
  markdown, it saves the sequence as the "slack_html_to_md" skill.
  Future runs call it as a single tool.

Under the hood: Skill acquisition pipeline → acquired_skills registry → registered as a first-class tool on the next turn.
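
The promote-then-retire lifecycle described above can be sketched with two counters. Everything here is hypothetical: the SkillTracker class, the thresholds (three sightings to promote, three failures to retire), and the tool names in the usage note are assumptions for illustration, and utility ranking is left out.

```python
from collections import Counter
from typing import Dict, Sequence, Set, Tuple


class SkillTracker:
    """Count repeated tool sequences, promote them to skills, retire failing ones."""

    def __init__(self, promote_after: int = 3, retire_after: int = 3) -> None:
        self.promote_after = promote_after
        self.retire_after = retire_after
        self.seen: Counter = Counter()       # sequence -> times observed
        self.failures: Counter = Counter()   # skill name -> failure count
        self.active: Dict[str, Tuple[str, ...]] = {}
        self.retired: Set[str] = set()

    def observe(self, name: str, sequence: Sequence[str]) -> bool:
        """Record one use of a tool sequence; return True if it was just promoted."""
        key = tuple(sequence)
        self.seen[key] += 1
        eligible = name not in self.active and name not in self.retired
        if eligible and self.seen[key] >= self.promote_after:
            self.active[name] = key
            return True
        return False

    def report_failure(self, name: str) -> bool:
        """Record a failed run of an active skill; return True if it was just retired."""
        if name not in self.active:
            return False
        self.failures[name] += 1
        if self.failures[name] >= self.retire_after:
            del self.active[name]
            self.retired.add(name)
            return True
        return False
```

For the Slack example, observing a sequence such as fetch, strip tags, convert to markdown three times under the name "slack_html_to_md" would promote it, and three later failures would move it to the retired set.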

What it doesn't do