core / llm.py

Multi-pool LLM orchestrator. Owns httpx async clients, round-robin scheduling, and per-node circuit breakers.

Pool topology

Six logical pools, each backed by zero or more (url, model) nodes plus the always-present foreground client.

Pool	Used for	Selector
foreground	Default chat completion	`chat_completion()` with no flags
swarm	Parallel inference / fan-out	`chat_completion(use_swarm=True)`
worker	Cheap classifier / verifier sub-tasks	`route(task, ...)`
visual	Multi-modal (image / PDF)	`chat_completion(use_vision=True)`
coding	Code-specialist generation	`chat_completion(use_coding=True)`
image_gen	SDXL image generation	`generate_image()`

The RoutingTask enum at line 71 advertises the labels worker pools can fulfil:

NodeCircuitBreaker(failure_threshold=3, cooldown_seconds=60.0) tracks per-node state:

Figure 3 — NodeCircuitBreaker state diagram.
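The breaker behaviour can be sketched as below. This is a minimal illustration, not the actual NodeCircuitBreaker: only the constructor arguments (failure_threshold=3, cooldown_seconds=60.0) and the OPEN state are named in this document. The CLOSED and HALF_OPEN states, the class name SketchCircuitBreaker, the method names, and the injectable clock are assumptions that follow the conventional circuit-breaker pattern.

```python
import time
from enum import Enum


class BreakerState(Enum):
    CLOSED = "closed"        # requests flow normally
    OPEN = "open"            # node is skipped until the cooldown elapses
    HALF_OPEN = "half_open"  # cooldown elapsed; a trial request is allowed


class SketchCircuitBreaker:
    """Illustrative per-node breaker (hypothetical, not the real class)."""

    def __init__(self, failure_threshold=3, cooldown_seconds=60.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock      # injectable so tests need not sleep
        self._failures = 0       # consecutive failures while CLOSED
        self._opened_at = None   # timestamp of the last trip, or None

    @property
    def state(self):
        if self._opened_at is None:
            return BreakerState.CLOSED
        if self._clock() - self._opened_at >= self.cooldown_seconds:
            return BreakerState.HALF_OPEN
        return BreakerState.OPEN

    def allow_request(self):
        # Only a fully OPEN circuit blocks traffic.
        return self.state is not BreakerState.OPEN

    def record_success(self):
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        if self.state is BreakerState.HALF_OPEN:
            # The trial request failed: re-open and restart the cooldown.
            self._opened_at = self._clock()
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

Because the state lives in plain instance attributes, nothing survives a process restart, which matches the process-local behaviour described in this section.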

One httpx.AsyncClient per node, configured with:

Setting	Value
timeout	1200 s; long-context generations can run for many minutes.
limits	3 keep-alive, 15 total connections per node.
keep-alive expiry	30 s.
proxy	Tor SOCKS proxy, used if tor_proxy is set in the constructor.
headers	Forwards X-Ghost-Key when calling other Ghost Agent instances.
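As a rough sketch, the per-node client settings listed above could map onto httpx as follows. The function names node_client_settings and build_node_client, the ghost_key parameter, and the example proxy URL are hypothetical. The `proxy=` keyword assumes a recent httpx release (older releases used `proxies=`), and SOCKS proxies additionally require the `httpx[socks]` extra.

```python
def node_client_settings(tor_proxy=None, ghost_key=None):
    """Collect the per-node settings as plain values (hypothetical helper)."""
    settings = {
        "timeout_seconds": 1200.0,        # long-context generations
        "max_keepalive_connections": 3,
        "max_connections": 15,
        "keepalive_expiry": 30.0,         # seconds
        "proxy": tor_proxy,               # e.g. "socks5://127.0.0.1:9050"
        "headers": {},
    }
    if ghost_key:
        # Only sent when talking to other Ghost Agent instances.
        settings["headers"]["X-Ghost-Key"] = ghost_key
    return settings


def build_node_client(base_url, tor_proxy=None, ghost_key=None):
    """Build one AsyncClient for one node from the settings above."""
    import httpx  # lazy import: the settings helper works without httpx

    s = node_client_settings(tor_proxy, ghost_key)
    return httpx.AsyncClient(
        base_url=base_url,
        timeout=httpx.Timeout(s["timeout_seconds"]),
        limits=httpx.Limits(
            max_keepalive_connections=s["max_keepalive_connections"],
            max_connections=s["max_connections"],
            keepalive_expiry=s["keepalive_expiry"],
        ),
        proxy=s["proxy"],        # None disables proxying
        headers=s["headers"],
    )
```

Keeping the limits on each client, rather than on one shared client, is what lets a slow node exhaust only its own 15 connections.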

Method	Purpose
`async chat_completion(payload, use_swarm, use_worker, use_vision, use_coding, timeout)`	Pool-aware dispatch. Falls back across nodes when a circuit is OPEN.
`async route(task, payload, max_tokens=128, temperature=0.0, fallback=None)`	Send a small classification / repair task to the worker pool with a low max_tokens to keep cost down.
`async generate_image(payload)`	POST to a node from the `image_gen` pool; returns base64 PNG.
`get_*_node()`	Round-robin selection helpers (worker / vision / coding / image_gen).
`async close()`	Closes all underlying httpx clients on lifespan shutdown.
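The round-robin helpers and the fallback across nodes with an OPEN circuit might look roughly like the sketch below. RoundRobinPool, next_node, and the is_open callback are illustrative names, not the module's actual API, and returning None when every node is unavailable is an assumption about how a caller would be told to fall back.

```python
import itertools


class RoundRobinPool:
    """Illustrative round-robin selector that skips unavailable nodes."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        # Cycle over indices so the rotation position persists between calls.
        self._cursor = (
            itertools.cycle(range(len(self._nodes))) if self._nodes else None
        )

    def next_node(self, is_open=lambda node: False):
        """Return the next node whose circuit is not OPEN, else None."""
        if not self._nodes:
            return None
        # Try each node at most once per call, continuing the rotation.
        for _ in range(len(self._nodes)):
            node = self._nodes[next(self._cursor)]
            if not is_open(node):
                return node
        return None  # every circuit is OPEN; caller decides the fallback


pool = RoundRobinPool(["node-a", "node-b", "node-c"])
tripped = {"node-b"}
picks = [pool.next_node(is_open=lambda n: n in tripped) for _ in range(4)]
print(picks)
```

In this sketch a tripped node costs only one skipped slot per rotation, so healthy nodes keep receiving traffic in order.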

Fully async; asyncio.Semaphore(3) caps concurrent background routing tasks.
Because each node has its own httpx.AsyncClient, a single failing node cannot starve the others.
The breaker is process-local: a restart resets its state.
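The concurrency cap on background routing tasks can be illustrated with a small asyncio sketch. Only asyncio.Semaphore(3) comes from this document; run_background_tasks, fake_route, and the peak counter are hypothetical and exist purely to show the limit taking effect.

```python
import asyncio


async def run_background_tasks(jobs, limit=3):
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def guarded(job):
        nonlocal in_flight, peak
        async with sem:  # waits here while `limit` jobs are already running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await job()
            finally:
                in_flight -= 1

    # gather preserves the order of `jobs` in its result list.
    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak


async def fake_route(i):
    """Stand-in for a small worker-pool request."""
    await asyncio.sleep(0.01)
    return i * 2


jobs = [lambda i=i: fake_route(i) for i in range(10)]
results, peak = asyncio.run(run_background_tasks(jobs))
```

All ten jobs are scheduled at once, but the semaphore keeps the number running simultaneously at or below three.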