core / llm.py

Multi-pool LLM orchestrator. Owns httpx async clients, round-robin scheduling, and per-node circuit breakers.

Pool topology

Six logical pools, each backed by zero-or-more (url, model) nodes plus the always-present foreground client.

Pool        Used for                               Selector
foreground  Default chat completion                chat_completion() with no flags
swarm       Parallel inference / fan-out           chat_completion(use_swarm=True)
worker      Cheap classifier / verifier sub-tasks  route(task, ...)
visual      Multi-modal (image / PDF)              chat_completion(use_vision=True)
coding      Code-specialist generation             chat_completion(use_coding=True)
image_gen   SDXL image generation                  generate_image()
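
As a rough illustration of the selector column, the flag-to-pool mapping can be sketched as a small pure function. The function name select_pool and the precedence order used when several flags are set at once are assumptions made for this example; the module does not document how conflicting flags are resolved.

```python
def select_pool(
    use_swarm: bool = False,
    use_worker: bool = False,
    use_vision: bool = False,
    use_coding: bool = False,
) -> str:
    """Return the pool name implied by the chat_completion() flags.

    Hypothetical helper: the precedence below (vision, coding, swarm,
    worker) is an assumption, not documented behaviour.
    """
    if use_vision:
        return "visual"
    if use_coding:
        return "coding"
    if use_swarm:
        return "swarm"
    if use_worker:
        return "worker"
    # No flags set: the always-present foreground client handles the call.
    return "foreground"
```

With no flags the sketch resolves to the foreground pool, which matches the first row of the table.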

Routing tasks

The RoutingTask enum at line 71 advertises the labels that worker pools can fulfil.

Circuit breaker

NodeCircuitBreaker(failure_threshold=3, cooldown_seconds=60.0) tracks per-node state:

CLOSED (healthy) → OPEN (tripped) after 3 failures
OPEN (tripped) → HALF (probing) after the 60 s cooldown
HALF (probing) → CLOSED on success
HALF (probing) → re-OPEN on failure

Figure 3 — NodeCircuitBreaker state diagram.
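
The state machine in Figure 3 can be sketched in a few lines. This is a minimal illustration and not the module's implementation: the class name BreakerSketch, its method names, the injectable clock, and the choice to count consecutive failures (a success resets the counter) are all assumptions. Only the two constructor defaults come from the text above.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"  # healthy
    OPEN = "open"      # tripped
    HALF = "half"      # probing


class BreakerSketch:
    """Hypothetical stand-in for NodeCircuitBreaker."""

    def __init__(self, failure_threshold=3, cooldown_seconds=60.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so the sketch can be tested
        self.state = State.CLOSED
        self.failures = 0
        self._opened_at = 0.0

    def allow_request(self) -> bool:
        """Return True if a request may be sent to this node right now."""
        if self.state is State.OPEN:
            if self._clock() - self._opened_at >= self.cooldown_seconds:
                self.state = State.HALF  # cooldown elapsed: allow a probe
                return True
            return False
        return True

    def record_success(self) -> None:
        self.failures = 0
        self.state = State.CLOSED

    def record_failure(self) -> None:
        if self.state is State.HALF:
            self._trip()  # a failed probe re-opens the circuit
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = State.OPEN
        self.failures = 0
        self._opened_at = self._clock()
```

Passing a fake clock makes the cooldown transition easy to exercise without sleeping for 60 s.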

HTTP client

One httpx.AsyncClient per node with:

timeout: 1200 s. Long-context generations can run for many minutes.
limits: 3 keep-alive, 15 total connections per node.
keep-alive expiry: 30 s.
proxy: a Tor SOCKS proxy is used if tor_proxy is set in the constructor.
headers: forwards X-Ghost-Key when calling other Ghost Agent instances.
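
The settings above map fairly directly onto httpx constructor arguments. The sketch below is one way to express them, under several assumptions: the helper names client_settings and make_client and the ghost_key parameter are hypothetical, the proxy= keyword is the spelling used by recent httpx releases (older releases used proxies=), and a socks5:// proxy URL needs httpx installed with its SOCKS extra.

```python
from typing import Optional


def client_settings(tor_proxy: Optional[str] = None,
                    ghost_key: Optional[str] = None) -> dict:
    """Collect the per-node client settings described above."""
    headers = {}
    if ghost_key:
        # Only relevant when the node is another Ghost Agent instance.
        headers["X-Ghost-Key"] = ghost_key
    return {
        "timeout_seconds": 1200.0,        # long-context generations
        "max_keepalive_connections": 3,
        "max_connections": 15,
        "keepalive_expiry": 30.0,         # seconds
        "proxy": tor_proxy,               # e.g. a socks5:// URL, or None
        "headers": headers,
    }


def make_client(base_url: str,
                tor_proxy: Optional[str] = None,
                ghost_key: Optional[str] = None):
    """Build one httpx.AsyncClient for a single node (hypothetical helper)."""
    import httpx  # imported lazily so client_settings() works without it

    s = client_settings(tor_proxy, ghost_key)
    return httpx.AsyncClient(
        base_url=base_url,
        timeout=httpx.Timeout(s["timeout_seconds"]),
        limits=httpx.Limits(
            max_connections=s["max_connections"],
            max_keepalive_connections=s["max_keepalive_connections"],
            keepalive_expiry=s["keepalive_expiry"],
        ),
        proxy=s["proxy"],
        headers=s["headers"],
    )
```

Keeping the numbers in a plain dict separates the documented values from the httpx-specific wiring, which is the part most likely to differ between library versions.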

Public methods

Method                                                                                  Purpose
async chat_completion(payload, use_swarm, use_worker, use_vision, use_coding, timeout)  Pool-aware dispatch. Falls back across nodes when a circuit is OPEN.
async route(task, payload, max_tokens=128, temperature=0.0, fallback=None)              Send a small classification / repair task to the worker pool with a low max_tokens to keep cost down.
async generate_image(payload)                                                           POST to a node from the image_gen pool; returns base64 PNG.
get_*_node()                                                                            Round-robin selection helpers (worker / vision / coding / image_gen).
async close()                                                                           Closes all underlying httpx clients on lifespan shutdown.
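
Round-robin selection combined with fallback past unavailable nodes, as described for get_*_node() and chat_completion(), could look roughly like the following. The class RoundRobinPool, its pick() method, and the is_available callback are hypothetical names for this sketch; the real helpers may track position and breaker state differently.

```python
from typing import Callable, List, Optional, Tuple

Node = Tuple[str, str]  # (url, model), as in the pool topology


class RoundRobinPool:
    """Hypothetical round-robin selector over a pool's nodes."""

    def __init__(self, nodes: List[Node]):
        self._nodes = list(nodes)
        self._next = 0  # index of the node to try first on the next pick

    def pick(
        self,
        is_available: Callable[[Node], bool] = lambda node: True,
    ) -> Optional[Node]:
        """Return the next available node, or None if there is none.

        is_available stands in for a circuit-breaker check; nodes whose
        circuit is OPEN would return False and be skipped.
        """
        n = len(self._nodes)
        for offset in range(n):
            idx = (self._next + offset) % n
            node = self._nodes[idx]
            if is_available(node):
                self._next = (idx + 1) % n
                return node
        return None  # empty pool, or every node is currently unavailable
```

Returning None for an empty or fully tripped pool leaves the fallback decision to the caller, for example dropping back to the foreground client.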

Concurrency model