sandbox / docker.py — DockerSandbox

Long-lived Docker container that hosts every tool call. Workspace bind-mount, optional Tor, resource limits, output capping.

Container topology

Host: $GHOST_SANDBOX_DIR (user files · acquired skills · gen images)
DockerSandbox (host process): md5(workspace)[:8] suffix → unique container; tor_proxy → start tor inside container
Container: python:3.11-slim-bookworm, cmd: sleep infinity, /workspace ← bind-mount from the host
apt: git, nodejs, postgres-client, tor, ripgrep
pip: numpy, pandas, torch, sklearn, …

Figure 8 — Workspace mount and container provisioning.

Lifecycle

Step | Behaviour
__init__ (lines 13-54) | Container name suffix = md5(workspace)[:8] for parallel-session isolation. Auto-detects the Docker/OrbStack socket on macOS.
ensure_running (lines 108-282) | Liveness probe (mount sync test + echo OK). Removes stale containers. --network host on Linux, bridge on macOS/Windows. Memory cap from GHOST_SANDBOX_MEM (default 4 g). CPU quota from GHOST_SANDBOX_CPU_QUOTA (default 200000 µs ≈ 2 vCPU). Lazy provisioning: commits the provisioned container as ghost-agent-base:latest for instant reboot.
execute(cmd, timeout=300, memory_limit=None) (lines 284-347) | Wraps the command with timeout -k 5s {timeout}s. Output cap 256 KB (head + tail; middle dropped). Returns (output_string, exit_code). Runs as host uid:gid on Linux, root on macOS.
close(remove=False) (lines 349-391) | Stops the container (fast resume) by default; remove=True deletes it. Idempotent and exception-safe. The lifespan shutdown calls it with remove=False.
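
The naming and timeout-wrapping steps in the table can be sketched as follows. This is a minimal illustration, not the code in sandbox/docker.py: the helper names container_suffix and wrap_with_timeout are hypothetical, and both the UTF-8 encoding of the workspace path and the use of sh -c to run the wrapped command are assumptions.

```python
import hashlib
import shlex


def container_suffix(workspace: str) -> str:
    """Return the first 8 hex characters of md5(workspace).

    Assumption: the workspace path is hashed as a UTF-8 string.
    """
    return hashlib.md5(workspace.encode("utf-8")).hexdigest()[:8]


def wrap_with_timeout(cmd: str, timeout: int = 300) -> str:
    """Prefix cmd with coreutils `timeout`.

    `-k 5s` sends SIGKILL if the command is still alive 5 seconds
    after the initial SIGTERM at the deadline.
    Assumption: the command is run through `sh -c` so that pipes and
    redirects inside cmd keep working.
    """
    return f"timeout -k 5s {timeout}s sh -c {shlex.quote(cmd)}"
```

Because the suffix depends only on the workspace path, two sessions on different workspaces get different container names, while restarting a session on the same workspace finds the same container again.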

Provisioned packages

Preferred: build ghost-agent-base:latest in one shot from the authoritative Dockerfile; the runtime wrapper then only verifies and reuses the image.

scripts/build_sandbox_image.sh
# reads sandbox/Dockerfile, runs a Chromium smoke test at the end,
# ~5 min on a warm docker cache

Provision marker is versioned (/root/.supercharged.v2)

The runtime gate in ensure_running checks TWO things, not one:

  1. test -f /root/.supercharged.v2 — the version-bumped marker. Legacy .supercharged (v1) images — which may have been committed before --with-deps was in the bootstrap — are treated as un-provisioned, forcing a clean reinstall on next boot.
  2. _chromium_binary_present(), which runs find /root/.cache/ms-playwright -type f \( -name headless_shell -o -name chrome \) -print -quit. This defends against the silent-failure mode where a prior install exited 0 but the binary isn't actually on disk (network flake, mid-extract disk-full, etc.).

If the marker is present but the binary is missing, the gate logs ⚠ Provision marker present but Chromium binary missing. Reinstalling... and re-runs the full install. Post-install, _chromium_binary_present() is checked AGAIN before the marker is touched — fail-loud: the v2 marker is never set unless the binary actually verifies on disk, so a failed install cannot silently poison future boots.
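
The two-check gate and the post-install verification can be summarised in a short sketch. It is a simplified model under stated assumptions rather than the actual ensure_running code: ensure_provisioned and its four callback parameters are hypothetical stand-ins for the marker test, the Chromium probe, the install step and the touch of the v2 marker.

```python
from typing import Callable


def ensure_provisioned(
    marker_present: Callable[[], bool],
    binary_present: Callable[[], bool],
    install: Callable[[], None],
    touch_marker: Callable[[], None],
) -> bool:
    """Return True if an install ran, False if the image was already good.

    All four callbacks are hypothetical; in the real sandbox they would
    run commands inside the container.
    """
    # Gate: BOTH the versioned marker and the binary must be present.
    if marker_present() and binary_present():
        return False

    install()

    # Fail loud: never set the marker unless the binary verifies on disk.
    if not binary_present():
        raise RuntimeError("Chromium binary missing after install; marker not set")

    touch_marker()
    return True
```

Checking the binary a second time before touching the marker is what keeps a failed install from being recorded as a success.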

Bump the marker to .v3 (in both sandbox/Dockerfile AND sandbox/docker.py) any time the install surface genuinely changes. The meta-test tests/test_sandbox_chromium_gate.py::test_v2_marker_gate_name_pinned asserts the two paths stay in sync.

Covered by tests/test_sandbox_chromium_gate.py (12 cases including binary-present/absent probes, legacy-marker handling, post-install verification refusing to mark a broken image, and Dockerfile-level assertions about --with-deps).

Resource caps

memory
configurable, default 4 g
CPU quota
200000 µs (2 vCPU) at 100000 µs period
output cap
256 KB per execute call
install time
~60 s on first boot (+ ~2 min Chromium on the first-ever boot), cached thereafter
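
To make the output cap and the CPU figures concrete, here is a minimal sketch. It assumes the 256 KB cap is applied to bytes and that head and tail share the budget equally; the truncation notice, the function names cap_output and vcpus, and the even split are all illustrative assumptions, not details taken from sandbox/docker.py.

```python
def cap_output(data: bytes, limit: int = 256 * 1024) -> bytes:
    """Keep the head and tail of data and drop the middle once it exceeds limit.

    Assumptions: the cap counts bytes, the split is even, and limit is
    larger than the notice itself.
    """
    if len(data) <= limit:
        return data
    notice = b"\n... [output truncated] ...\n"  # hypothetical marker text
    keep = max(limit - len(notice), 0)
    head = keep // 2
    tail = keep - head
    return data[:head] + notice + data[len(data) - tail:]


def vcpus(quota_us: int = 200_000, period_us: int = 100_000) -> float:
    """CPU quota divided by the scheduling period gives the usable vCPU count."""
    return quota_us / period_us
```

With the defaults, vcpus() evaluates to 2.0, which matches the "2 vCPU" figure above. Keeping both ends of the output preserves the start of a command's log as well as the final error or summary lines.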

Tor

If tor_proxy is set in the constructor, the container starts a tor daemon and outbound HTTP from inside the sandbox is routed through it. The native browser tool additionally forces DNS-over-SOCKS via Chromium's --host-resolver-rules="MAP * ~NOTFOUND , EXCLUDE localhost" and disables non-proxied WebRTC (--webrtc-ip-handling-policy=disable_non_proxied_udp), so the browser path cannot leak DNS or the host IP even if the LLM forgets to configure the proxy correctly.
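
As an illustration, the browser flags could be assembled like this. The two leak-prevention flags are copied from the text above; the helper name chromium_proxy_args, the use of Chromium's --proxy-server flag and the example proxy URL are assumptions, since the source does not say how the browser tool passes the proxy.

```python
def chromium_proxy_args(tor_proxy: str) -> list[str]:
    """Build Chromium launch arguments for a proxied, leak-resistant session.

    Assumption: tor_proxy is a URL such as "socks5://127.0.0.1:9050".
    """
    return [
        # Assumed: route browser traffic through the given proxy.
        f"--proxy-server={tor_proxy}",
        # From the text: fail local DNS resolution so lookups go via the proxy.
        # No shell quoting is needed when this is passed as one argv element.
        "--host-resolver-rules=MAP * ~NOTFOUND , EXCLUDE localhost",
        # From the text: block WebRTC UDP that would bypass the proxy.
        "--webrtc-ip-handling-policy=disable_non_proxied_udp",
    ]
```

Building the list in one place means the leak-prevention flags are always applied together with the proxy setting, regardless of what the caller asks for.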