tools / browser.py

Native headless-browser automation via Playwright — Tor-aware, DNS-leak-safe, session-persistent.

Why a native tool

Before this tool, headless-browser usage was LLM-authored raw Playwright in execute(stateful=True). Four recurring footguns came with that path:

  1. API shape drift: await async_playwright().start() vs async with, top-level await rules, cleanup order. Any slip crashed the Jupyter kernel.
  2. DNS leaks: Chromium's SOCKS5 proxy tunnels traffic but can still resolve names via the container's /etc/resolv.conf unless --host-resolver-rules is also set. Easy to miss.
  3. WebRTC IP leaks: STUN/TURN can expose the host IP even over Tor.
  4. Concurrency races: "store browser/page globally" broke when two plan branches both used the kernel.

The browser tool bakes correct-by-construction defaults for all four. Raw Playwright remains available in execute for advanced flows (response interception, custom wait conditions, file uploads).

Tool: browser

Six operations. The first five are atomic: each launches a fresh Chromium context and re-navigates via the .last_url sidecar. The sixth, interact, runs a sequence of sub-actions in a single context so transient DOM state survives between steps — required for multi-step SPA flows.

| operation | required | returns |
|---|---|---|
| navigate | url | HTTP status, final URL, page title |
| extract_text | | Rendered innerText of body (or a CSS selector), truncated at max_chars |
| click | selector | Post-click URL + title |
| screenshot | | Writes PNG to out_path inside /workspace; returns /api/download/ URL |
| close | | Deletes the persistent profile so the next session starts fresh |
| interact | actions | Per-action results (ok/error), final URL, final title. One Chromium context across the whole list. |

Atomic ops vs. interact

The atomic ops work well for single-page scrapes. They are not suitable for interactive multi-step flows, because each op's fresh context discards the JS-driven DOM mutations made by the previous op. A click that opens a modal is invisible to the next click, which sees a page freshly loaded from its initial HTML. The 2026-04-23 WebOS eval hit this wall: the agent couldn't drive the Calculator because the Calculator window vanished between every click.

interact solves this by taking a whole sequence in one call:

browser(operation="interact",
        url="file:///workspace/webos/index.html",
        actions=[
          {"action": "click", "selector": "[data-app='calc']"},
          {"action": "wait_for_selector", "selector": "#calc-display"},
          {"action": "click", "selector": "#calc-btn-7"},
          {"action": "click", "selector": "#calc-btn-plus"},
          {"action": "click", "selector": "#calc-btn-3"},
          {"action": "click", "selector": "#calc-btn-eq"},
          {"action": "extract_text", "selector": "#calc-display"},
          {"action": "screenshot", "out_path": "calc_result.png"},
        ])

Sub-action types:

Stability guard: wait_for_hidden + force + wait_for_selector(state) (2026-04-26)

Bare page.click() auto-waits for its TARGET to be actionable, but knows nothing about an unrelated overlay still intercepting clicks. The 2026-04-26 webOS session lost ~70 minutes to this exact shape:

actions=[
  {"action": "click", "selector": "#unlock-btn"},   # JS sets #lock-screen.style.display='none'
  {"action": "click", "selector": "#start-btn"}     # ←  fails: lock screen still intercepts
]

Three new options give the LLM a clean way to express "wait for the overlay to leave, then click":

Recommended pattern for a flow with a fading overlay:

actions=[
  {"action": "click", "selector": "#unlock-btn"},
  {"action": "wait_for_selector", "selector": "#lock-screen", "state": "hidden"},
  {"action": "dblclick", "selector": ".desktop-icon[data-app='calc']"},
  {"action": "screenshot", "out_path": "calc.png"},
]

Coverage: tests/test_browser_interact_stability_guard.py.

By default (stop_on_error=False) a failing sub-action is recorded and the sequence keeps going, so you get full visibility into which step is flaky. Pass stop_on_error=True to abort on first failure.

The subprocess wall-clock budget for interact scales with action count (timeout_ms × len(actions)), so a 20-step flow doesn't hit the single-op default cap.
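
As a rough illustration of the scaling rule above, the budget can be computed as in the sketch below. The helper name interact_budget_ms is hypothetical; only the formula timeout_ms × len(actions) comes from this document, and the 120 s per-action timeout in the usage lines is the figure quoted in the abort-semantics section.

```python
def interact_budget_ms(timeout_ms: int, actions: list) -> int:
    """Wall-clock budget for one interact call: per-action timeout times action count."""
    # Hypothetical helper: the tool's real function name is not given in this document.
    return timeout_ms * len(actions)


if __name__ == "__main__":
    # A 20-step flow with a 120 s per-action timeout.
    budget = interact_budget_ms(120_000, [{"action": "click"}] * 20)
    print(budget / 60_000, "minutes")
```

The budget grows linearly with the action list, which is why a long sequence that fails on every step can run for a very long time.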

Session persistence

Cookies, localStorage, and auth tokens survive across tool calls via Chromium's launch_persistent_context(user_data_dir=/workspace/.browser_profile). Each op opens → runs → closes cleanly; there is no long-lived subprocess to manage. Call operation="close" to wipe the profile (e.g. after logging out, or when switching user contexts).

Chained-op continuity (.last_url sidecar)

launch_persistent_context restores cookies/localStorage but NOT open pages — each op's browser process is a fresh launch. Without extra plumbing, a "step 2 continues where step 1 left off" call pattern would land on about:blank and every selector query would fail (surfaced by the 2026-04-23 eval, which then caused the LLM to burn a turn misattributing the error to a DOMContentLoaded race).

To let the LLM chain ops ergonomically, the runner writes the final URL of every successful navigation to <profile_dir>/.last_url. Any subsequent extract_text/click/screenshot without an explicit url reads the sidecar and re-navigates first. Typical usage:

browser(operation="navigate", url="https://example.com/login")
browser(operation="click", selector="#login-btn")      # no url — uses sidecar
browser(operation="extract_text", selector=".welcome") # no url — uses sidecar
browser(operation="screenshot", out_path="landed.png") # no url — uses sidecar
browser(operation="close")                             # wipes profile + sidecar

When neither an explicit url nor a sidecar URL is available, the runner errors out with extract_text needs a URL: pass `url=...` or call `operation="navigate"` first — no silent query against about:blank.

The sidecar lives inside the profile dir, so operation="close" (which rmtree's the whole directory) wipes it for free. Empty / None URLs are never written to the sidecar, so a failed op can't blow away the last-known-good URL.
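
The sidecar rules described in this section can be sketched in a few lines. This is a minimal illustration, not the runner's actual code: the function names write_last_url and resolve_url are hypothetical, and the use of ValueError for the missing-URL case is an assumption. The file name .last_url and the error wording come from the text above.

```python
from pathlib import Path
from typing import Optional

SIDECAR_NAME = ".last_url"


def write_last_url(profile_dir: Path, url: Optional[str]) -> None:
    """Record the final URL of a successful navigation; ignore empty/None URLs."""
    if not url:
        return  # a failed op must not overwrite the last-known-good URL
    profile_dir.mkdir(parents=True, exist_ok=True)
    (profile_dir / SIDECAR_NAME).write_text(url, encoding="utf-8")


def resolve_url(profile_dir: Path, op: str, url: Optional[str] = None) -> str:
    """Return the explicit url if given, else the sidecar URL, else raise."""
    if url:
        return url
    sidecar = profile_dir / SIDECAR_NAME
    if sidecar.is_file():
        saved = sidecar.read_text(encoding="utf-8").strip()
        if saved:
            return saved
    # Assumption: the real runner may signal this differently than ValueError.
    raise ValueError(
        f'{op} needs a URL: pass `url=...` or call `operation="navigate"` first'
    )
```

Because the file sits inside the profile directory, removing that directory removes the sidecar too, so no separate cleanup step is needed in this sketch.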

Tor & DNS-leak hardening

When the sandbox was started with a tor_proxy, every op forwards it as Chromium's --proxy-server. The in-sandbox runner (_chromium_args) always adds, alongside the Docker-safe flags:

Chromium only accepts the socks5:// URL scheme. If the caller supplies socks5h:// (some repo helpers do, for httpx compatibility), the tool auto-rewrites — DNS-over-proxy is enforced via --host-resolver-rules regardless of URL scheme.
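
A minimal sketch of the scheme rewrite and the DNS-over-proxy flag follows. The function name build_proxy_args is hypothetical, and the exact rule string the runner passes is not shown in this document; the value used here is the form given in Chromium's SOCKS proxy documentation, so treat it as an assumption about what _chromium_args emits.

```python
from urllib.parse import urlsplit


def build_proxy_args(tor_proxy: str) -> list[str]:
    """Build Chromium flags that send traffic and DNS lookups through a SOCKS5 proxy."""
    # Chromium rejects socks5h://, so rewrite the scheme and keep the rest of the URL.
    if tor_proxy.startswith("socks5h://"):
        tor_proxy = "socks5://" + tor_proxy[len("socks5h://"):]
    proxy_host = urlsplit(tor_proxy).hostname or "127.0.0.1"
    return [
        f"--proxy-server={tor_proxy}",
        # Fail every local lookup except the proxy host itself, so names can only
        # be resolved on the far side of the proxy (assumed rule string).
        f"--host-resolver-rules=MAP * ~NOTFOUND , EXCLUDE {proxy_host}",
    ]
```

Setting the resolver rule unconditionally, rather than only when the caller wrote socks5h://, is what makes the DNS behaviour independent of the URL scheme.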

Runner protocol

A tiny script .browser_runner.py is written into /workspace on every call and invoked as python3 .browser_runner.py <op_json>. It emits exactly one sentinel line, in one of two forms:

[BROWSER_OK] {...json payload...}
[BROWSER_ERR] <message>

The tool scans stdout for the sentinel so Chromium's unrelated warnings (GTK, ALSA, libpci) don't corrupt the result.
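
The scan could look roughly like the sketch below. The function name parse_runner_stdout and the shape of the returned dict are illustrative assumptions; only the two sentinel prefixes and the JSON payload after [BROWSER_OK] come from the protocol described above.

```python
import json

OK_PREFIX = "[BROWSER_OK]"
ERR_PREFIX = "[BROWSER_ERR]"


def parse_runner_stdout(stdout: str) -> dict:
    """Find the sentinel line in runner stdout and decode it, ignoring other output."""
    for line in stdout.splitlines():
        line = line.strip()
        if line.startswith(OK_PREFIX):
            return {"ok": True, "payload": json.loads(line[len(OK_PREFIX):].strip())}
        if line.startswith(ERR_PREFIX):
            return {"ok": False, "error": line[len(ERR_PREFIX):].strip()}
    # No sentinel at all: assume the runner died before it could report.
    return {"ok": False, "error": "no sentinel line found in runner output"}
```

Matching on a line prefix rather than parsing the whole stream as JSON is what keeps stray library warnings from breaking the result.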

Return shape

Human-readable, deterministic per op, so downstream prompts can pattern-match:

--- BROWSER RESULT ---
STATUS: OK
OP: navigate
URL: https://example.com/
HTTP_STATUS: 200
TITLE: Example Domain
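
For downstream code that wants the fields rather than the text, a block like the one above can be split into key/value pairs. This is a sketch under the assumption that every field line has the form KEY: value with an upper-case key, as in the sample; parse_browser_result is a hypothetical name, not part of the tool.

```python
def parse_browser_result(text: str) -> dict[str, str]:
    """Turn the 'KEY: value' lines of a browser result block into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        # Skip the banner line and blanks; keep only UPPER_SNAKE style keys.
        if sep and key.isidentifier() and key.isupper():
            fields[key] = value.strip()
    return fields
```

Splitting on the first colon-plus-space leaves URLs intact, since the colon in https:// is not followed by a space.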

Failure modes surfaced

Interact abort semantics (navigation failures are terminal)

Under the default stop_on_error=False, a per-action failure (e.g. a click on a missing selector) is recorded and the loop continues — useful for "try all these selectors, tell me which matched" exploratory flows.

Navigation failures are the one exception: they always abort the sequence, regardless of stop_on_error. A page.goto(...) that raises (ERR_FILE_NOT_FOUND, ERR_CONNECTION_REFUSED, DNS failure, …) leaves Chromium on an error page; every subsequent click / fill / extract_text would just wait the full per-action timeout for elements that don't exist. Before the fix, a 54-action sequence whose first goto failed hung for ~108 minutes (54 × 120 s per-action timeout) before the outer subprocess watchdog killed it.

The fix: op_interact catches the goto exception, records aborted_sequence: True on the result entry, and breaks out of the loop immediately. The agent-facing output gains a banner:

⚠ SEQUENCE ABORTED: goto_failed. Remaining actions were NOT executed
  because the initial navigation failed — page.click/fill/extract on an
  error page would have just timed out one-by-one. Fix the URL and
  retry the whole interact call.

…so the next-turn planner reads the failure as "bad URL, retry the whole interact" rather than misinterpreting it as 53 mysterious per-action selector bugs. The return dict grows two fields: aborted: bool and abort_reason: "goto_failed" | "initial_goto_failed" | None. Non-navigation failures still honour the stop_on_error contract. Covered by tests/test_browser_interact_abort.py (9 cases including initial-goto failure, explicit-goto failure, implicit-nav failure, mid-sequence goto, stop_on_error preservation, and happy path).
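
The control flow described in this section can be sketched as a small loop. This is not op_interact itself: run_actions and the execute callback are hypothetical, the sub-action name "goto" is assumed from the test-case names above, and the separate initial_goto_failed path for the top-level url is not modelled. Leaving aborted as False when stop_on_error ends the loop is also an assumption.

```python
from typing import Callable


def run_actions(actions: list[dict], execute: Callable[[dict], dict],
                stop_on_error: bool = False) -> dict:
    """Run sub-actions in order; a failed navigation always ends the sequence."""
    results, aborted, abort_reason = [], False, None
    for action in actions:
        try:
            results.append({"ok": True, **execute(action)})
        except Exception as exc:
            entry = {"ok": False, "error": str(exc)}
            results.append(entry)
            if action.get("action") == "goto":
                # Terminal: the page is an error page, later steps would only time out.
                entry["aborted_sequence"] = True
                aborted, abort_reason = True, "goto_failed"
                break
            if stop_on_error:
                break  # caller asked to stop at the first failure of any kind
    return {"results": results, "aborted": aborted, "abort_reason": abort_reason}
```

Checking the navigation case before stop_on_error is the point of the design: the abort does not depend on how the caller set that flag.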

Related