tools / browser.py

Native headless-browser automation via Playwright — Tor-aware, DNS-leak-safe, session-persistent.

Why a native tool

Before this tool, headless-browser usage was LLM-authored raw Playwright in execute(stateful=True). Four recurring footguns came with that path:

API shape drift: await async_playwright().start() vs async with, top-level await rules, cleanup order. Any slip crashed the Jupyter kernel.
DNS leaks: Chromium's SOCKS5 proxy tunnels traffic but can still resolve names via the container's /etc/resolv unless --host-resolver-rules is also set. Easy to miss.
WebRTC IP leaks: STUN/TURN can expose the host IP even over Tor.
Concurrency races: "store browser/page globally" broke when two plan branches both used the kernel.

The browser tool bakes in correct-by-construction defaults for all four. Raw Playwright remains available in execute for advanced flows (response interception, custom wait conditions, file uploads).

Tool: `browser`

Six operations. The first five are atomic: each launches a fresh Chromium context, and extract_text, click, and screenshot re-navigate via the .last_url sidecar when no url is passed. The sixth, interact, runs a sequence of sub-actions in a single context so transient DOM state survives between steps, which multi-step SPA flows require.

operation	required	returns
`navigate`	`url`	HTTP status, final URL, page title
`extract_text`	—	Rendered `innerText` of body (or a CSS selector), truncated at `max_chars`
`click`	`selector`	Post-click URL + title
`screenshot`	—	Writes PNG to `out_path` inside `/workspace`; returns `/api/download/` URL
`close`	—	Deletes the persistent profile so the next session starts fresh
`interact`	`actions`	Per-action results (ok/error), final URL, final title. One Chromium context across the whole list.

Atomic ops vs. `interact`

The atomic ops are well suited to single-page scrapes. They are not suitable for interactive multi-step flows, because every op's fresh context discards the JS-driven DOM mutations made by the previous op. A click that opens a modal is invisible to the next click, which sees a page freshly loaded from its initial HTML. The 2026-04-23 webOS eval hit this wall: the agent couldn't drive the Calculator because the Calculator window vanished between every click.

interact solves this by taking a whole sequence in one call:

browser(operation="interact",
        url="file:///workspace/webos/index.html",
        actions=[
          {"action": "click", "selector": "[data-app='calc']"},
          {"action": "wait_for_selector", "selector": "#calc-display"},
          {"action": "click", "selector": "#calc-btn-7"},
          {"action": "click", "selector": "#calc-btn-plus"},
          {"action": "click", "selector": "#calc-btn-3"},
          {"action": "click", "selector": "#calc-btn-eq"},
          {"action": "extract_text", "selector": "#calc-display"},
          {"action": "screenshot", "out_path": "calc_result.png"},
        ])

Sub-action types:

{"action":"goto","url":"...","wait_until":"load"} — navigate mid-sequence (e.g. follow a link into a subpage).
{"action":"click","selector":"...","wait_for_hidden":"#overlay","force":false} — see "Stability guard" below for the optional fields.
{"action":"dblclick","selector":"..."} — required for ondblclick-bound UIs (desktop-icon launchers etc.). A plain click never fires dblclick listeners; the 2026-04-24 webOS session produced 8 identical screenshots because the agent fired single clicks on .desktop-icon elements whose openApp() handler was wired to dblclick only. Accepts the same wait_for_hidden / force options as click.
{"action":"extract_text","selector":"...","max_chars":65536} — each extraction is preserved in the results so you can read multiple points in the flow.
{"action":"fill","selector":"input[name='email']","text":"a@b","wait_for_hidden":"#busy"} — real text input (atomic ops don't support typing). Same overlay-wait option as click.
{"action":"wait_for_selector","selector":"...","state":"visible|hidden|attached|detached","timeout_ms":5000} — state defaults to visible; use hidden to wait for an overlay to GO AWAY (the LLM-facing missing primitive that drove the 2026-04-26 incident).
{"action":"screenshot","out_path":"..."} — sandbox-safety + host→container path rewriting is applied per sub-action.
{"action":"sleep","ms":500} — when you need an explicit pause and there's no selector to wait on.
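The sub-action shapes listed above lend themselves to a cheap pre-flight check before any browser is launched. The following is a minimal sketch, not the tool's actual validation: `validate_actions`, `REQUIRED_KEYS`, and `VALID_STATES` are hypothetical names, and the required-key table is an assumption inferred from the examples above (extract_text and screenshot are treated as having no mandatory keys).

```python
# Assumed required keys per sub-action type, inferred from the examples above.
REQUIRED_KEYS = {
    "goto": ("url",),
    "click": ("selector",),
    "dblclick": ("selector",),
    "extract_text": (),
    "fill": ("selector", "text"),
    "wait_for_selector": ("selector",),
    "screenshot": (),
    "sleep": ("ms",),
}
VALID_STATES = ("visible", "hidden", "attached", "detached")


def validate_actions(actions):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for i, action in enumerate(actions):
        kind = action.get("action")
        if kind not in REQUIRED_KEYS:
            problems.append(f"#{i}: unknown action {kind!r}")
            continue
        for key in REQUIRED_KEYS[kind]:
            if key not in action:
                problems.append(f"#{i}: {kind} is missing {key!r}")
        # state defaults to "visible", so only an explicit bad value is flagged.
        if kind == "wait_for_selector" and action.get("state", "visible") not in VALID_STATES:
            problems.append(f"#{i}: invalid state {action['state']!r}")
    return problems
```

Running a check like this first would let a malformed list fail immediately instead of spending a per-action timeout on it.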

Stability guard: `wait_for_hidden` + `force` + `wait_for_selector(state)` (2026-04-26)

Bare page.click() auto-waits for its TARGET to be actionable, but knows nothing about an unrelated overlay still intercepting clicks. The 2026-04-26 webOS session lost ~70 minutes to this exact shape:

actions=[
  {"action": "click", "selector": "#unlock-btn"},   # JS sets #lock-screen.style.display='none'
  {"action": "click", "selector": "#start-btn"}     # ←  fails: lock screen still intercepts
]

Three new options give the LLM a clean way to express "wait for the overlay to leave, then click":

wait_for_hidden on click / dblclick / fill: a CSS selector for the overlay that must disappear before the action fires. Internally calls page.wait_for_selector(overlay, state="hidden") first; if the overlay does NOT go away in time, the action fails with wait_for_hidden(...) timed out before click(...) — a much clearer signal than Playwright's generic "element intercepts pointer events". An optional wait_for_hidden_ms (default min(5000, op.timeout_ms)) tunes the wait.
force on click / dblclick: skips Playwright's actionability check (visibility, stability, hit-test). Use sparingly — meant for cases where the LLM is certain the target is correct but Playwright is being conservative (mid-CSS-transition, or a target whose stability oracle is overzealous).
state on wait_for_selector: defaults to visible (Playwright's own default, the prior behaviour). New values hidden / attached / detached mirror Playwright's API. Bare wait_for_selector(sel) waited for the selector to APPEAR — useless for an element that's already in the DOM and just needs to fade out, which was the missing primitive that drove the incident.
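The wait-then-click behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's code: `guarded_click` and `ActionError` are hypothetical names, `page` is any object exposing Playwright's `wait_for_selector` and `click` methods (sync API), and the 120 000 ms default is borrowed from the per-action timeout mentioned later in this document.

```python
class ActionError(Exception):
    """Hypothetical error type carrying the agent-facing message."""


def guarded_click(page, selector, wait_for_hidden=None,
                  wait_for_hidden_ms=None, force=False, timeout_ms=120_000):
    """Click `selector`, optionally waiting for an overlay to disappear first."""
    if wait_for_hidden:
        # Default overlay wait is min(5000, op timeout), as described above.
        overlay_ms = (wait_for_hidden_ms if wait_for_hidden_ms is not None
                      else min(5000, timeout_ms))
        try:
            page.wait_for_selector(wait_for_hidden, state="hidden", timeout=overlay_ms)
        except Exception as exc:  # Playwright raises its own TimeoutError here
            raise ActionError(
                f"wait_for_hidden({wait_for_hidden!r}) timed out "
                f"before click({selector!r})"
            ) from exc
    # force=True skips Playwright's actionability checks; use sparingly.
    page.click(selector, force=force, timeout=timeout_ms)
```

Because the overlay wait fails with its own message, the agent can tell "the overlay never left" apart from "the target was not clickable".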

Recommended pattern for a flow with a fading overlay:

actions=[
  {"action": "click", "selector": "#unlock-btn"},
  {"action": "wait_for_selector", "selector": "#lock-screen", "state": "hidden"},
  {"action": "dblclick", "selector": ".desktop-icon[data-app='calc']"},
  {"action": "screenshot", "out_path": "calc.png"},
]

Coverage: tests/test_browser_interact_stability_guard.py.

By default (stop_on_error=False) a failing sub-action is recorded and the sequence keeps going, so you get full visibility into which step is flaky. Pass stop_on_error=True to abort on the first failure. Navigation failures are the exception and always abort; see "Interact abort semantics" below.

The subprocess wall-clock budget for interact scales with action count (timeout_ms × len(actions)), so a 20-step flow doesn't hit the single-op default cap.
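As a worked example of the scaling rule above, here is a minimal sketch. `interact_budget_ms` is a hypothetical helper name, and the floor of one action for an empty list is an assumption.

```python
def interact_budget_ms(timeout_ms: int, actions: list) -> int:
    """Wall-clock budget for one interact call: timeout_ms × len(actions)."""
    # Assumption: an empty action list still gets one timeout's worth of budget.
    return timeout_ms * max(1, len(actions))


# A 20-step flow at 120 s per action gets 2 400 000 ms, i.e. 40 minutes.
twenty_step_budget = interact_budget_ms(120_000, [{"action": "sleep", "ms": 1}] * 20)
```

The same arithmetic explains the 108-minute figure quoted under "Interact abort semantics": 54 actions × 120 s is 6480 s.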

Session persistence

Cookies, localStorage, and auth tokens survive across tool calls via Chromium's launch_persistent_context(user_data_dir=/workspace/.browser_profile). Each op opens → runs → closes cleanly; there is no long-lived subprocess to manage. Call operation="close" to wipe the profile (e.g. after logging out, or when switching user contexts).

Chained-op continuity (`.last_url` sidecar)

launch_persistent_context restores cookies/localStorage but NOT open pages — each op's browser process is a fresh launch. Without extra plumbing, a "step 2 continues where step 1 left off" call pattern would land on about:blank and every selector query would fail (surfaced by the 2026-04-23 eval, which then caused the LLM to burn a turn misattributing the error to a DOMContentLoaded race).

To let the LLM chain ops ergonomically, the runner writes the final URL of every successful navigation to <profile_dir>/.last_url. Any subsequent extract_text/click/screenshot without an explicit url reads the sidecar and re-navigates first. Typical usage:

browser(operation="navigate", url="https://example.com/login")
browser(operation="click", selector="#login-btn")      # no url — uses sidecar
browser(operation="extract_text", selector=".welcome") # no url — uses sidecar
browser(operation="screenshot", out_path="landed.png") # no url — uses sidecar
browser(operation="close")                             # wipes profile + sidecar

When neither an explicit url nor a sidecar URL is available, the runner errors out with extract_text needs a URL: pass `url=...` or call `operation="navigate"` first — no silent query against about:blank.

The sidecar lives inside the profile dir, so operation="close" (which rmtree's the whole directory) wipes it for free. Empty / None URLs are never written to the sidecar, so a failed op can't blow away the last-known-good URL.
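The sidecar rules above (write only non-empty URLs, prefer an explicit url, fail loudly when neither exists) can be sketched like this. The function names and the `SIDECAR_NAME` constant are hypothetical, and the use of `ValueError` is an assumption; only the file name .last_url and the error wording come from the text above.

```python
from pathlib import Path
from typing import Optional

SIDECAR_NAME = ".last_url"


def write_last_url(profile_dir: Path, url: Optional[str]) -> None:
    """Record the final URL of a successful navigation; ignore empty or None."""
    if not url:
        return  # never overwrite the last-known-good URL with nothing
    profile_dir.mkdir(parents=True, exist_ok=True)
    (profile_dir / SIDECAR_NAME).write_text(url, encoding="utf-8")


def resolve_url(profile_dir: Path, op: str, url: Optional[str] = None) -> str:
    """Prefer an explicit url, fall back to the sidecar, otherwise raise."""
    if url:
        return url
    sidecar = profile_dir / SIDECAR_NAME
    if sidecar.is_file():
        saved = sidecar.read_text(encoding="utf-8").strip()
        if saved:
            return saved
    raise ValueError(
        f'{op} needs a URL: pass `url=...` or call `operation="navigate"` first'
    )
```

Since the file lives under the profile directory, removing that directory is enough to reset both the cookies and the remembered URL.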

Tor & DNS-leak hardening

When the sandbox was started with a tor_proxy, every op forwards it as Chromium's --proxy-server. The in-sandbox runner (_chromium_args) always adds, alongside the Docker-safe flags:

--host-resolver-rules="MAP * ~NOTFOUND , EXCLUDE localhost" — forces DNS through the SOCKS proxy; EXCLUDE localhost keeps the self-play fixture server + in-container services reachable.
--webrtc-ip-handling-policy=disable_non_proxied_udp — blocks WebRTC STUN/TURN egress that could expose the host IP.
--disable-features=WebRtcHideLocalIpsWithMdns — pairs with the above.

Chromium's --proxy-server accepts socks5:// but not the socks5h:// variant. If the caller supplies socks5h:// (some repo helpers do, for httpx compatibility), the tool rewrites it to socks5:// automatically. DNS-over-proxy is enforced via --host-resolver-rules regardless of URL scheme.
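For illustration, the Tor-mode flag list and the scheme rewrite might be assembled along these lines. This is a sketch, not the real _chromium_args: `tor_chromium_args` is a hypothetical name, the Docker-safe flags are omitted, and only the Tor case is modelled.

```python
from typing import List


def tor_chromium_args(tor_proxy: str) -> List[str]:
    """Build the proxy and leak-hardening flags for a Tor-backed session."""
    # Chromium does not understand socks5h://, so normalise it to socks5://.
    if tor_proxy.startswith("socks5h://"):
        tor_proxy = "socks5://" + tor_proxy[len("socks5h://"):]
    return [
        f"--proxy-server={tor_proxy}",
        # Passed as a single argv entry, so no shell quoting around the value.
        "--host-resolver-rules=MAP * ~NOTFOUND , EXCLUDE localhost",
        "--webrtc-ip-handling-policy=disable_non_proxied_udp",
        "--disable-features=WebRtcHideLocalIpsWithMdns",
    ]
```

A list like this would typically be handed to Playwright through the `args` parameter of `launch_persistent_context`.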

Runner protocol

A tiny script, .browser_runner.py, is written into /workspace on every call and invoked as python3 .browser_runner.py <op_json>. It emits exactly one sentinel line, in one of two forms (success or failure):

[BROWSER_OK] {...json payload...}
[BROWSER_ERR] <message>

The tool scans stdout for the sentinel so Chromium's unrelated warnings (GTK, ALSA, libpci) don't corrupt the result.
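The sentinel scan can be sketched as below. `parse_runner_output` is a hypothetical name, and the 20-line stderr tail used when no sentinel is found is an assumption; the two sentinel prefixes are taken from the protocol above.

```python
import json
from typing import Any, Tuple

OK_SENTINEL = "[BROWSER_OK]"
ERR_SENTINEL = "[BROWSER_ERR]"


def parse_runner_output(stdout: str, stderr: str = "") -> Tuple[bool, Any]:
    """Return (ok, payload_or_message), ignoring unrelated Chromium noise."""
    for line in stdout.splitlines():
        line = line.strip()
        if line.startswith(OK_SENTINEL):
            return True, json.loads(line[len(OK_SENTINEL):].strip())
        if line.startswith(ERR_SENTINEL):
            return False, line[len(ERR_SENTINEL):].strip()
    # No sentinel at all (e.g. Chromium crashed): fall back to the stderr tail.
    tail = "\n".join(stderr.splitlines()[-20:])
    return False, tail or "runner produced no sentinel line"
```

Matching on a fixed prefix per line keeps warnings from GTK, ALSA, or libpci from being mistaken for the result.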

Return shape

Human-readable, deterministic per op, so downstream prompts can pattern-match:

--- BROWSER RESULT ---
STATUS: OK
OP: navigate
URL: https://example.com/
HTTP_STATUS: 200
TITLE: Example Domain
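Because the block is a flat list of `KEY: value` lines, a downstream consumer could parse it with very little code. The sketch below assumes every op uses the same line shape as the navigate example above; `parse_browser_result` is a hypothetical helper.

```python
def parse_browser_result(text: str) -> dict:
    """Turn the `KEY: value` lines of a result block into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("---") or ": " not in line:
            continue  # skip the banner and any free-form lines
        key, value = line.split(": ", 1)
        fields[key.strip()] = value.strip()
    return fields


sample = """--- BROWSER RESULT ---
STATUS: OK
OP: navigate
URL: https://example.com/
HTTP_STATUS: 200
TITLE: Example Domain
"""
parsed = parse_browser_result(sample)
```

Splitting on the first ": " only keeps URLs and titles that contain colons intact.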

Failure modes surfaced

Unknown operation → STATUS: ERROR with valid-ops list.
Runner exits non-zero with a sentinel → the error is surfaced verbatim, plus a timeout/install hint.
Chromium crashes before printing a sentinel → raw tail of stderr is returned so the agent has something to debug with.

Interact abort semantics (navigation failures are terminal)

Under the default stop_on_error=False, a per-action failure (e.g. a click on a missing selector) is recorded and the loop continues — useful for "try all these selectors, tell me which matched" exploratory flows.

Navigation failures are the one exception: they always abort the sequence, regardless of stop_on_error. A page.goto(...) that raises (ERR_FILE_NOT_FOUND, ERR_CONNECTION_REFUSED, DNS failure, …) leaves Chromium on an error page; every subsequent click / fill / extract_text would just wait the full per-action timeout for elements that don't exist. Before the fix, a 54-action sequence whose first goto failed hung for ~108 minutes (54 × 120 s per-action timeout) before the outer subprocess watchdog killed it.

The fix: op_interact catches the goto exception, records aborted_sequence: True on the result entry, and breaks out of the loop immediately. The agent-facing output gains a banner:

⚠ SEQUENCE ABORTED: goto_failed. Remaining actions were NOT executed
  because the initial navigation failed — page.click/fill/extract on an
  error page would have just timed out one-by-one. Fix the URL and
  retry the whole interact call.

…so the next-turn planner reads the failure as "bad URL, retry the whole interact" rather than misinterpreting it as 53 mysterious per-action selector bugs. The return dict grows two fields: aborted: bool and abort_reason: "goto_failed" | "initial_goto_failed" | None. Non-navigation failures still honour the stop_on_error contract. Covered by tests/test_browser_interact_abort.py (9 cases including initial-goto failure, explicit-goto failure, implicit-nav failure, mid-sequence goto, stop_on_error preservation, and happy path).
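The control flow described in this section can be sketched as follows. This is a simplified model, not op_interact itself: `run_actions` and the `perform` callback (a stand-in for the real Playwright calls that raises on failure) are hypothetical, and leaving `aborted` False on a stop_on_error break is an assumption, since the documented abort_reason values cover navigation failures only.

```python
def run_actions(actions, perform, initial_url=None, stop_on_error=False):
    """Run sub-actions in order; navigation failures always end the sequence."""
    results = []
    outcome = {"results": results, "aborted": False, "abort_reason": None}

    if initial_url:
        try:
            perform({"action": "goto", "url": initial_url})
        except Exception as exc:
            results.append({"action": "goto", "ok": False, "error": str(exc),
                            "aborted_sequence": True})
            outcome.update(aborted=True, abort_reason="initial_goto_failed")
            return outcome

    for action in actions:
        entry = {"action": action.get("action"), "ok": True}
        results.append(entry)
        try:
            entry["value"] = perform(action)
        except Exception as exc:
            entry.update(ok=False, error=str(exc))
            if action.get("action") == "goto":
                # Terminal regardless of stop_on_error: the page is an error page.
                entry["aborted_sequence"] = True
                outcome.update(aborted=True, abort_reason="goto_failed")
                break
            if stop_on_error:
                break  # ordinary failure, caller asked to stop early
    return outcome
```

With this shape, a failed initial navigation yields one result entry instead of one timed-out entry per remaining action.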

Related

execute — raw Playwright fallback path.
search — for static-HTML fetches; cheaper than a full browser render.
sandbox/docker — Chromium install + Tor daemon bootstrap.