tools / file_system.py

Unified file operations with binary sniffing, PDF chunking, downloads, and path-traversal guards.

Tool: file_system

One LLM-facing tool that dispatches to a sub-handler by the operation argument.

| Operation | Handler | Behaviour |
| --- | --- | --- |
| read | tool_read_file | Binary sniff of first 8 KB; errors on binary; clamps to max_context · 3.5 · 0.5 chars. |
| read_chunked | tool_read_document_chunked | PyMuPDF for PDF pages; text-file byte chunking with overlap. |
| inspect | tool_inspect_file | First N lines (default 10). |
| write | tool_write_file | Auto-serialise JSON, strip markdown, auto-create parent dirs, fixture summary for data files. |
| replace | tool_replace_text | Exact match → flexible (whitespace-insensitive) → nearest snippet; auto-promote to write if missing target. |
| search | tool_file_search | ripgrep with 40 KB output cap (head 10 KB + tail 30 KB). |
| find | tool_find_files | find(1) with 100-result limit; filters hidden / node_modules / __pycache__. |
| download | tool_download_file | curl_cffi (Chrome impersonation) or httpx; 50 MB cap; Tor + identity refresh on 401/403/503. |
| copy / rename / move / delete | shutil-backed | Refuses overwrite on copy. |
| list | tool_list_files | Walks sandbox; extracts AST signatures for .py files; 200-file limit. |
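
The head-plus-tail output cap listed for search can be sketched as follows. This is a minimal illustration, not the module's actual code: the function name cap_output, the constant names, and the wording of the omission marker are all assumptions; only the 10 KB head and 30 KB tail sizes come from the table above.

```python
HEAD_BYTES = 10 * 1024  # keep the first 10 KB (size taken from the table)
TAIL_BYTES = 30 * 1024  # keep the last 30 KB (size taken from the table)


def cap_output(text: str, head: int = HEAD_BYTES, tail: int = TAIL_BYTES) -> str:
    """Hypothetical helper: return text unchanged if it fits, else head + marker + tail."""
    raw = text.encode("utf-8")
    if len(raw) <= head + tail:
        return text
    omitted = len(raw) - head - tail
    # errors="ignore" drops a multi-byte character that was cut at a slice boundary.
    head_part = raw[:head].decode("utf-8", errors="ignore")
    tail_part = raw[len(raw) - tail:].decode("utf-8", errors="ignore")
    # The marker format is an assumption made for this sketch.
    return f"{head_part}\n... [{omitted} bytes omitted] ...\n{tail_part}"
```

Keeping more of the tail than the head is a plausible reading of the 10 KB / 30 KB split, since the end of long command output often carries the summary or the final error.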

Path-safety

_get_safe_path resolves all paths relative to sandbox_dir and rejects path traversal via Path.is_relative_to(). Leading / is stripped and treated as relative.

/workspace/ prefix is stripped (2026-04-24). The sandbox bind-mounts the host sandbox root at /workspace inside the container, so /workspace/foo.py in the container is the same file as foo.py at the host's sandbox root. When the LLM round-trips container-visible paths (prompts, shell commands, and docs all encourage /workspace/...) to host-facing tools like file_system, the tool now also strips a leading workspace/ segment.

Without the strip, /workspace/skills/foo.py resolved to $SANDBOX_DIR/workspace/skills/foo.py, a phantom workspace/ subdir. The file therefore lived at /workspace/workspace/skills/foo.py inside the container, and later execute calls using the original /workspace/skills/foo.py path got ENOENT. Confirmed in the 2026-04-24 in_gr_news session, which burned ~60 s rediscovering the mismatch.

The strip is exact-segment only: workspaces/ (plural) and workspace_backup/ are not touched, and traversal inside a /workspace/ path (/workspace/../escape) is still blocked.
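
The resolution rules above can be sketched as follows. This is an illustrative approximation, not the real _get_safe_path: the function name get_safe_path, its signature, and the choice of ValueError are assumptions. It assumes a POSIX host and Python 3.9+ (for Path.is_relative_to).

```python
from pathlib import Path


def get_safe_path(sandbox_dir: str, user_path: str) -> Path:
    """Hypothetical sketch: resolve user_path inside sandbox_dir or raise ValueError."""
    sandbox = Path(sandbox_dir).resolve()
    # A leading "/" is stripped so the path is treated as sandbox-relative.
    parts = Path(user_path.lstrip("/")).parts
    # Exact-segment strip: only a first segment equal to "workspace" is dropped,
    # so "workspaces/" and "workspace_backup/" are left alone.
    if parts and parts[0] == "workspace":
        parts = parts[1:]
    candidate = sandbox.joinpath(*parts).resolve()
    # Resolving first collapses ".." segments, so traversal is caught here.
    if not candidate.is_relative_to(sandbox):
        raise ValueError(f"path escapes sandbox: {user_path!r}")
    return candidate
```

With a sandbox at /tmp/sbx_example, both "/workspace/skills/foo.py" and "skills/foo.py" map to the same host file under skills/, while "/workspace/../escape" raises.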

Container-side path translation for search / find (2026-04-26)

tool_file_search (operation=search) and tool_find_files (operation=find) shell out to rg / find via sandbox_manager.execute, which runs commands inside the Docker container at workdir=/workspace. The host filesystem is otherwise opaque to the container. Until the 2026-04-26 fix, both tools resolved their path argument with _get_safe_path (which returns a HOST-absolute path like /Users/me/sandbox/webos/app.js) and passed that path straight into the container — which silently produced "no matches" because the container can't see /Users/....

The new helper _to_container_path(sandbox_dir, host_path) translates a host-absolute path to its container-visible /workspace/<rel> equivalent. tool_file_search applies it unconditionally; tool_find_files applies it when given sandbox_dir via the dispatcher (legacy callers without sandbox_dir still work but skip translation). A path that resolves outside the sandbox is rejected explicitly rather than silently searching the wrong tree.
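
A minimal sketch of the host-to-container translation described above, assuming a POSIX host and the /workspace mount point named in this document. The function name to_container_path (without the leading underscore), the ValueError, and the error message are assumptions made for illustration; the real helper may differ.

```python
from pathlib import Path, PurePosixPath

CONTAINER_ROOT = PurePosixPath("/workspace")  # mount point named in this document


def to_container_path(sandbox_dir: str, host_path: str) -> str:
    """Hypothetical sketch: map a host-absolute path to its /workspace/<rel> form."""
    sandbox = Path(sandbox_dir).resolve()
    host = Path(host_path).resolve()
    try:
        rel = host.relative_to(sandbox)
    except ValueError:
        # Reject explicitly instead of silently searching the wrong tree.
        raise ValueError(f"{host_path!r} is outside the sandbox") from None
    # rel.parts is empty when host_path is the sandbox root itself.
    return str(CONTAINER_ROOT.joinpath(*rel.parts))
```

Under these assumptions, a host path such as /Users/me/sandbox/webos/app.js with sandbox_dir=/Users/me/sandbox becomes /workspace/webos/app.js, which is the form rg and find can see inside the container.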

Incident: a 2026-04-26 webOS session burned ~70 minutes because six consecutive rg lockScreen.style.display webos/app.js calls all returned empty. The agent (correctly) concluded the searches showed no match and (incorrectly) concluded the file edit hadn't been applied — when in fact the search was running against a path the container couldn't see. Coverage: tests/test_file_system_search_container_path.py.

Binary sniffing

_looks_like_binary: a NUL byte means binary. Otherwise it counts non-text control characters (every byte below 0x20 except TAB, LF, CR, and FF); if they make up more than 5 % of the sample, the file is treated as binary.
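
The heuristic can be sketched as follows. This is an approximation for illustration only: the function name looks_like_binary, the threshold parameter, and the treatment of an empty sample as text are assumptions; the sample is assumed to be the first 8 KB of the file, as the read operation describes.

```python
SNIFF_BYTES = 8 * 1024  # sample size used by the read operation
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D, 0x0C}  # TAB, LF, CR, FF


def looks_like_binary(sample: bytes, threshold: float = 0.05) -> bool:
    """Hypothetical sketch of the NUL-byte and control-character checks."""
    if not sample:
        return False  # assumption: an empty file is treated as text
    if b"\x00" in sample:
        return True  # any NUL byte marks the file as binary
    # Count control bytes below 0x20 that are not ordinary text whitespace.
    suspicious = sum(1 for b in sample if b < 0x20 and b not in ALLOWED_CONTROLS)
    return suspicious / len(sample) > threshold
```

A caller would typically read the sample with something like `open(path, "rb").read(SNIFF_BYTES)` before invoking the check.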

PDF handling