tools/file_system.py
Unified file operations with binary sniffing, PDF chunking, downloads, and path-traversal guards.
Tool: file_system
One LLM-facing tool that dispatches to a sub-handler by the operation argument.
| Operation | Handler | Behaviour |
|---|---|---|
| read | tool_read_file | Binary sniff first 8 KB; errors on binary; clamps to max_context · 3.5 · 0.5 chars. |
| read_chunked | tool_read_document_chunked | PyMuPDF for PDF pages; text-file byte chunking with overlap. |
| inspect | tool_inspect_file | First N lines (default 10). |
| write | tool_write_file | Auto-serialise JSON, strip markdown, auto-create parent dirs, fixture summary for data files. |
| replace | tool_replace_text | Exact match → flexible (whitespace-insensitive) → nearest snippet; auto-promote to write if missing target. |
| search | tool_file_search | ripgrep with 40 KB output cap (head 10 KB + tail 30 KB). |
| find | tool_find_files | find(1) with 100-result limit; filters hidden / node_modules / __pycache__. |
| download | tool_download_file | curl_cffi (Chrome impersonation) or httpx; 50 MB cap; Tor + identity refresh on 401/403/503. |
| copy / rename / move / delete | shutil-backed | Refuses overwrite on copy. |
| list | tool_list_files | Walks sandbox; extracts AST signatures for .py files; 200-file limit. |
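The head-plus-tail cap in the search row can be illustrated with a small sketch. This is not the module's code: the function name cap_output, the omission marker text, and the choice to measure UTF-8 bytes are all assumptions made for illustration. Only the 10 KB head and 30 KB tail figures come from the table.

```python
HEAD_BYTES = 10 * 1024  # head kept, per the table (10 KB)
TAIL_BYTES = 30 * 1024  # tail kept, per the table (30 KB)


def cap_output(text: str, head: int = HEAD_BYTES, tail: int = TAIL_BYTES) -> str:
    """Keep the first `head` and last `tail` bytes of oversized output.

    Hypothetical helper: the real tool's marker text and units may differ.
    """
    data = text.encode("utf-8")
    if len(data) <= head + tail:
        return text  # under the cap, return unchanged
    omitted = len(data) - head - tail
    # errors="ignore" drops a multi-byte character split by the cut.
    head_part = data[:head].decode("utf-8", errors="ignore")
    tail_part = data[len(data) - tail:].decode("utf-8", errors="ignore")
    return f"{head_part}\n... [{omitted} bytes omitted] ...\n{tail_part}"
```

Keeping a larger tail than head favours the most recent matches, which is usually what an agent reads last in long ripgrep output.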
Path-safety
_get_safe_path resolves all paths relative to sandbox_dir and rejects path traversal via Path.is_relative_to(). Leading / is stripped and treated as relative.
/workspace/ prefix is stripped (2026-04-24). The sandbox bind-mounts the host sandbox root at /workspace inside the container, so /workspace/foo.py in the container is the same file as foo.py at the host's sandbox root. When the LLM round-trips container-visible paths (prompts, shell commands, and docs all encourage /workspace/...) to host-facing tools like file_system, the tool now also strips a leading workspace/ segment. Without the strip, /workspace/skills/foo.py resolved to $SANDBOX_DIR/workspace/skills/foo.py — a phantom workspace/ subdir — and later execute calls using the same path got ENOENT because the real file lived at /workspace/workspace/skills/foo.py inside the container. Confirmed in the 2026-04-24 in_gr_news session, which burned ~60 s rediscovering the mismatch. The strip is exact-segment only: workspaces/ (plural) and workspace_backup/ are not touched, and traversal inside a /workspace/ path (/workspace/../escape) is still blocked.
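The rules above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual _get_safe_path: the function name, signature, and the ValueError are hypothetical, and it requires Python 3.9+ for Path.is_relative_to().

```python
from pathlib import Path


def get_safe_path(sandbox_dir: str, user_path: str) -> Path:
    """Resolve user_path inside sandbox_dir or raise ValueError (sketch)."""
    root = Path(sandbox_dir).resolve()
    # A leading "/" is stripped so the path is treated as sandbox-relative.
    parts = Path(user_path.lstrip("/")).parts
    # Strip one leading "workspace" segment. The comparison is on the whole
    # segment, so "workspaces" and "workspace_backup" are left untouched.
    if parts and parts[0] == "workspace":
        parts = parts[1:]
    candidate = root.joinpath(*parts).resolve()
    # resolve() collapses "..", so traversal shows up as a path outside root.
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes sandbox: {user_path!r}")
    return candidate
```

With this sketch, /workspace/skills/foo.py and skills/foo.py map to the same host file, while /workspace/../escape raises.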
Container-side path translation for search / find (2026-04-26)
tool_file_search (operation=search) and tool_find_files (operation=find) shell out to rg / find via sandbox_manager.execute, which runs commands inside the Docker container at workdir=/workspace. The host filesystem is otherwise opaque to the container. Until the 2026-04-26 fix, both tools resolved their path argument with _get_safe_path (which returns a HOST-absolute path like /Users/me/sandbox/webos/app.js) and passed that path straight into the container — which silently produced "no matches" because the container can't see /Users/....
The new helper _to_container_path(sandbox_dir, host_path) translates a host-absolute path to its container-visible /workspace/<rel> equivalent. tool_file_search applies it unconditionally; tool_find_files applies it when given sandbox_dir via the dispatcher (legacy callers without sandbox_dir still work but skip translation). A path that resolves outside the sandbox is rejected explicitly rather than silently searching the wrong tree.
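A sketch of the translation, assuming the container mount point is /workspace as described above. The body below is an illustration of the idea, not the real _to_container_path; the exception type and the CONTAINER_ROOT constant are assumptions.

```python
from pathlib import Path, PurePosixPath

CONTAINER_ROOT = PurePosixPath("/workspace")  # bind-mount target in the container


def to_container_path(sandbox_dir: str, host_path: str) -> str:
    """Map a host-absolute path to its /workspace/<rel> equivalent (sketch)."""
    root = Path(sandbox_dir).resolve()
    target = Path(host_path).resolve()
    try:
        rel = target.relative_to(root)
    except ValueError:
        # Reject explicitly instead of silently searching the wrong tree.
        raise ValueError(f"{host_path!r} is outside the sandbox") from None
    # Build a POSIX path regardless of the host OS path separator.
    return str(CONTAINER_ROOT.joinpath(*rel.parts))
```

For example, with a sandbox at /Users/me/sandbox, the host path /Users/me/sandbox/webos/app.js would translate to /workspace/webos/app.js, which is the form rg and find need inside the container.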
Incident: a 2026-04-26 webOS session burned ~70 minutes because six consecutive rg lockScreen.style.display webos/app.js calls all returned empty. The agent (correctly) concluded the searches showed no match and (incorrectly) concluded the file edit hadn't been applied — when in fact the search was running against a path the container couldn't see. Coverage: tests/test_file_system_search_container_path.py.
Binary sniffing
_looks_like_binary: any NUL byte → binary; otherwise it counts non-text control characters (every byte below 0x20 except TAB, LF, CR, and FF) and treats the sample as binary when they exceed 5 %.
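The heuristic can be sketched as below. This is an illustrative version, not the module's _looks_like_binary: it assumes the 5 % threshold is measured against the length of the sniffed sample and that an empty sample counts as text, neither of which the description above confirms.

```python
# Control bytes that are normal in text files: TAB, LF, FF, CR.
TEXT_CONTROL = frozenset({0x09, 0x0A, 0x0C, 0x0D})


def looks_like_binary(sample: bytes, threshold: float = 0.05) -> bool:
    """Heuristic binary check on a sniffed sample (e.g. the first 8 KB)."""
    if not sample:
        return False  # assumption: empty files are treated as text
    if b"\x00" in sample:
        return True  # a NUL byte is decisive
    # Count control bytes below 0x20 that are not ordinary text whitespace.
    suspicious = sum(1 for b in sample if b < 0x20 and b not in TEXT_CONTROL)
    return suspicious / len(sample) > threshold
```

A caller would typically read the first 8192 bytes with open(path, "rb") and pass them in, so large files are never loaded just to be classified.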
PDF handling
- read blocks PDFs with a hint to use read_chunked.
- read_chunked uses PyMuPDF; max 1000 pages.
- For ingestion via knowledge_base, the PDF is chunked into the vector store.