tools / vision.py

Multi-modal analysis via the visual LLM pool. PDF rasterisation to JPEG; URL or file inputs.

Tool: `vision_analysis`

Auto-registered when the LLM client supports vision.

Action	Default prompt
`graph_analysis`	Analyse this graph/chart. Extract key data points, trends, legends, and conclusions.
`describe_picture`	Describe this image in high detail. Mention objects, text, people, colors, and layout.
`extract_text_picture`	Extract all text from this image exactly as written (OCR).
`extract_text_pdf`	Extract all text and describe any diagrams from these document pages exactly as written.

PDF handling

PyMuPDF rasterises the first 10 pages at 2× zoom to JPEG; otherwise the host's vision LLM would balloon context. A single image file is base64-encoded directly.

Execution

Calls llm_client.chat_completion({"messages": [...]}, use_vision=True). Returns "VISION ANALYSIS RESULT:\n{response}".

Safety

50 MB host file cap.
Path-traversal guard via _get_safe_path.

tools / vision.py

Tool: vision_analysis

PDF handling

Execution

Safety

Tool: `vision_analysis`