tools / vision.py

Multi-modal analysis via the visual LLM pool. PDF rasterisation to JPEG; URL or file inputs.

Tool: vision_analysis

Auto-registered when the LLM client supports vision.

ActionDefault prompt
graph_analysisAnalyse this graph/chart. Extract key data points, trends, legends, and conclusions.
describe_pictureDescribe this image in high detail. Mention objects, text, people, colors, and layout.
extract_text_pictureExtract all text from this image exactly as written (OCR).
extract_text_pdfExtract all text and describe any diagrams from these document pages exactly as written.

PDF handling

PyMuPDF rasterises the first 10 pages at 2× zoom to JPEG; otherwise the host's vision LLM would balloon context. A single image file is base64-encoded directly.

Execution

Calls llm_client.chat_completion({"messages": [...]}, use_vision=True). Returns "VISION ANALYSIS RESULT:\n{response}".

Safety