tools / vision.py
Multi-modal analysis via the visual LLM pool. PDF rasterisation to JPEG; URL or file inputs.
Tool: vision_analysis
Auto-registered when the LLM client supports vision.
| Action | Default prompt |
|---|---|
graph_analysis | Analyse this graph/chart. Extract key data points, trends, legends, and conclusions. |
describe_picture | Describe this image in high detail. Mention objects, text, people, colors, and layout. |
extract_text_picture | Extract all text from this image exactly as written (OCR). |
extract_text_pdf | Extract all text and describe any diagrams from these document pages exactly as written. |
PDF handling
PyMuPDF rasterises the first 10 pages at 2× zoom to JPEG; otherwise the host's vision LLM would balloon context. A single image file is base64-encoded directly.
Execution
Calls llm_client.chat_completion({"messages": [...]}, use_vision=True). Returns "VISION ANALYSIS RESULT:\n{response}".
Safety
- 50 MB host file cap.
- Path-traversal guard via
_get_safe_path.