core/context_manager.py
Progressive context compression. Instead of jumping straight from "everything fits" to an emergency prune, the manager compresses the conversation in five graduated levels (L0 to L4) as the context window fills up.
Levels
| Level | Used-token ratio (used / max_tokens) | Effect | Full-fidelity turns |
|---|---|---|---|
| L0 | < 60 % | No compression. | all |
| L1 | 60 % to < 75 % | Summarize tool outputs: keep the first 3 and last 3 lines plus error markers. | last 6 |
| L2 | 75 % to < 85 % | Collapse verbose assistant messages. | last 4 |
| L3 | 85 % to < 95 % | Compress user messages too. | last 2 |
| L4 | ≥ 95 % | Emergency prune: keep only the system message, the last user message, and the last tool output. | last 1 |
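The threshold table reads as a simple lookup, and the L4 row as a filter over the history. The sketch below is illustrative only: `select_level`, `emergency_prune`, `LEVEL_THRESHOLDS`, and the OpenAI-style `{"role": ..., "content": ...}` message shape are assumptions, not the module's actual code. It also assumes every system message survives the emergency prune.

```python
from typing import Dict, List

# Hypothetical thresholds mirroring the table: (exclusive upper bound of ratio, level).
LEVEL_THRESHOLDS = [(0.60, 0), (0.75, 1), (0.85, 2), (0.95, 3)]


def select_level(used_tokens: int, max_tokens: int) -> int:
    """Map the used/max token ratio to a compression level 0-4."""
    ratio = used_tokens / max_tokens
    for upper, level in LEVEL_THRESHOLDS:
        if ratio < upper:
            return level
    return 4  # ratio >= 0.95: emergency prune


def emergency_prune(messages: List[Dict]) -> List[Dict]:
    """L4 sketch: keep system messages, the last user message and the last tool output."""
    keep = {i for i, m in enumerate(messages) if m.get("role") == "system"}
    for role in ("user", "tool"):
        # Walk backwards to find the most recent message with this role.
        for i in range(len(messages) - 1, -1, -1):
            if messages[i].get("role") == role:
                keep.add(i)
                break
    # Preserve the original ordering of the surviving messages.
    return [messages[i] for i in sorted(keep)]
```

With `max_tokens=65536`, 45000 used tokens (about 69 %) selects L1. Because each upper bound is exclusive, a ratio of exactly 60 % already lands in L1, which matches the "≥ 60 %" trigger for compression.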
API
| Method | Purpose |
|---|---|
| `__init__(max_tokens=65536, token_estimator=None)` | Set the token budget and optionally inject a token estimator (defaults to roughly 4 characters per token). |
| `compress_if_needed(messages, llm_summarizer=None) -> List[dict]` | Apply progressive compression when used / max_tokens ≥ 60 %; below that (L0) the messages are left uncompressed. |
| `_estimate_tokens(messages)` | Rough token count summed across all messages. |
| `_emergency_prune(messages)` | L4 emergency strategy. |
| `_summarize_tool_output(msg)` | L1: keep error-marker lines plus the first 3 and last 3 lines. |
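The estimator and L1 summarizer can be sketched as plain functions, assuming each message is a dict with a `"content"` string. Only the ~4 characters-per-token default and the first-3 / last-3 / error-line rule come from this page; the function names, the `ERROR_PATTERN` regex, and the omission marker line are hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional

# Assumed error markers (hypothetical); the module's real list is not documented.
ERROR_PATTERN = re.compile(r"error|exception|traceback|failed", re.IGNORECASE)


def default_estimator(text: str) -> int:
    """Documented default heuristic: roughly 4 characters per token."""
    return len(text) // 4


def estimate_tokens(
    messages: List[Dict], estimator: Optional[Callable[[str], int]] = None
) -> int:
    """Rough token count summed over the content of every message."""
    est = estimator or default_estimator
    return sum(est(str(m.get("content", ""))) for m in messages)


def summarize_tool_output(msg: Dict, keep: int = 3) -> Dict:
    """L1 sketch: keep the first/last `keep` lines plus any error-marker lines."""
    lines = str(msg.get("content", "")).splitlines()
    if len(lines) <= 2 * keep:
        return msg  # already short enough; nothing to drop
    head, middle, tail = lines[:keep], lines[keep:-keep], lines[-keep:]
    errors = [line for line in middle if ERROR_PATTERN.search(line)]
    omitted = len(middle) - len(errors)
    marker = [f"... [{omitted} lines omitted] ..."] if omitted else []
    # Return a copy so the caller's original message dict is not mutated.
    return {**msg, "content": "\n".join(head + errors + marker + tail)}
```

Injecting a different `estimator` (for example a real tokenizer's length function) changes only the counting, not the compression rules.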
Configuration constants
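`FULL_FIDELITY_TURNS = {0: 999, 1: 6, 2: 4, 3: 2, 4: 1}` maps each compression level to the number of most recent turns kept at full fidelity; 999 at L0 effectively means "all".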
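One way the constant could be applied is to split the history into a compressible prefix and a protected tail. This sketch assumes a "turn" begins at each user message; `split_for_level` is a hypothetical helper, and the real definition of a turn in context_manager.py may differ.

```python
from typing import Dict, List, Tuple

FULL_FIDELITY_TURNS = {0: 999, 1: 6, 2: 4, 3: 2, 4: 1}


def split_for_level(
    messages: List[Dict], level: int
) -> Tuple[List[Dict], List[Dict]]:
    """Return (compressible_prefix, full_fidelity_tail) for a compression level.

    Assumption: a turn starts at each message whose role is "user".
    """
    keep_turns = FULL_FIDELITY_TURNS[level]
    user_indices = [i for i, m in enumerate(messages) if m.get("role") == "user"]
    if len(user_indices) <= keep_turns:
        # Fewer turns than the budget: everything stays at full fidelity.
        return [], list(messages)
    cut = user_indices[-keep_turns]  # first message of the oldest protected turn
    return messages[:cut], messages[cut:]
```

At L3, a history with three user turns keeps the last two turns intact and hands only the first turn (plus anything before it) to the compressor.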
Concurrency
`compress_if_needed` is synchronous. The optional `llm_summarizer`, used for L1 summarization of tool outputs, may be either a regular callable or an async one.
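Because the entry point is synchronous, an async summarizer has to be driven to completion from sync code. The sketch below shows one option, assuming the summarizer takes a string and returns a string; `call_summarizer` is a hypothetical name. It runs a returned coroutine with `asyncio.run()`, which only works when no event loop is already running in the current thread.

```python
import asyncio
import inspect
from typing import Any, Callable, Optional


def call_summarizer(
    summarizer: Optional[Callable[[str], Any]], text: str
) -> Optional[str]:
    """Call a sync or async summarizer from synchronous code.

    Returns None when no summarizer is configured, so the caller can fall
    back to the rule-based tool-output summary.
    """
    if summarizer is None:
        return None
    result = summarizer(text)
    if inspect.iscoroutine(result):
        # asyncio.run() starts a fresh event loop; it raises RuntimeError
        # if another loop is already running in this thread.
        result = asyncio.run(result)
    return result
```

Inside an already running event loop (for example when the caller is itself async), this approach fails, and a real implementation would need a different strategy such as exposing an async variant of the method.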