core/context_manager.py

Progressive context compression. Avoids the binary "fits" / "emergency-prune" cliff by compressing in five graduated levels.

Levels

| Level | Used-token ratio | Effect | Full-fidelity turns |
|---|---|---|---|
| L0 | < 60 % | No compression. | all |
| L1 | < 75 % | Summarize tool outputs (keep first & last 3 lines, error markers). | last 6 |
| L2 | < 85 % | Collapse verbose assistant messages. | last 4 |
| L3 | < 95 % | Compress user messages too. | last 2 |
| L4 | ≥ 95 % | Emergency prune: keep only system + last user + last tool output. | 1 |
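
The thresholds in the table above can be read as a simple lookup from usage ratio to level. The following is a minimal sketch under that reading; `LEVEL_UPPER_BOUNDS` and `pick_level` are hypothetical names used for illustration, not part of the module's API.

```python
# Exclusive upper bounds on the used-token ratio for L0..L3, taken from the
# levels table. Any ratio at or above the last bound maps to L4.
LEVEL_UPPER_BOUNDS = (0.60, 0.75, 0.85, 0.95)


def pick_level(used_tokens: int, max_tokens: int) -> int:
    """Return the compression level (0-4) for the current usage ratio."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    ratio = used_tokens / max_tokens
    for level, bound in enumerate(LEVEL_UPPER_BOUNDS):
        if ratio < bound:
            return level
    return 4  # emergency prune
```

Because each bound is exclusive, a context that sits exactly on a threshold (for example 75 %) already belongs to the next level.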

API

| Method | Purpose |
|---|---|
| `__init__(max_tokens=65536, token_estimator=None)` | Inject estimator (defaults to ~4 chars per token). |
| `compress_if_needed(messages, llm_summarizer=None) -> List[dict]` | Apply progressive compression if used / max > 60 %. |
| `_estimate_tokens(messages)` | Rough sum across messages. |
| `_emergency_prune(messages)` | L4 emergency strategy. |
| `_summarize_tool_output(msg)` | L1: keep error / first-3 / last-3 lines. |
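
To make the helper descriptions concrete, here is a rough sketch of what the three private helpers might do. It is written as free functions rather than methods, and it assumes messages are dicts with `role` and `content` string keys. The `ERROR_MARKERS` list and the omission placeholder text are illustrative guesses; the real module may differ on all of these.

```python
from typing import Dict, List

HEAD_LINES = 3
TAIL_LINES = 3
ERROR_MARKERS = ("error", "exception", "traceback")  # hypothetical marker list


def estimate_tokens(messages: List[Dict[str, str]]) -> int:
    """Rough estimate: about 4 characters per token, summed over messages."""
    return sum(len(m.get("content", "")) // 4 for m in messages)


def summarize_tool_output(msg: Dict[str, str]) -> Dict[str, str]:
    """Keep the first 3 lines, the last 3 lines, and any error-marker lines."""
    lines = msg.get("content", "").splitlines()
    if len(lines) <= HEAD_LINES + TAIL_LINES:
        return dict(msg)  # already short: return an unchanged copy
    keep = set(range(HEAD_LINES))
    keep |= set(range(len(lines) - TAIL_LINES, len(lines)))
    keep |= {i for i, line in enumerate(lines)
             if any(marker in line.lower() for marker in ERROR_MARKERS)}
    out: List[str] = []
    previous = -1
    for i in sorted(keep):
        if i != previous + 1:
            # Mark each gap so the reader can see that lines were dropped.
            out.append(f"[... {i - previous - 1} line(s) omitted ...]")
        out.append(lines[i])
        previous = i
    return {**msg, "content": "\n".join(out)}


def emergency_prune(messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep system messages, the last user message, and the last tool output."""
    last_user = max((i for i, m in enumerate(messages)
                     if m.get("role") == "user"), default=None)
    last_tool = max((i for i, m in enumerate(messages)
                     if m.get("role") == "tool"), default=None)
    return [m for i, m in enumerate(messages)
            if m.get("role") == "system" or i in (last_user, last_tool)]
```

Both functions return new objects and leave the input messages unmodified, which keeps the original history available if a later step needs it.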

Configuration constants

FULL_FIDELITY_TURNS = {0: 999, 1: 6, 2: 4, 3: 2, 4: 1}
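
One way this mapping could be applied is to split the history into an older, compressible part and a protected tail of recent turns. The sketch below assumes a turn begins at each user message and that messages are dicts with a `role` key; `split_by_fidelity` is a hypothetical helper, and the module may define turns differently.

```python
from typing import Dict, List, Tuple

FULL_FIDELITY_TURNS = {0: 999, 1: 6, 2: 4, 3: 2, 4: 1}


def split_by_fidelity(
    messages: List[Dict[str, str]], level: int
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Return (compressible, protected); protected holds the last N turns."""
    keep_turns = FULL_FIDELITY_TURNS[level]
    # Assumption: every user message opens a new turn.
    user_starts = [i for i, m in enumerate(messages) if m.get("role") == "user"]
    if len(user_starts) <= keep_turns:
        return [], list(messages)  # fewer turns than the budget: protect all
    cut = user_starts[-keep_turns]
    return list(messages[:cut]), list(messages[cut:])
```

Note that a leading system message lands in the compressible part under this split, so a caller would still need to skip it when compressing.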

Concurrency

Synchronous; llm_summarizer can optionally be an async callable for L1 compression of tool outputs.
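
A synchronous caller can accept either kind of callable by checking whether the call returned a coroutine. The sketch below is one possible approach, not the module's confirmed behavior; `call_summarizer` is a hypothetical helper, and it assumes the summarizer takes a single string and returns a string.

```python
import asyncio
import inspect
from typing import Any, Callable


def call_summarizer(llm_summarizer: Callable[[str], Any], text: str) -> str:
    """Call a sync or async summarizer from synchronous code."""
    result = llm_summarizer(text)
    if inspect.iscoroutine(result):
        # asyncio.run starts a fresh event loop and raises RuntimeError if a
        # loop is already running in this thread.
        return asyncio.run(result)
    return result
```

The `asyncio.run` limitation matters in practice: inside an already running event loop (for example an async web handler), this approach would fail, and the caller would need to await the summarizer itself.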