memory / skills.py — SkillMemory
Lessons learned from past mistakes. Vector-backed retrieval with BM25 re-rank; utility-driven retention; verified-first pinning.
Storage
- JSON at memory_dir/skills_playbook.json.
- Vector embeddings stored alongside in VectorMemory with metadata.type = "skill".
- threading.RLock for JSON reads/writes; lazy-init fallback for tests.
Lesson schema (v2)
{
"schema_version": 2,
"timestamp": "ISO",
"task": str,
"trigger": str,
"mistake": str,
"anti_pattern": str,
"solution": str,
"correct_pattern": str,
"code_example": str,
"domains": list[str],
"confidence": float (0-1),
"source_challenge_hash": str,
"verified": bool,
"verification_attempted": bool,
"retrievals": int,
"helpful_retrievals": int,
"last_retrieved_at": "ISO",
"source": str,
"frequency": int,
"graduated": bool
}
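For readers who want a typed view of the schema, the fields above can be sketched as a Python TypedDict. The names LessonV2 and missing_fields are hypothetical and not part of skills.py; only the field names and types come from the schema.

```python
from typing import TypedDict


class LessonV2(TypedDict):
    """Hypothetical typed view of the v2 lesson schema (not from skills.py)."""
    schema_version: int
    timestamp: str              # ISO 8601 string
    task: str
    trigger: str
    mistake: str
    anti_pattern: str
    solution: str
    correct_pattern: str
    code_example: str
    domains: list[str]
    confidence: float           # expected range 0-1
    source_challenge_hash: str
    verified: bool
    verification_attempted: bool
    retrievals: int
    helpful_retrievals: int
    last_retrieved_at: str      # ISO 8601 string
    source: str
    frequency: int
    graduated: bool


def missing_fields(lesson: dict) -> set[str]:
    """Return the schema keys that are absent from a candidate lesson dict."""
    return set(LessonV2.__required_keys__) - set(lesson)
```

A check like missing_fields(candidate) could be run before writing to the playbook; whether the real module validates this way is not stated here.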
Retention & eviction
PLAYBOOK_MAX = 50. Trim policy:
- Newest lesson is always kept (head pin).
- Verified lessons pinned next.
- Remaining slots filled by unverified, ranked by utility.
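The trim policy above can be sketched roughly as follows. This is an illustration under assumptions, not the module's code: it assumes the newest lesson sits at index 0, takes the utility function as a parameter, and simply truncates verified lessons in list order if they alone exceed the cap (the text does not say what happens in that case). The function name trim_playbook is hypothetical.

```python
PLAYBOOK_MAX = 50


def trim_playbook(lessons, utility, max_size=PLAYBOOK_MAX):
    """Return at most max_size lessons: head pin, then verified, then best unverified.

    Assumes lessons[0] is the newest entry (an assumption of this sketch).
    """
    if len(lessons) <= max_size:
        return list(lessons)
    head, rest = lessons[0], lessons[1:]
    verified = [l for l in rest if l.get("verified")]
    unverified = [l for l in rest if not l.get("verified")]
    # Head pin first, then verified lessons up to the cap.
    kept = [head] + verified[: max_size - 1]
    # Any remaining slots go to the highest-utility unverified lessons.
    slots = max_size - len(kept)
    kept += sorted(unverified, key=utility, reverse=True)[:slots]
    return kept
```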
Utility formula
utility = confidence·0.5 + hit_rate·0.8 + (0.3 if verified) + log(freq)·0.1 + stale_penalty
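As a worked sketch of the formula, the function below computes utility from a lesson dict. Several details are assumptions: hit_rate is taken to be helpful_retrievals / retrievals (0 when never retrieved), log is the natural log, frequency is floored at 1, and stale_penalty is passed in as an argument (presumably zero or negative) because its computation is not described here.

```python
import math


def utility(lesson, stale_penalty=0.0):
    """Sketch of the utility score; hit_rate and stale_penalty handling are assumed."""
    retrievals = lesson.get("retrievals", 0)
    helpful = lesson.get("helpful_retrievals", 0)
    hit_rate = helpful / retrievals if retrievals else 0.0  # assumed definition
    freq = max(lesson.get("frequency", 1), 1)               # avoid log(0)
    return (
        lesson.get("confidence", 0.0) * 0.5
        + hit_rate * 0.8
        + (0.3 if lesson.get("verified") else 0.0)
        + math.log(freq) * 0.1
        + stale_penalty
    )
```

For example, a verified lesson with confidence 0.8, two helpful retrievals out of four, frequency 1, and no staleness scores 0.4 + 0.4 + 0.3 + 0 + 0 = 1.1 under these assumptions.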
Retrieval (three-tier, picked by query + memory_system presence)
get_playbook_context(query, memory_system, distance_threshold=0.45, limit, record_retrievals) dispatches to one of three paths:
- Vector path: memory_system AND query both present. Runs a Chroma query over type="skill" docs, tightens by DEFAULT_RETRIEVAL_DISTANCE = 0.45, then re-ranks with a BM25-lite keyword-overlap bonus. Best path; used whenever the bus wires a vector tier.
- BM25 fallback: query is present but memory_system is None (vector store offline, init failed, or a bus is wired without a vector tier). Does pure token-overlap against each lesson's trigger. Returns "" if no lesson shares tokens; never falls through to recency.
- Recency fallback: no query supplied (system-prompt-injection usage). Returns the N most recent lessons regardless of content, under the ## RECENT LESSONS header.
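The dispatch rule and the token-overlap fallback can be sketched as below. This is a simplified illustration, not the actual get_playbook_context: the helper names (pick_tier, bm25_fallback, _tokens), the tokenizer, and the output line format are all hypothetical, and the vector path is left out because it depends on the Chroma-backed store.

```python
import re


def _tokens(text):
    """Lowercased word tokens; the real tokenizer may differ."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def pick_tier(query, memory_system):
    """Choose a retrieval path from query and memory_system presence."""
    if not query:
        return "recency"      # no query: system-prompt cold inject
    if memory_system is not None:
        return "vector"       # both present: best path
    return "bm25"             # query without a vector tier


def bm25_fallback(query, lessons, limit=3):
    """Rank lessons by token overlap with their trigger; "" if nothing overlaps."""
    q = _tokens(query)
    scored = []
    for lesson in lessons:
        overlap = len(q & _tokens(lesson.get("trigger", "")))
        if overlap:
            scored.append((overlap, lesson))
    if not scored:
        return ""             # never fall through to recency
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return "\n".join(f"- {l['trigger']}: {l['solution']}" for _, l in scored[:limit])
```

The sketch does no stop-word filtering, so a common word shared between query and trigger would count as overlap; the real implementation may be stricter.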
Before the fix, the recency fallback fired whenever the vector path was unavailable — including on a real query. A question like "what's the capital of France?" would surface an unrelated Python-syntax lesson just because it happened to be the most recent entry, polluting context with junk. The new three-tier dispatch preserves the recency path ONLY for the no-query case (the SYSTEM_PROMPT cold-inject). Covered by tests/test_skills_bm25_fallback.py.
When a lesson is surfaced, record_retrieval increments the counter and emits a debug log keyed by source + source_challenge_hash — so the "is this self-play lesson ever actually used?" question has data to answer.
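A rough sketch of that accounting, using the standard logging module. The function name, the explicit playbook argument, the logger name, and the update of last_retrieved_at are assumptions of this sketch; the real record_retrieval(trigger) is a method and may behave differently.

```python
import logging
from datetime import datetime

log = logging.getLogger("skills")  # logger name is hypothetical


def record_retrieval_sketch(playbook, trigger):
    """Bump the retrieval counter for the lesson matching trigger and log it."""
    for lesson in playbook:
        if lesson.get("trigger") == trigger:
            lesson["retrievals"] = lesson.get("retrievals", 0) + 1
            lesson["last_retrieved_at"] = datetime.now().isoformat()  # assumed
            log.debug(
                "lesson retrieved source=%s hash=%s",
                lesson.get("source"),
                lesson.get("source_challenge_hash"),
            )
            return lesson
    return None
```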
Dedup
Vector similarity check (distance < 0.15) or exact task match merges frequency + solution instead of inserting a duplicate.
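One way the merge-or-insert decision might look is sketched below. The 0.15 threshold comes from the text; everything else is assumed for illustration: the function and constant names, passing the nearest vector hit in as a (lesson, distance) pair, incrementing frequency by one, and letting the newer solution replace the old one.

```python
DEDUP_DISTANCE = 0.15  # threshold from the text; the constant name is hypothetical


def merge_or_insert(playbook, new_lesson, nearest=None):
    """Merge into an existing lesson when it is a duplicate, else insert at the head.

    nearest: optional (existing_lesson, distance) pair from a vector lookup.
    """
    match = None
    if nearest is not None and nearest[1] < DEDUP_DISTANCE:
        match = nearest[0]
    else:
        # Fall back to an exact task match.
        match = next((l for l in playbook if l["task"] == new_lesson["task"]), None)
    if match is None:
        playbook.insert(0, new_lesson)
        return new_lesson
    match["frequency"] = match.get("frequency", 1) + 1
    match["solution"] = new_lesson["solution"]  # assumption: newer solution wins
    return match
```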
Listing lessons (user-facing surface)
list_lessons(scope, source, limit) returns lessons filtered by time window and source, most-recent first. Boundaries use local wall-clock:
- scope="today": since local midnight.
- scope="week" / "7d": last 7 days.
- scope="all": entire playbook.
- source="self_play" (etc.): filter by provenance.
Surfaced to the LLM via the list_lessons tool (see tools / memory). Phrases like "what did you learn today?" / "what have you learned so far?" are routed to this tool via SYSTEM_PROMPT.
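The time-window filtering can be sketched as a standalone function. This is not the method itself: list_lessons_sketch takes the lesson list and an injectable now for testing, and it assumes timestamps are naive local-time ISO strings (timezone-aware timestamps would need converting before comparison).

```python
from datetime import datetime, timedelta


def list_lessons_sketch(lessons, scope="all", source="", limit=20, now=None):
    """Filter by time window and source, most-recent first (local wall-clock)."""
    now = now or datetime.now()
    if scope == "today":
        cutoff = now.replace(hour=0, minute=0, second=0, microsecond=0)
    elif scope in ("week", "7d"):
        cutoff = now - timedelta(days=7)
    else:
        cutoff = None  # "all": no time bound
    picked = []
    for lesson in lessons:
        if source and lesson.get("source") != source:
            continue
        ts = datetime.fromisoformat(lesson["timestamp"])  # assumes naive local time
        if cutoff is not None and ts < cutoff:
            continue
        picked.append((ts, lesson))
    picked.sort(key=lambda pair: pair[0], reverse=True)
    return [lesson for _, lesson in picked[:limit]]
```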
Public methods
| Method | Purpose |
|---|---|
| build_lesson(task, trigger, anti_pattern, correct_pattern, domains, confidence, source_challenge_hash, verified, source) → dict | Factory. |
| learn_lesson(task, mistake, solution, memory_system=None, **structured) | Write lesson with dedup, merge, trim. |
| list_lessons(scope="all", source="", limit=20) → list | Time-window + source filter; local-time boundaries; most-recent first. |
| record_retrieval(trigger) / record_helpful_retrieval(trigger) | Hit-rate accounting. record_retrieval emits a debug log with source_challenge_hash for observability. |
| credit_recent_retrievals(window_seconds=300) | Bulk credit (idempotent per window). |
| prune_low_utility(min_retrievals=5, max_drop_fraction=0.25) | Drop bottom-quartile. |
| get_recent_failures(limit) → str | Format for LLM injection. |
| find_by_trigger(trigger) / mark_verified / remove_by_trigger | Index helpers used by the verification-grounded lesson flow in dream.py. |
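The table gives only the signature of prune_low_utility, so the following is one plausible reading rather than the actual behavior. It assumes that only lessons with at least min_retrievals retrievals are eligible (enough data to judge them), that at most max_drop_fraction of the playbook is dropped per call, and that the utility function is passed in. The name prune_low_utility_sketch is hypothetical.

```python
def prune_low_utility_sketch(lessons, utility, min_retrievals=5, max_drop_fraction=0.25):
    """Drop the lowest-utility eligible lessons, capped at a fraction of the playbook."""
    # Assumption: lessons with too few retrievals are spared, since their
    # hit rate is not yet meaningful.
    eligible = [l for l in lessons if l.get("retrievals", 0) >= min_retrievals]
    max_drop = int(len(lessons) * max_drop_fraction)
    to_drop = sorted(eligible, key=utility)[:max_drop]
    drop_ids = {id(l) for l in to_drop}
    return [l for l in lessons if id(l) not in drop_ids]
```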