Memory hydration with Reciprocal Rank Fusion
How a query becomes a sectioned markdown injection of vector + graph + skill + episodic results.
RRF formula
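combined_score(doc) = Σ_ranker weight_ranker / (k + rank_in_ranker + 1)
Here rank_in_ranker is the document's 0-based position in that ranker's list, so a ranker's top hit contributes weight_ranker / (k + 1). A ranker that did not return the document contributes nothing.
k = 60 (the `rrf_k` default)
k = 60 is the value used in the original RRF paper. It is a smoothing constant, not a tie-breaker: a smaller value heavily favours the top result of each ranker, while a larger one flattens the curve so that lower-ranked hits keep more of their weight.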
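A minimal Python sketch of the weighted fusion. The helper name `rrf_fuse` and its input shape (a dict mapping each ranker name to its doc IDs, best first) are assumptions for illustration, not the bus's actual API. Ranks are taken as 0-based list indices, which matches the `+ 1` in the formula.

```python
from collections import defaultdict


def rrf_fuse(
    rankings: dict[str, list[str]],
    weights: dict[str, float],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Weighted Reciprocal Rank Fusion (hypothetical helper).

    rankings: ranker name -> doc IDs, best first.
    weights:  ranker name -> weight; rankers missing here default to 1.0.
    Returns (doc_id, score) pairs sorted by descending fused score.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranker, docs in rankings.items():
        w = weights.get(ranker, 1.0)
        for rank, doc in enumerate(docs):  # rank is 0-based, hence the + 1
            scores[doc] += w / (k + rank + 1)
    # Highest score first; ties broken by doc ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# "b" is returned by both rankers, so it outranks graph's own top hit "a".
fused = rrf_fuse(
    {"graph": ["a", "b"], "vector": ["b", "c"]},
    weights={"graph": 2.0, "vector": 1.0},
)
print([doc for doc, _ in fused])  # ['b', 'a', 'c']
```

To see the effect of k: with k = 60 a ranker's top hit scores 1/61 ≈ 0.0164 and its tenth hit 1/70 ≈ 0.0143, whereas with k = 0 they would score 1 and 0.1, which is the "heavily favours the top result" regime.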
Intent weights
The bus first classifies the query as factual, procedural, or contextual using the worker pool (`RoutingTask.CLASSIFY_INTENT`). Each cell of the weight matrix is the weight_ranker applied to that ranker's term in the RRF sum:
| Intent | graph | vector | skill | episodic |
|---|---|---|---|---|
| factual | 2.0 | 1.0 | 0.5 | 0.3 |
| procedural | 0.5 | 1.0 | 2.0 | 1.5 |
| contextual | 1.0 | 1.5 | 1.0 | 1.0 |
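The matrix transcribes directly into a lookup table. In this sketch `INTENT_WEIGHTS` and `weights_for` are hypothetical names, and falling back to the contextual row for an unrecognised label is an assumption, not documented behaviour.

```python
# Intent -> per-ranker RRF weight, transcribed from the table above.
INTENT_WEIGHTS: dict[str, dict[str, float]] = {
    "factual":    {"graph": 2.0, "vector": 1.0, "skill": 0.5, "episodic": 0.3},
    "procedural": {"graph": 0.5, "vector": 1.0, "skill": 2.0, "episodic": 1.5},
    "contextual": {"graph": 1.0, "vector": 1.5, "skill": 1.0, "episodic": 1.0},
}


def weights_for(intent: str) -> dict[str, float]:
    """Return the ranker weights for a classified intent (hypothetical helper).

    Falling back to the "contextual" row for an unknown label is an assumption.
    """
    return INTENT_WEIGHTS.get(intent, INTENT_WEIGHTS["contextual"])


# A ranker's top hit (0-based rank 0) contributes weight / (k + 1); k = 60 here.
for intent in ("factual", "procedural"):
    w = weights_for(intent)
    print(intent, round(w["graph"] / 61, 4), round(w["skill"] / 61, 4))
# factual 0.0328 0.0082
# procedural 0.0082 0.0328
```

Because every ranker shares the same k, a factual query gives a graph hit four times the contribution of a skill hit at the same rank (2.0 / 0.5), and a procedural query reverses that ratio.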
Section budgets
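The final markdown is capped at max_chars characters (default 6000), split into fixed per-section shares (budget at the default in parentheses):
- graph: 25 % (1500 chars)
- vector: 40 % (2400 chars)
- skill: 20 % (1200 chars)
- episodic: 15 % (900 chars)
Each section is truncated independently against its own budget, so a noisy vector store can't crowd out factual graph hits.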
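A sketch of the budget split, assuming the shares are applied directly to max_chars and each section is cut to its own budget. `section_budgets`, `truncate_section`, `apply_budgets` and the cut-at-last-newline rule are hypothetical, not taken from the implementation. Integer percentages keep the arithmetic exact.

```python
# Per-section share of max_chars, as integer percentages so the split is exact.
SECTION_PERCENT = {"graph": 25, "vector": 40, "skill": 20, "episodic": 15}


def section_budgets(max_chars: int = 6000) -> dict[str, int]:
    """Character budget per section (hypothetical helper)."""
    return {name: max_chars * pct // 100 for name, pct in SECTION_PERCENT.items()}


def truncate_section(text: str, budget: int) -> str:
    """Cut text to at most `budget` chars, preferring the last line break.

    Cutting at a newline, so a hit is not split mid-line, is an assumption.
    """
    if len(text) <= budget:
        return text
    cut = text.rfind("\n", 0, budget)
    return text[:cut] if cut > 0 else text[:budget]


def apply_budgets(sections: dict[str, str], max_chars: int = 6000) -> dict[str, str]:
    """Truncate each section against its own budget, independently of the others."""
    return {
        name: truncate_section(sections.get(name, ""), budget)
        for name, budget in section_budgets(max_chars).items()
    }


print(section_budgets())  # {'graph': 1500, 'vector': 2400, 'skill': 1200, 'episodic': 900}

# A very noisy vector section is cut to fit 2400 chars; the graph section is untouched.
trimmed = apply_budgets({"graph": "fact\n" * 10, "vector": ("x" * 80 + "\n") * 200})
print(len(trimmed["vector"]), trimmed["graph"] == "fact\n" * 10)
```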
Step-by-step
- Decompose: split the user query into 2-3 sub-queries, either with an LLM call or with a heuristic that splits on "and also" / "as well as".
- Fan-out: `asyncio.gather` queries every available memory tier with every sub-query concurrently.
- Per-source ranking: each tier returns its own ordered list (cross-encoder re-rank for vector; spreading-activation BFS for graph; BM25-augmented vector search for skills; trigger similarity for episodes).
- Fuse: RRF scores each candidate across rankers, with the intent weights biasing the sum.
- Format: produce `### Knowledge Graph`, `### Vector Recall`, `### Playbook`, and `### Episodes` sections.
- Inject: the resulting markdown is prepended to the system prompt before the LLM call.
Dedup gate (write side)
`publish_fact` keeps an LRU cache of MD5 signatures for the 256 most recent (event_type, fact_data) pairs and short-circuits any write whose signature is already cached. Without it, an LLM that repeats itself across turns would rewrite the same triplet to the graph and inflate its weight for no reason.
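A minimal sketch of such a gate, assuming fact_data is a JSON-serialisable dict. Only the 256-entry capacity, the MD5 signature and the (event_type, fact_data) key come from the description; the `DedupGate` class, its `should_publish` method and the sorted-key JSON serialisation are illustrative assumptions.

```python
import hashlib
import json
from collections import OrderedDict
from typing import Any


class DedupGate:
    """LRU of recent (event_type, fact_data) MD5 signatures (hypothetical class)."""

    def __init__(self, capacity: int = 256) -> None:
        self.capacity = capacity
        self._seen: OrderedDict[str, None] = OrderedDict()

    @staticmethod
    def signature(event_type: str, fact_data: dict[str, Any]) -> str:
        # sort_keys makes the signature independent of dict key order
        # (an assumption about how fact_data is serialised).
        payload = json.dumps([event_type, fact_data], sort_keys=True, default=str)
        return hashlib.md5(payload.encode("utf-8")).hexdigest()

    def should_publish(self, event_type: str, fact_data: dict[str, Any]) -> bool:
        """Return False for a replay; otherwise record the signature and return True."""
        sig = self.signature(event_type, fact_data)
        if sig in self._seen:
            self._seen.move_to_end(sig)  # refresh recency on a replay
            return False
        self._seen[sig] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the least recently seen signature
        return True


gate = DedupGate()
print(gate.should_publish("triplet", {"s": "Alice", "p": "works_at", "o": "Acme"}))  # True
print(gate.should_publish("triplet", {"o": "Acme", "p": "works_at", "s": "Alice"}))  # False
```

MD5 serves here only as a cheap fingerprint for non-adversarial input, so its cryptographic weaknesses do not matter. Once a signature has been evicted, the same fact can be written again, which bounds memory at the cost of missing replays older than the window.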