Memory hydration with Reciprocal Rank Fusion

How a query becomes a sectioned markdown injection of vector + graph + skill + episodic results.

RRF formula

combined_score(doc) = Σ_ranker  weight_ranker / (k + rank_in_ranker + 1)
k = 60  (rrf_k default)

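The constant k = 60 is the value conventionally used for RRF. It acts as a smoothing term rather than a tie-breaker: a smaller k heavily favours the top result of each ranker, while a larger k flattens the curve. The + 1 in the denominator implies that rank_in_ranker is 0-based, so the formula matches the usual 1 / (k + rank) with 1-based ranks.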
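As a minimal sketch of the formula above (the function name rrf_fuse and the dict-of-lists input shape are illustrative assumptions, not the project's actual API), weighted RRF can be written as:

```python
from collections import defaultdict


def rrf_fuse(rankings, weights=None, k=60):
    """Fuse ranked lists with weighted Reciprocal Rank Fusion.

    rankings: dict mapping ranker name -> list of doc ids, best first.
    weights:  dict mapping ranker name -> weight (1.0 when missing).
    Ranks are 0-based, matching the `rank + 1` term in the formula.
    """
    weights = weights or {}
    scores = defaultdict(float)
    for ranker, docs in rankings.items():
        w = weights.get(ranker, 1.0)
        for rank, doc in enumerate(docs):
            # A ranker that never returned `doc` simply adds nothing.
            scores[doc] += w / (k + rank + 1)
    # Highest combined score first; ties broken by doc id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


rankings = {"vector": ["a", "b", "c"], "graph": ["b", "d"]}
fused = rrf_fuse(rankings)
boosted = rrf_fuse(rankings, weights={"graph": 2.0, "vector": 1.0})
```

With equal weights, b wins because it appears in both lists. With the graph ranker weighted 2.0, the graph-only hit d moves ahead of the vector-only hit a.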

Intent weights

The bus first classifies the query as factual, procedural, or contextual using the worker pool (RoutingTask.CLASSIFY_INTENT). The resulting intent selects a row of per-ranker weights:

Intent	graph	vector	skill	episodic
factual	2.0	1.0	0.5	0.3
procedural	0.5	1.0	2.0	1.5
contextual	1.0	1.5	1.0	1.0
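The matrix could be held as a plain lookup table. The sketch below is illustrative only: the names INTENT_WEIGHTS, weights_for, and contribution are hypothetical, and falling back to the contextual row for an unrecognised intent is an assumption rather than documented behaviour.

```python
INTENT_WEIGHTS = {
    "factual":    {"graph": 2.0, "vector": 1.0, "skill": 0.5, "episodic": 0.3},
    "procedural": {"graph": 0.5, "vector": 1.0, "skill": 2.0, "episodic": 1.5},
    "contextual": {"graph": 1.0, "vector": 1.5, "skill": 1.0, "episodic": 1.0},
}


def weights_for(intent, default="contextual"):
    """Return the per-ranker weights for a classified intent.

    Assumption: unknown intent labels fall back to the `default` row.
    """
    return INTENT_WEIGHTS.get(intent, INTENT_WEIGHTS[default])


def contribution(intent, ranker, rank, k=60):
    """Weighted RRF term for one ranker's result at a 0-based rank."""
    return weights_for(intent)[ranker] / (k + rank + 1)
```

For a factual query the top graph hit contributes 2.0 / 61 against 1.0 / 61 for the top vector hit, so graph results dominate unless several other rankers agree on a document.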

Section budgets

Final markdown is capped at max_chars (default 6000). Per-section share:

graph — 25 %
vector — 40 %
skill — 20 %
episodic — 15 %

Each section is truncated independently, so a noisy vector store can't crowd out factual graph hits.
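A rough sketch of the budget split, assuming the share applies to each section's body text and that truncation is a simple character cut (both are assumptions; the helper names are hypothetical):

```python
# Shares as integer percentages to avoid floating-point rounding surprises.
SECTION_SHARE_PCT = {"graph": 25, "vector": 40, "skill": 20, "episodic": 15}


def section_budgets(max_chars=6000):
    """Character budget per section for a given overall cap."""
    return {name: max_chars * pct // 100 for name, pct in SECTION_SHARE_PCT.items()}


def apply_budgets(sections, max_chars=6000):
    """Truncate each section body to its own budget, independently of the others."""
    budgets = section_budgets(max_chars)
    return {name: body[: budgets[name]] for name, body in sections.items()}


trimmed = apply_budgets({"vector": "v" * 10_000, "graph": "g" * 100})
```

At the default cap this gives 1500, 2400, 1200, and 900 characters for graph, vector, skill, and episodic respectively. The oversized vector body is cut to 2400 characters while the short graph body is left intact.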

Step-by-step

Decompose: split the user query into 2-3 sub-queries, using either an LLM call or a heuristic keyed on connectives such as "and also" / "as well as".
Fan-out: asyncio.gather queries each available memory tier with each sub-query concurrently.
Per-source ranking: each tier returns its own ordered list (cross-encoder re-rank for vector; spreading-activation BFS for graph; BM25-augmented vector for skills; trigger-similarity for episodes).
Fuse: RRF scores each candidate across rankers; intent weights bias the sum.
Format: produce ### Knowledge Graph, ### Vector Recall, ### Playbook, ### Episodes sections.
Inject: prepend the formatted markdown to the system prompt before the LLM call.
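The first two steps can be sketched as follows. This is a hedged illustration, not the actual implementation: decompose covers only the heuristic path and assumes it splits on the connectives, the tier callables are stand-ins, and skipping a failed tier instead of raising is an assumption.

```python
import asyncio
import re

# Hypothetical splitter for the non-LLM path: break on the connectives
# named in the text and keep at most `limit` sub-queries.
_SPLIT = re.compile(r"\s+(?:and also|as well as)\s+", re.IGNORECASE)


def decompose(query, limit=3):
    parts = [p.strip() for p in _SPLIT.split(query) if p.strip()]
    return parts[:limit] or [query]


async def fan_out(sub_queries, tiers):
    """Query every tier with every sub-query concurrently.

    tiers: dict name -> async callable(sub_query) -> ranked list of ids.
    Returns dict name -> list of ranked lists, one per successful call.
    """
    names = list(tiers)
    jobs = [tiers[name](q) for name in names for q in sub_queries]
    results = await asyncio.gather(*jobs, return_exceptions=True)
    out = {name: [] for name in names}
    for i, res in enumerate(results):
        name = names[i // len(sub_queries)]
        if not isinstance(res, BaseException):  # a failed call contributes nothing
            out[name].append(res)
    return out


# Stand-in tiers for the example; real tiers would hit the stores.
async def fake_vector(q):
    return [f"vec:{q}:0", f"vec:{q}:1"]


async def fake_graph(q):
    return [f"kg:{q}:0"]


subs = decompose("deploy steps as well as rollback policy")
hits = asyncio.run(fan_out(subs, {"vector": fake_vector, "graph": fake_graph}))
```

Each tier's ranked lists would then be passed to the fusion step, with the intent weights applied per ranker.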

Dedup gate (write side)

publish_fact keeps an LRU cache of the 256 most recent MD5 signatures of (event_type, fact_data) and short-circuits any write whose signature is already present. Without it, an LLM that repeats itself across turns would re-write the same triplet to the graph, and that triplet would accumulate weight unnecessarily.
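One way such a gate might look, as a sketch only: the class name DedupGate is hypothetical, and hashing a sorted-key JSON dump of the pair is an assumed serialisation, since the text specifies only that the signature is an MD5.

```python
import hashlib
import json
from collections import OrderedDict


class DedupGate:
    """Remember the most recent fact signatures and flag replays."""

    def __init__(self, capacity=256):
        self.capacity = capacity
        self._seen = OrderedDict()  # signature -> None, oldest first

    @staticmethod
    def signature(event_type, fact_data):
        # Canonical JSON (sorted keys) so equal dicts hash identically.
        # Assumption: the real signature may serialise the pair differently.
        payload = json.dumps([event_type, fact_data], sort_keys=True, default=str)
        return hashlib.md5(payload.encode("utf-8")).hexdigest()

    def is_duplicate(self, event_type, fact_data):
        sig = self.signature(event_type, fact_data)
        if sig in self._seen:
            self._seen.move_to_end(sig)  # refresh recency
            return True
        self._seen[sig] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the least recently used
        return False


gate = DedupGate(capacity=2)
first = gate.is_duplicate("fact", {"s": "alice", "p": "likes", "o": "tea"})
replay = gate.is_duplicate("fact", {"o": "tea", "p": "likes", "s": "alice"})
```

A caller would skip the graph write whenever is_duplicate returns True. Because the cache is bounded, a fact that has been evicted is written again if it reappears later.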