Memory hydration with Reciprocal Rank Fusion

How a query becomes a sectioned markdown injection of vector + graph + skill + episodic results.

RRF formula

combined_score(doc) = Σ_ranker  weight_ranker / (k + rank_in_ranker + 1)
k = 60  (rrf_k default)

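The constant k = 60 is the value conventionally used for RRF. It controls how quickly a ranker's contribution decays with rank: a smaller k heavily favours the top result of each ranker, while a larger k flattens the curve. The + 1 in the denominator implies that rank_in_ranker is zero-based, so the top hit of a ranker contributes weight_ranker / 61.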
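The formula can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual code: the function names rrf_fuse and top_to_tenth_ratio are hypothetical, and ranks are assumed to be zero-based as the + 1 suggests.

```python
from collections import defaultdict


def rrf_fuse(rankings, weights=None, k=60):
    """Fuse ranked lists with weighted Reciprocal Rank Fusion.

    rankings: dict mapping ranker name -> list of doc ids, best first.
    weights:  dict mapping ranker name -> weight (assumed 1.0 if absent).
    """
    weights = weights or {}
    scores = defaultdict(float)
    for ranker, docs in rankings.items():
        w = weights.get(ranker, 1.0)
        for rank, doc in enumerate(docs):  # zero-based rank
            scores[doc] += w / (k + rank + 1)
    return dict(scores)


def top_to_tenth_ratio(k):
    """How many times more the rank-0 hit contributes than the rank-9 hit."""
    return (k + 10) / (k + 1)
```

With k = 60 the top hit contributes only about 1.15 times as much as the tenth hit, whereas with k = 1 the ratio is 5.5, which is the "favours the top result" effect described above.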

Intent weights

The bus first classifies the query as factual, procedural, or contextual using the worker pool (RoutingTask.CLASSIFY_INTENT). The detected intent selects one row of the weight matrix:

Intent        graph   vector   skill   episodic
factual       2.0     1.0      0.5     0.3
procedural    0.5     1.0      2.0     1.5
contextual    1.0     1.5      1.0     1.0
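As a rough sketch, the matrix can be held as a nested dictionary keyed by intent and then by source. The weights are the ones listed above; the names INTENT_WEIGHTS, weighted_contribution and best_source_for_top_hit are hypothetical and only illustrate how the weights bias the RRF sum.

```python
# Weights copied from the matrix above; the container name is an assumption.
INTENT_WEIGHTS = {
    "factual":    {"graph": 2.0, "vector": 1.0, "skill": 0.5, "episodic": 0.3},
    "procedural": {"graph": 0.5, "vector": 1.0, "skill": 2.0, "episodic": 1.5},
    "contextual": {"graph": 1.0, "vector": 1.5, "skill": 1.0, "episodic": 1.0},
}


def weighted_contribution(intent, source, rank, k=60):
    """Score that one source adds for a hit at a zero-based rank."""
    return INTENT_WEIGHTS[intent][source] / (k + rank + 1)


def best_source_for_top_hit(intent, k=60):
    """Source whose rank-0 hit contributes most under the given intent."""
    sources = INTENT_WEIGHTS[intent]
    return max(sources, key=lambda s: weighted_contribution(intent, s, 0, k))
```

For a factual query, a top graph hit contributes 2.0 / 61 while a top episodic hit contributes 0.3 / 61, so graph results dominate unless several other rankers agree on a different candidate.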

Section budgets

Final markdown is capped at max_chars (default 6000). Per-section share:

Each section truncates independently so a noisy vector store can't crowd out factual graph hits.
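Independent truncation might look like the sketch below. The per-section shares are not given in this section, so the equal 25% split used here is purely a placeholder assumption, as are the names truncate_sections and PLACEHOLDER_SHARES.

```python
# Placeholder shares: the real per-section values are not stated here.
PLACEHOLDER_SHARES = {
    "Knowledge Graph": 0.25,
    "Vector Recall": 0.25,
    "Playbook": 0.25,
    "Episodes": 0.25,
}


def truncate_sections(sections, shares=PLACEHOLDER_SHARES, max_chars=6000):
    """Cut each section to its own character budget, independently of the others.

    sections: dict mapping section name -> rendered text.
    A section without a configured share gets a budget of 0 in this sketch.
    """
    truncated = {}
    for name, text in sections.items():
        budget = int(max_chars * shares.get(name, 0.0))
        truncated[name] = text[:budget]
    return truncated
```

Because every section is cut against its own budget, an oversized vector section is trimmed to its share while a short graph section passes through untouched.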

Step-by-step

  1. Decompose the user query into 2-3 sub-queries (via an LLM call, or a heuristic that splits on "and also" / "as well as").
  2. Fan-out: asyncio.gather hits each available memory tier with each sub-query.
  3. Per-source ranking: each tier returns its own ordered list (cross-encoder re-rank for vector; spreading-activation BFS for graph; BM25-augmented vector for skills; trigger-similarity for episodes).
  4. Fuse: RRF scores each candidate across rankers; intent weights bias the sum.
  5. Format: produce ### Knowledge Graph, ### Vector Recall, ### Playbook, ### Episodes sections.
  6. Inject: prepend the resulting markdown to the system prompt before the LLM call.
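Steps 1, 2 and 5 above can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: decompose, fan_out and render are hypothetical names, each tier is assumed to expose an async search callable, each (tier, sub-query) pair is kept as its own ranked list, and the mapping from tier to section heading and the bullet format are guesses.

```python
import asyncio
import re

# Assumed mapping from memory tier to the section headings named in step 5.
HEADINGS = {
    "graph": "Knowledge Graph",
    "vector": "Vector Recall",
    "skill": "Playbook",
    "episodic": "Episodes",
}


def decompose(query, max_parts=3):
    """Heuristic split on 'and also' / 'as well as' (no LLM call)."""
    parts = re.split(r"\s*\b(?:and also|as well as)\b\s*", query, flags=re.IGNORECASE)
    parts = [p.strip(" ,.") for p in parts if p.strip(" ,.")]
    return parts[:max_parts] or [query]


async def fan_out(tiers, sub_queries):
    """Query every tier with every sub-query concurrently.

    tiers: dict mapping tier name -> async callable(sub_query) -> ranked list.
    Returns a dict keyed by (tier name, sub_query).
    """
    keys = [(name, q) for name in tiers for q in sub_queries]
    results = await asyncio.gather(*(tiers[name](q) for name, q in keys))
    return dict(zip(keys, results))


def render(sections):
    """Format non-empty sections as '### Heading' blocks with bullet items."""
    blocks = [
        f"### {HEADINGS[name]}\n" + "\n".join(f"- {item}" for item in items)
        for name, items in sections.items()
        if items
    ]
    return "\n\n".join(blocks)
```

A query with neither phrase comes back as a single sub-query from this heuristic, so a real decomposer would presumably fall back to the LLM call in that case.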

Dedup gate (write side)

publish_fact keeps an LRU cache of the 256 most recent MD5 signatures of (event_type, fact_data) and short-circuits any write whose signature is already present. Without this gate, an LLM that repeats itself across turns would re-write the same triplet to the graph and inflate its weight for no reason.
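A gate of this kind could be sketched as below. The class name DedupGate, the method names, and the choice of JSON with sorted keys to serialise fact_data before hashing are all assumptions made for illustration; only the 256-entry LRU and the MD5 signature come from the description above.

```python
import hashlib
import json
from collections import OrderedDict


class DedupGate:
    """LRU set of recent (event_type, fact_data) signatures."""

    def __init__(self, capacity=256):
        self.capacity = capacity
        self._seen = OrderedDict()  # signature -> None, oldest first

    @staticmethod
    def signature(event_type, fact_data):
        # Assumed serialisation: JSON with sorted keys so dict order is irrelevant.
        payload = json.dumps([event_type, fact_data], sort_keys=True, default=str)
        return hashlib.md5(payload.encode("utf-8")).hexdigest()

    def is_duplicate(self, event_type, fact_data):
        """Return True for a replay; otherwise record the signature."""
        sig = self.signature(event_type, fact_data)
        if sig in self._seen:
            self._seen.move_to_end(sig)  # refresh recency
            return True
        self._seen[sig] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict least recently used
        return False
```

A caller would check is_duplicate before writing to the graph and skip the write when it returns True. MD5 is used here only as a cheap fingerprint, not for security.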