memory / vector.py — VectorMemory

ChromaDB-backed semantic memory with adaptive spaced repetition, BM25 hybrid re-ranking, temporal decay, and document ingestion.

Storage

Record schema

Field                                     Notes
document                                  Raw text.
id                                        MD5 of the text; for document chunks, MD5(filename | chunk_idx | chunk_text).
metadata.timestamp                        ISO-Z timestamp.
metadata.type                             auto · document · document_summary · episode · skill · manual · identity · summary · name_memory.
metadata.retrieval_count / last_accessed  Drives spaced-repetition stretching.
metadata.source                           Filename for document chunks.
metadata.trigger / domains / verified     Skill-only metadata.
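
The two id rules in the schema can be sketched as follows. This is a minimal illustration, not the module's code: the helper names text_id and chunk_id are hypothetical, and the UTF-8 encoding, the hex digest form, and the bare "|" separator are assumptions.

```python
import hashlib


def text_id(text: str) -> str:
    """Id for a plain memory: MD5 of the raw text (assumed UTF-8, hex digest)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def chunk_id(filename: str, chunk_idx: int, chunk_text: str) -> str:
    """Id for a document chunk: MD5 over filename, chunk index and chunk text.

    Assumption: the three parts are joined with a bare "|" before hashing.
    """
    key = f"{filename}|{chunk_idx}|{chunk_text}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()
```

Because the id is derived from the content, inserting the same text twice yields the same id, which is what makes exact-match dedup possible. Including the filename and chunk index keeps identical paragraphs from different documents (or different positions) distinct.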

Public methods

Method                                            Behaviour
add(text, meta)                                   Insert a record. Deduplicates by MD5 id; skips text shorter than 5 characters.
smart_update(text, type_label)                    Vector dedup: if an existing memory lies at distance < 0.50, delete it and re-add the new text atomically under the lock.
search(query, inject_identity)                    Hybrid search: optional dual query (user query + identity probe) → wide candidate pool (30) → priority filter → BM25 re-rank → top 12 with section labels.
search_advanced(query, limit)                     Raw results plus a retrieval-count bump.
ingest_document(filename, chunks) -> (bool, str)  Batch upsert with library dedup; 25-chunk batches under a single lock.
delete_document_by_name(filename)                 Drop all chunks of a source document.
delete_by_query(query)                            Vector lookup; deletes the best match if its distance < 0.50.
get_library() -> list[str]                        List ingested filenames.
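
The batching described for ingest_document might look roughly like the sketch below. It is an illustration under assumptions: ingest_in_batches and the upsert callback are hypothetical names standing in for the real Chroma call, and the module-level lock is a stand-in for whatever lock VectorMemory holds.

```python
import threading
from typing import Callable, Sequence

BATCH_SIZE = 25  # batch size stated in the method table

_lock = threading.RLock()  # stand-in for the memory's own lock


def ingest_in_batches(
    chunks: Sequence[str],
    upsert: Callable[[Sequence[str]], None],
    batch_size: int = BATCH_SIZE,
) -> int:
    """Send chunks to `upsert` in fixed-size batches and return the batch count.

    The lock is taken once around the whole loop, so no other writer can
    interleave between batches of the same document.
    """
    batches = 0
    with _lock:
        for start in range(0, len(chunks), batch_size):
            upsert(chunks[start:start + batch_size])
            batches += 1
    return batches
```

With 60 chunks this produces batches of 25, 25 and 10. Holding one lock for the full loop trades some write latency for the guarantee that a document is never left half-ingested by a competing writer.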

Scoring

For each candidate, VectorMemory computes a combined score (lower ranks higher):

combined = (priority_score · 10) + distance + time_penalty
time_penalty = age_days / (30 · log(retrieval_count + e))

Then a BM25 re-rank subtracts up to 0.3 from combined, based on keyword overlap normalised to 0-1.
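
The scoring above can be written out as a short sketch. The function names are hypothetical, log is taken to be the natural logarithm (suggested by the + e term), and the keyword bonus is assumed to scale linearly with the normalised overlap; the source only states the 0.3 ceiling.

```python
import math

MAX_BM25_BONUS = 0.3  # ceiling stated in the text


def time_penalty(age_days: float, retrieval_count: int) -> float:
    """age_days / (30 * ln(retrieval_count + e)); more retrievals slow the decay."""
    return age_days / (30 * math.log(retrieval_count + math.e))


def combined_score(
    priority_score: float,
    distance: float,
    age_days: float,
    retrieval_count: int,
    bm25_norm: float = 0.0,
) -> float:
    """Lower is better: priority term + vector distance + decay - keyword bonus."""
    base = priority_score * 10 + distance + time_penalty(age_days, retrieval_count)
    # Assumption: the bonus is linear in the normalised overlap, clamped to [0, 1].
    overlap = min(max(bm25_norm, 0.0), 1.0)
    return base - MAX_BM25_BONUS * overlap
```

For a manual memory (priority 0) at distance 0.40 that is 30 days old and has never been retrieved, the penalty is 30 / (30 · ln e) = 1.0, giving a combined score of 1.40; with full keyword overlap it drops to 1.10. Because the priority term is multiplied by 10, type priority dominates the ordering and the distance, decay and keyword terms mostly rank candidates within the same type.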

Priority scores

Type         Priority  Max distance
name_memory  −20       1.00
summary      −15       0.75
episode      −12       0.70
identity     −10       0.65
document     −5        1.25
manual       0         0.55
auto         1         0.55
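
One way to apply the table as a filter is sketched below. The candidate dict shape, the function name priority_filter, the inclusive cutoff, and the handling of types missing from the table are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (priority, max distance) per memory type, copied from the table above.
PRIORITY_RULES: Dict[str, Tuple[int, float]] = {
    "name_memory": (-20, 1.00),
    "summary": (-15, 0.75),
    "episode": (-12, 0.70),
    "identity": (-10, 0.65),
    "document": (-5, 1.25),
    "manual": (0, 0.55),
    "auto": (1, 0.55),
}


def priority_filter(candidates: List[dict]) -> List[dict]:
    """Keep candidates within their type's max distance and attach the priority."""
    kept = []
    for cand in candidates:
        rule = PRIORITY_RULES.get(cand["type"])
        if rule is None:
            # Assumption: types not listed in the table are skipped here.
            continue
        priority, max_distance = rule
        # Assumption: the cutoff is inclusive.
        if cand["distance"] <= max_distance:
            kept.append({**cand, "priority": priority})
    return kept
```

Under these rules a document chunk at distance 1.10 survives while an auto memory at 0.60 is dropped, which reflects the looser cutoff the table gives to document and name_memory records.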

Concurrency

A threading.RLock guards Chroma mutations and library writes. The library JSON is written to a temporary file and then moved into place with os.replace, so a crash cannot leave a half-written file behind.
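
The temp-file-plus-os.replace pattern can be sketched like this. The function name save_library and the module-level lock are hypothetical, and the fsync call is an extra durability step that the source does not mention.

```python
import json
import os
import tempfile
import threading
from typing import List

_lock = threading.RLock()  # stand-in for the memory's own lock


def save_library(path: str, filenames: List[str]) -> None:
    """Write the library list so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    with _lock:
        # Create the temp file in the target directory so os.replace stays
        # on one filesystem, where the rename is atomic.
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                json.dump(filenames, fh)
                fh.flush()
                os.fsync(fh.fileno())  # assumption: flush to disk before renaming
            os.replace(tmp_path, path)
        except BaseException:
            # Clean up the temp file if anything failed before the rename.
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            raise
```

An RLock, unlike a plain Lock, can be re-acquired by the thread that already holds it, so a method that holds the lock for a Chroma mutation can call a helper like this without deadlocking.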