# Scoring
iLAB Memory ranks observations with two distinct scoring schemes: SearchScore for mem_search, ContextScore for the memories[] returned by mem_session_start. They share helper math but combine signals with different weights — and their values are not comparable.
Never compare a SearchScore value to a ContextScore value. Different formulas, different scales, different semantic meaning. The only invariant they share is the [0.0, 1.0] range. This is Braess #5 — see Architecture.
## The two schemes

### SearchScore

Used by mem_search. Combines BM25 relevance to your query, recency, and revision count. Higher = more relevant to the query you asked.

### ContextScore

Used by mem_session_start.memories. Combines recency, revision count, and a per-type priority. Higher = more salient as default context to load.
## SearchScore — relevance to a query

```
SearchScore = 0.60 * fts5_rank
            + 0.25 * recency_score(updated_at)
            + 0.15 * revision_score(revision_count)
```
| Signal | Weight | What it measures |
|---|---|---|
| fts5_rank | 0.60 | BM25 from SQLite FTS5, pre-normalized to [0, 1] |
| recency_score | 0.25 | exponential decay, 30-day half-life |
| revision_score | 0.15 | linear in revision_count, capped at 10 |
The dominant signal is the query match. Recency is a secondary tiebreaker — a 6-month-old observation that nails the query still beats a brand-new observation that doesn't.
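As a worked illustration of that dominance, here is a minimal sketch of the weighted sum. The signal values are invented for the example, and this is not the library's implementation:

```python
def search_score(fts5_rank: float, recency: float, revision: float) -> float:
    """Weighted sum per the SearchScore formula; all inputs in [0, 1]."""
    return 0.60 * fts5_rank + 0.25 * recency + 0.15 * revision

# A 6-month-old observation that nails the query (high fts5_rank, decayed recency)
old_hit = search_score(fts5_rank=0.95, recency=0.06, revision=0.5)   # ~0.66
# A brand-new observation that barely matches (low fts5_rank, max recency)
new_miss = search_score(fts5_rank=0.20, recency=1.0, revision=0.0)   # ~0.37
print(old_hit > new_miss)  # True: query match dominates
```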
## ContextScore — salience as default context

```
ContextScore = 0.50 * recency_score(updated_at)
             + 0.30 * revision_score(revision_count)
             + 0.20 * type_priority(type)
```
| Signal | Weight | What it measures |
|---|---|---|
| recency_score | 0.50 | same exponential decay, 30-day half-life |
| revision_score | 0.30 | linear in revision_count, capped at 10 |
| type_priority | 0.20 | per-type weight (see below) |
There is no query here — the goal is "if I have to load N memories at session start, which N matter most for this user right now?" Recency dominates; type priority is a thumb on the scale for stable user facts.
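A minimal sketch of the combination, with made-up signal values, shows how type priority can tip a slightly older profile fact past a fresh discovery (again illustrative, not the library's code):

```python
def context_score(recency: float, revision: float, type_priority: float) -> float:
    """Weighted sum per the ContextScore formula; all inputs in [0, 1]."""
    return 0.50 * recency + 0.30 * revision + 0.20 * type_priority

# A brand-new discovery: max recency, default type priority
fresh_discovery = context_score(recency=1.0, revision=0.0, type_priority=0.50)  # 0.60
# A slightly older, more-revised profile fact: type priority tips the balance
older_profile = context_score(recency=0.8, revision=0.3, type_priority=1.00)    # 0.69
print(older_profile > fresh_discovery)  # True
```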
## Type priorities

| Type | Priority |
|---|---|
| profile | 1.00 |
| preference | 0.90 |
| decision | 0.70 |
| pattern | 0.60 |
| discovery | 0.50 |
| summary | 0.30 |
| any unknown type | 0.50 |
Profile and preference observations bubble to the top of memories[] even when slightly older. Discoveries and patterns rank lower — they show up most often through search.
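The lookup behaves like a dict with a fallback. A sketch (the constant names here are illustrative, not the library's actual identifiers):

```python
# Hypothetical names; the real table lives inside the library.
TYPE_PRIORITY = {
    "profile": 1.00,
    "preference": 0.90,
    "decision": 0.70,
    "pattern": 0.60,
    "discovery": 0.50,
    "summary": 0.30,
}
DEFAULT_TYPE_PRIORITY = 0.50  # any unknown type falls back to 0.50

def type_priority(obs_type: str) -> float:
    return TYPE_PRIORITY.get(obs_type, DEFAULT_TYPE_PRIORITY)
```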
## Shared helpers
Both formulas call the same primitives. There is one definition of recency_score and one of revision_score in the codebase (scoring.py), and each composite assembles them with its own weights.
```python
from ilab_memory.scoring import recency_score, revision_score

print(recency_score("2025-12-01T00:00:00+00:00"))  # ~0.04 if "now" is 2026-04-20 (140 days is ~4.7 half-lives)
print(revision_score(5))   # 0.5
print(revision_score(50))  # 1.0 (capped at REVISION_CAP=10)
```
This is Braess #3. Adding a new signal to one composite requires updating the other in the same PR. The shared helpers are the single source of truth.
## Why two schemes?
Because they answer different questions.
- mem_search: "Among everything I have for this user, what is closest to this query string?" — query relevance dominates.
- mem_session_start: "Among everything I have for this user, what should the LLM see by default before the user even speaks?" — recency and salience dominate.
A single scheme could not weight FTS rank both at 0.60 (when there is a query) and at 0.0 (when there is not). Splitting them keeps each formula honest and tunable.
## Don't mix the scores
```python
hits = mem.mem_search(user_id="alice", query="auth")  # SearchScore
resp = mem.mem_session_start(user_id="alice")         # ContextScore in resp.memories

# DO NOT do this:
all_obs = sorted(hits + resp.memories, key=lambda o: o.score, reverse=True)
# The sort is meaningless — the scores are on incompatible scales.
```
If you need a unified ranking across both, compute it yourself from the underlying observations (e.g., load full records via mem_get_observation and apply your own formula). The library deliberately refuses to fuse the two.
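One way such a do-it-yourself fusion could look, scoring from raw observation fields instead of mixing the two incompatible score values. The field names (updated_at, revision_count, query_rank) and the weights are assumptions for the sketch:

```python
from datetime import datetime, timezone

def my_unified_score(obs: dict, now: datetime) -> float:
    """Custom fusion score from raw fields; field names are assumed, not the library's schema."""
    age_days = (now - datetime.fromisoformat(obs["updated_at"])).total_seconds() / 86400.0
    recency = 0.5 ** (max(age_days, 0.0) / 30.0)   # same 30-day half-life as the library
    revision = min(obs["revision_count"], 10) / 10  # same cap of 10
    score = 0.5 * recency + 0.2 * revision
    query_rank = obs.get("query_rank")              # None for session-start memories
    # Blend in query relevance only for observations that came from search.
    return score + (0.3 * query_rank if query_rank is not None else 0.0)

now = datetime(2026, 4, 20, tzinfo=timezone.utc)
obs_list = [
    {"id": "a", "updated_at": "2026-04-10T00:00:00+00:00", "revision_count": 2, "query_rank": 0.9},
    {"id": "b", "updated_at": "2026-04-18T00:00:00+00:00", "revision_count": 0, "query_rank": None},
]
ranked = sorted(obs_list, key=lambda o: my_unified_score(o, now), reverse=True)
```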
## Next

- Architecture / Braess #5 — why this rule is enforced at the type level (`SearchScore` vs `ContextScore` are distinct frozen Pydantic classes).
- Observations — what the scored objects look like.