Long-running agent with context window management
A real agent does not finish at turn 3. It runs for hours, accumulating decisions and discoveries. This recipe drives a Claude Sonnet agent through a long session and shows the three knobs iLAB Memory gives you to keep the context window healthy.
What you'll need
Python 3.11+
Async stdlib for the agent loop.
ilab-memory
pip install ilab-memory
anthropic
pip install anthropic — official SDK with async client.
ANTHROPIC_API_KEY
Exported in the shell or your .env loader of choice.
Architecture
┌──────────────┐
│ agent loop │ N turns
│ (50+) │
└──────┬───────┘
│
│ every turn: every 20 turns:
│ load top-K decisions/ mem_session_summary
│ patterns + recent context → "summary" observation
│
▼
┌────────────────┐ ┌────────────────────────────┐
│ ILabMemory │ ─▶ │ Claude Sonnet (anthropic) │
│ (./agent.db) │ └────────────────────────────┘
└────────────────┘
Three knobs:
mem_search with types=[...] — load only architectural memory for the prompt.
mem_session_summary every N turns — collapse history into a summary observation.
topic_key — dedupe repeated facts (no row explosion).
Implementation
1. Open ONE session for the whole run
Don't open/close per turn. A session represents the logical "user task," not a single LLM call. The same session is reused on every mem_session_start until the timeout elapses.
mem = ILabMemory.from_path("./agent.db")
session = mem.mem_session_start(user_id="agent-001")
print(session.is_new, session.session_id)
2. Load only high-signal context
Use mem_search with types=['decision', 'pattern'] to skip casual chit-chat (discovery) when composing the prompt. This keeps the system prompt small even after thousands of turns.
architectural = mem.mem_search(
    user_id="agent-001",
    query="recent decisions and patterns",
    limit=10,
    types=["decision", "pattern"],
)
The score returned here is SearchScore — combining FTS rank, recency, and revision. It is not comparable to ContextScore from mem_session_start.memories. See Scoring for the formulas.
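To build intuition for why the two scores are not comparable, here is a toy blend of the three signals SearchScore is said to combine. The weights, the 30-day exponential decay, and the revision term are all made up for illustration; the real formulas are on the Scoring page.

```python
import math
from datetime import datetime, timezone

def illustrative_search_score(fts_rank: float, updated_at: datetime, revision: int) -> float:
    """Toy blend of FTS rank, recency, and revision (NOT the real formula)."""
    age_days = (datetime.now(timezone.utc) - updated_at).total_seconds() / 86400
    recency = math.exp(-age_days / 30)       # fresher rows decay less
    stability = 1 - 1 / (revision + 1)       # heavily-revised topics score higher
    return 0.6 * fts_rank + 0.3 * recency + 0.1 * stability

fresh = illustrative_search_score(0.8, datetime.now(timezone.utc), revision=3)
stale = illustrative_search_score(0.8, datetime(2020, 1, 1, tzinfo=timezone.utc), revision=3)
assert fresh > stale  # identical FTS rank; recency breaks the tie
```

The point of the sketch: two rows with the same text relevance can rank differently, and none of these terms appear in ContextScore, which is why mixing the two scales in one ranking is a bug.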
3. Run the agent turn
Compose, call Claude, capture the output. Standard async pattern.
from anthropic import AsyncAnthropic
claude = AsyncAnthropic()
msg = await claude.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=system_prompt,
    messages=[{"role": "user", "content": user_input}],
)
reply = "".join(b.text for b in msg.content if b.type == "text")
4. Save with topic_key for natural dedup
If the user keeps repeating "I prefer dark mode," topic_key="user/agent-001/preferences/theme" ensures the same row is updated, not duplicated. The outcome field tells you what happened.
result = mem.mem_save(
    user_id="agent-001",
    type="preference",
    title="Theme preference",
    content="User prefers dark mode.",
    topic_key="user/agent-001/preferences/theme",
)
print(result.outcome) # 'created' first time, 'deduped' or 'updated' after
5. Summarize every N turns
Periodic summarization compresses long history into a summary observation that future sessions hydrate cheaply. Note the dedicated mem_session_summary API — mem_save rejects type='summary' (reserved).
if turn_idx % 20 == 0 and turn_idx > 0:
    mem.mem_session_summary(
        user_id="agent-001",
        summary=f"Last 20 turns: {brief_recap}",
    )
mem_session_summary does not close the session. It just appends a summary observation and touches last_activity_at. The session keeps going.
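It's worth checking the cadence the `turn_idx % 20` guard actually produces over a 50-turn run, since the boundaries are easy to get wrong:

```python
SUMMARY_EVERY = 20
TOTAL_TURNS = 50

checkpoints = [t for t in range(TOTAL_TURNS) if t % SUMMARY_EVERY == 0 and t > 0]
print(checkpoints)  # [20, 40]: turn 0 is skipped, and nothing fires after turn 40
```

Note that no summary fires at the end of the run; the final 10 turns are only captured by the summary you pass to mem_session_end in the next step.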
6. Close the session when truly done
mem_session_end writes the final human-authored summary. The next mem_session_start for the same user_id will create a fresh session.
closed = mem.mem_session_end(
    user_id="agent-001",
    summary="Completed onboarding task: 50 turns, 12 decisions captured.",
)
print(closed.is_auto_generated) # False — human summary
If you forget to close, the next mem_session_start after the 24-hour timeout will auto-close the stale session with a synthetic summary (is_auto_generated=True). That's a fallback, not a feature — close manually when you can.
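The staleness check itself is internal to the library, but the logic it describes amounts to a timestamp comparison. A toy version, assuming the 24-hour timeout from the text:

```python
from datetime import datetime, timedelta, timezone

SESSION_TIMEOUT = timedelta(hours=24)  # timeout from the docs; not configurable here

def session_is_stale(last_activity_at: datetime, now: datetime) -> bool:
    """Toy version of the staleness check mem_session_start would apply."""
    return now - last_activity_at > SESSION_TIMEOUT

now = datetime.now(timezone.utc)
assert not session_is_stale(now - timedelta(hours=2), now)   # same logical session resumes
assert session_is_stale(now - timedelta(hours=30), now)      # would be auto-closed first
```

This also explains why mem_session_summary "touches last_activity_at": each checkpoint resets the staleness clock, so a long-running agent that summarizes regularly never trips the timeout mid-task.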
Full file
Complete code (copy-paste ready)
from __future__ import annotations
import asyncio
import os
from typing import Iterable
from anthropic import AsyncAnthropic
from ilab_memory import ILabMemory
DB_PATH = os.environ.get("ILAB_MEMORY_DB_PATH", "./agent.db")
USER_ID = "agent-001"
MODEL = "claude-sonnet-4-5"
SUMMARY_EVERY = 20
TOTAL_TURNS = 50
def compose_prompt(memories: Iterable, architectural: Iterable) -> str:
    recent = "\n".join(f"- ({m.type}) {m.title}" for m in memories) or "(none)"
    decisions = "\n".join(f"* {d.title}" for d in architectural) or "(none)"
    return (
        "You are a long-running planning agent. Be concise.\n"
        f"## Recent context\n{recent}\n\n"
        f"## Architectural decisions / patterns\n{decisions}"
    )

async def run_turn(
    claude: AsyncAnthropic,
    mem: ILabMemory,
    user_input: str,
) -> str:
    session = mem.mem_session_start(user_id=USER_ID)
    architectural = mem.mem_search(
        user_id=USER_ID,
        query="decisions patterns",
        limit=10,
        types=["decision", "pattern"],
    )
    system_prompt = compose_prompt(session.memories, architectural)
    msg = await claude.messages.create(
        model=MODEL,
        max_tokens=1024,
        system=system_prompt,
        messages=[{"role": "user", "content": user_input}],
    )
    reply = "".join(b.text for b in msg.content if b.type == "text")
    # Persist the turn — type='discovery', no topic_key (one row per turn)
    mem.mem_save(
        user_id=USER_ID,
        type="discovery",
        title=f"Turn: {user_input[:60]}",
        content=f"User: {user_input}\nAssistant: {reply}",
    )
    return reply

async def main() -> None:
    claude = AsyncAnthropic()
    with ILabMemory.from_path(DB_PATH) as mem:
        for turn_idx in range(TOTAL_TURNS):
            user_input = f"(simulated turn {turn_idx}) plan next step"
            reply = await run_turn(claude, mem, user_input)
            print(f"[{turn_idx:02d}] {reply[:80]}")
            # Periodic compression — keeps next session's hydration cheap
            if turn_idx % SUMMARY_EVERY == 0 and turn_idx > 0:
                mem.mem_session_summary(
                    user_id=USER_ID,
                    summary=f"Checkpoint at turn {turn_idx}: agent on track.",
                )
        mem.mem_session_end(
            user_id=USER_ID,
            summary=f"Completed run: {TOTAL_TURNS} turns.",
        )

if __name__ == "__main__":
    asyncio.run(main())
What's next?
Multi-user isolation
Same agent loop, many tenants — user_id is the isolation key.
Scoring
Why SearchScore and ContextScore are not comparable, and what each one optimizes for.
Topic keys
The dedup primitive — when to set one and when to skip it.
FastAPI + OpenAI
Same memory primitives wrapped in an HTTP service.