FastAPI + OpenAI chat
Build a minimal FastAPI service that exposes POST /chat, hydrates context from iLAB Memory for the requesting user, calls gpt-4o-mini, and persists the exchange — all in a single Python file.
What you'll need
Python 3.11+
from __future__ import annotations + modern type hints used throughout.
ilab-memory
pip install ilab-memory — embedded library, no remote server.
openai
pip install openai — official Python SDK (>=1.30).
fastapi + uvicorn
pip install "fastapi[standard]" — ships uvicorn for fastapi dev.
Architecture
┌───────────┐   POST /chat    ┌────────────────┐
│  client   │ ──────────────▶ │  FastAPI app   │
└───────────┘                 │ (chat_app.py)  │
                              └──┬─────────┬───┘
                                 │         │
                 hydrate context │         │ save turn
                                 ▼         ▼
                        ┌────────────────────┐
                        │     ILabMemory     │
                        │   (./memory.db)    │
                        └────────────────────┘
                                 ▲
                                 │ system prompt + history
                                 │
                        ┌────────────────────┐
                        │ OpenAI gpt-4o-mini │
                        └────────────────────┘
The app owns the loop. iLAB Memory is invoked deterministically — the LLM never decides when to read or write memory.
Implementation
1. Bootstrap the app
Build a singleton ILabMemory at startup using from_path and the lifespan hook. The DB path comes from ILAB_MEMORY_DB_PATH so production deployments can mount a persistent volume.
import os
from contextlib import asynccontextmanager

from fastapi import FastAPI

from ilab_memory import ILabMemory

DB_PATH = os.environ.get("ILAB_MEMORY_DB_PATH", "./memory.db")

@asynccontextmanager
async def lifespan(app: FastAPI):
    app.state.mem = ILabMemory.from_path(DB_PATH)
    try:
        yield
    finally:
        app.state.mem.close()

app = FastAPI(lifespan=lifespan)
2. Define the request schema
A ChatRequest carries the user_id (the tenant key — see Recipe 3) and the user message. Pydantic validates both fields for free.
from pydantic import BaseModel, Field

class ChatRequest(BaseModel):
    user_id: str = Field(..., min_length=1)
    message: str = Field(..., min_length=1)
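A quick check of the schema's behavior, assuming pydantic is installed (fastapi[standard] pulls it in): valid payloads construct cleanly, and an empty user_id is rejected before your handler ever runs.

```python
from pydantic import BaseModel, Field, ValidationError

class ChatRequest(BaseModel):
    user_id: str = Field(..., min_length=1)
    message: str = Field(..., min_length=1)

# A well-formed request constructs without error.
ok = ChatRequest(user_id="alice", message="hi")

# An empty user_id violates min_length=1 and raises ValidationError.
try:
    ChatRequest(user_id="", message="hi")
    rejected = False
except ValidationError:
    rejected = True
```

FastAPI turns that ValidationError into a 422 response automatically, so the endpoint body only ever sees valid data.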
3. Hydrate context from memory
mem_session_start is the hydration entry point: it reuses an active session if one exists, and returns the last 5 session summaries plus the most recent observations, scored with ContextScore.
session = mem.mem_session_start(user_id=req.user_id)
history_lines = [f"- ({m.type}) {m.title}" for m in session.memories]
4. Compose the system prompt
Inline the hydrated memory into the system prompt. Keep it short — the LLM sees the gist, not the full corpus.
system_prompt = (
    "You are a helpful assistant. Use prior context only if relevant.\n"
    f"User profile / past memory:\n{chr(10).join(history_lines) or '(none yet)'}"
)
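The prompt assembly in steps 3–4 can be sanity-checked with plain Python objects standing in for hydrated memories (SimpleNamespace is a stand-in here; the real objects come from session.memories and expose type and title as shown above):

```python
from types import SimpleNamespace

def build_system_prompt(memories) -> str:
    # Mirror the endpoint's formatting: one bullet per hydrated memory.
    history_lines = [f"- ({m.type}) {m.title}" for m in memories]
    return (
        "You are a helpful assistant. Use prior context only if relevant.\n"
        f"User profile / past memory:\n{chr(10).join(history_lines) or '(none yet)'}"
    )

# First-ever request: no memories yet, so the prompt ends with "(none yet)".
empty_prompt = build_system_prompt([])

# Later request: two hydrated observations appear as bullets.
mems = [
    SimpleNamespace(type="discovery", title="Turn: Hi, my name is Alice"),
    SimpleNamespace(type="discovery", title="Turn: I love hiking"),
]
hydrated_prompt = build_system_prompt(mems)
```

This is also a convenient seam for unit tests: the prompt builder is pure, so you can assert on its output without touching the DB or the LLM.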
5. Call OpenAI
Use the async client. The SDK reads OPENAI_API_KEY from the environment automatically.
from openai import AsyncOpenAI

client = AsyncOpenAI()
completion = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": req.message},
    ],
)
reply = completion.choices[0].message.content or ""
6. Persist the turn
Save the exchange as a discovery observation. Content wrapped in <private> tags is stripped BEFORE the text is hashed and persisted.
mem.mem_save(
    user_id=req.user_id,
    type="discovery",
    title=f"Turn: {req.message[:60]}",
    content=f"User: {req.message}\nAssistant: {reply}",
    topic_key=f"chat/{req.user_id}/turn",
)
Trade-off — topic_key per user vs per turn:
- Same topic_key (e.g. chat/{user_id}/turn) → one rolling observation that updates in place. Cheaper storage, fewer rows, but you lose individual turn history.
- No topic_key → one observation per turn. Full history, full search recall, but the table grows linearly with traffic.
Pick per-turn when you need timeline replay, per-user when you only care about the latest state.
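To make the choice explicit in code, the save call can branch on a flag. A minimal sketch (build_save_kwargs is a hypothetical helper; only the topic_key behavior comes from the recipe):

```python
def build_save_kwargs(user_id: str, message: str, reply: str, per_turn: bool) -> dict:
    """Assemble kwargs for mem.mem_save under the two topic_key strategies."""
    kwargs = {
        "user_id": user_id,
        "type": "discovery",
        "title": f"Turn: {message[:60]}",
        "content": f"User: {message}\nAssistant: {reply}",
    }
    if not per_turn:
        # Rolling observation: the same key every turn, updated in place.
        kwargs["topic_key"] = f"chat/{user_id}/turn"
    # per_turn=True: omit topic_key entirely -> one new observation per turn.
    return kwargs

rolling = build_save_kwargs("alice", "hi", "hello", per_turn=False)
timeline = build_save_kwargs("alice", "hi", "hello", per_turn=True)
```

You would then call mem.mem_save(**build_save_kwargs(...)) at the end of the endpoint and flip the flag per deployment.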
7. Return the reply
Wrap the answer in a JSON response. The session_id is included so the client can correlate logs.
return {"session_id": session.session_id, "reply": reply}
mem_session_start and mem_save are synchronous (the underlying SQLite writes are blocking). Calling them inside an async endpoint is fine at low traffic. For high throughput, offload them with run_in_threadpool (from fastapi.concurrency) so they do not block the event loop.
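The offloading pattern can be sketched without FastAPI by using the stdlib asyncio.to_thread, which behaves like run_in_threadpool (blocking_save below is a stand-in for a synchronous call such as mem.mem_save):

```python
import asyncio
import time

def blocking_save(payload: str) -> str:
    # Stand-in for a synchronous SQLite write such as mem.mem_save(...).
    time.sleep(0.05)
    return f"saved:{payload}"

async def handler() -> str:
    # Run the blocking call on a worker thread so the event loop
    # stays free to serve other requests. Inside FastAPI you would use
    # await run_in_threadpool(blocking_save, "turn") from fastapi.concurrency.
    return await asyncio.to_thread(blocking_save, "turn")

result = asyncio.run(handler())
```

The await suspends only this request while the write runs on the thread pool; concurrent requests keep making progress.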
Never log the raw req.user_id or req.message without redacting first. Memory itself strips <private> blocks, but logs do not.
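For log lines specifically, a small helper can apply the same <private> convention before anything is written out. A sketch (redact_private is hypothetical, not part of the library):

```python
import re

# Match <private>...</private> blocks, including across newlines.
_PRIVATE = re.compile(r"<private>.*?</private>", flags=re.DOTALL)

def redact_private(text: str) -> str:
    """Replace <private>...</private> blocks before text reaches a logger."""
    return _PRIVATE.sub("[redacted]", text)

msg = "My key is <private>sk-123</private>, please remember it."
```

Route every log call through a helper like this (or a logging.Filter that applies it) so raw user input never lands in log storage.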
Full file
Complete code (copy-paste ready)
from __future__ import annotations

import os
from contextlib import asynccontextmanager
from typing import AsyncIterator

from fastapi import FastAPI
from openai import AsyncOpenAI
from pydantic import BaseModel, Field

from ilab_memory import ILabMemory

DB_PATH = os.environ.get("ILAB_MEMORY_DB_PATH", "./memory.db")
MODEL = os.environ.get("OPENAI_MODEL", "gpt-4o-mini")

@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncIterator[None]:
    app.state.mem = ILabMemory.from_path(DB_PATH)
    app.state.openai = AsyncOpenAI()  # reads OPENAI_API_KEY from env
    try:
        yield
    finally:
        app.state.mem.close()

app = FastAPI(lifespan=lifespan, title="iLAB Memory Chat")

class ChatRequest(BaseModel):
    user_id: str = Field(..., min_length=1)
    message: str = Field(..., min_length=1)

class ChatResponse(BaseModel):
    session_id: str
    reply: str

@app.post("/chat", response_model=ChatResponse)
async def chat(req: ChatRequest) -> ChatResponse:
    mem: ILabMemory = app.state.mem
    client: AsyncOpenAI = app.state.openai

    # 1. Hydrate per-user memory
    session = mem.mem_session_start(user_id=req.user_id)
    history_lines = [f"- ({m.type}) {m.title}" for m in session.memories]

    # 2. Compose system prompt with prior context
    system_prompt = (
        "You are a helpful assistant. Use prior context only if relevant.\n"
        f"User profile / past memory:\n{chr(10).join(history_lines) or '(none yet)'}"
    )

    # 3. Call the LLM
    completion = await client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": req.message},
        ],
    )
    reply = completion.choices[0].message.content or ""

    # 4. Persist the turn (one observation per turn — full timeline)
    mem.mem_save(
        user_id=req.user_id,
        type="discovery",
        title=f"Turn: {req.message[:60]}",
        content=f"User: {req.message}\nAssistant: {reply}",
    )

    return ChatResponse(session_id=session.session_id, reply=reply)

# Run with: fastapi dev chat_app.py
# Run with: fastapi dev chat_app.py
Test it from another shell:
curl -X POST http://127.0.0.1:8000/chat \
-H "Content-Type: application/json" \
-d '{"user_id":"alice","message":"Hi, my name is Alice and I love hiking."}'
The next call with the same user_id will see the saved turn (Turn: Hi, my name is Alice and I love hiking.) in session.memories automatically.
What's next?
Long-running agent
Manage context across 50+ turns with periodic summarization and type filters.
Multi-user isolation
Scope memory per tenant with stable user_id patterns.
Sessions
How mem_session_start reuses, auto-closes, and hydrates context.
Privacy
What happens to <private>...</private> before content reaches disk.