FastAPI + OpenAI chat
Build a minimal FastAPI service that exposes POST /chat, hydrates context from iLAB Memory for the requesting user, calls gpt-4o-mini, and persists the exchange — all in a single Python file.
What you'll need
Python 3.11+
from __future__ import annotations + modern type hints used throughout.
ilab-memory
pip install ilab-memory — embedded library, no remote server.
openai
pip install openai — official Python SDK (>=1.30).
fastapi + uvicorn
pip install "fastapi[standard]" — ships uvicorn for fastapi dev.
Architecture
┌───────────┐   POST /chat    ┌────────────────┐
│  client   │ ──────────────▶ │  FastAPI app   │
└───────────┘                 │ (chat_app.py)  │
                              └──┬─────────┬───┘
                                 │         │
                 hydrate context │         │ save turn
                                 ▼         ▼
                        ┌────────────────────┐
                        │     ILabMemory     │
                        │   (./memory.db)    │
                        └────────────────────┘
                                 ▲
                                 │ system prompt + history
                                 │
                        ┌────────────────────┐
                        │ OpenAI gpt-4o-mini │
                        └────────────────────┘
The app owns the loop. iLAB Memory is invoked deterministically — the LLM never decides when to read or write memory.
Implementation
1. Bootstrap the app
Build a singleton ILabMemory at startup using from_path and the lifespan hook. The DB path comes from ILAB_MEMORY_DB_PATH so production deployments can mount a persistent volume.
import os
from contextlib import asynccontextmanager

from fastapi import FastAPI

from ilab_memory import ILabMemory

DB_PATH = os.environ.get("ILAB_MEMORY_DB_PATH", "./memory.db")

@asynccontextmanager
async def lifespan(app: FastAPI):
    app.state.mem = ILabMemory.from_path(DB_PATH)
    try:
        yield
    finally:
        app.state.mem.close()

app = FastAPI(lifespan=lifespan)
2. Define the request schema
A ChatRequest carries the user_id (the tenant key — see Recipe 3) and the user message. Pydantic validates both fields for free.
from pydantic import BaseModel, Field

class ChatRequest(BaseModel):
    user_id: str = Field(..., min_length=1)
    message: str = Field(..., min_length=1)
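A quick check of the schema's behavior, assuming pydantic is installed (fastapi[standard] pulls it in): valid payloads construct cleanly, and an empty user_id is rejected before your handler ever runs.

```python
from pydantic import BaseModel, Field, ValidationError

class ChatRequest(BaseModel):
    user_id: str = Field(..., min_length=1)
    message: str = Field(..., min_length=1)

# A well-formed request constructs without error.
ok = ChatRequest(user_id="alice", message="hi")

# An empty user_id violates min_length=1 and raises ValidationError.
try:
    ChatRequest(user_id="", message="hi")
    rejected = False
except ValidationError:
    rejected = True
```

FastAPI turns that ValidationError into a 422 response automatically, so the endpoint body only ever sees valid data.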
3. Hydrate context from memory
mem_session_start is the hydration entry point: it reuses an active session if one exists, and returns the last 5 session summaries plus the most recent observations, scored with ContextScore.
session = mem.mem_session_start(user_id=req.user_id)
history_lines = [f"- ({m.type}) {m.title}" for m in session.memories]
4. Compose the system prompt
Inline the hydrated memory into the system prompt. Keep it short — the LLM sees the gist, not the full corpus.
system_prompt = (
    "You are a helpful assistant. Use prior context only if relevant.\n"
    f"User profile / past memory:\n{chr(10).join(history_lines) or '(none yet)'}"
)
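The prompt assembly in steps 3–4 can be sanity-checked with plain Python objects standing in for hydrated memories (SimpleNamespace is a stand-in here; the real objects come from session.memories and expose type and title as shown above):

```python
from types import SimpleNamespace

def build_system_prompt(memories) -> str:
    # Mirror the endpoint's formatting: one bullet per hydrated memory.
    history_lines = [f"- ({m.type}) {m.title}" for m in memories]
    return (
        "You are a helpful assistant. Use prior context only if relevant.\n"
        f"User profile / past memory:\n{chr(10).join(history_lines) or '(none yet)'}"
    )

# First-ever request: no memories yet, so the prompt ends with "(none yet)".
empty_prompt = build_system_prompt([])

# Later request: two hydrated observations appear as bullets.
mems = [
    SimpleNamespace(type="discovery", title="Turn: Hi, my name is Alice"),
    SimpleNamespace(type="discovery", title="Turn: I love hiking"),
]
hydrated_prompt = build_system_prompt(mems)
```

This is also a convenient seam for unit tests: the prompt builder is pure, so you can assert on its output without touching the DB or the LLM.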
5. Call OpenAI
Use the async client. The SDK reads OPENAI_API_KEY from the environment automatically.
from openai import AsyncOpenAI

client = AsyncOpenAI()
completion = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": req.message},
    ],
)
reply = completion.choices[0].message.content or ""
6. Persist the turn
Save the exchange as a discovery observation. Content wrapped in <private> tags is stripped BEFORE the text is hashed and persisted.
mem.mem_save(
    user_id=req.user_id,
    type="discovery",
    title=f"Turn: {req.message[:60]}",
    content=f"User: {req.message}\nAssistant: {reply}",
    topic_key=f"chat/{req.user_id}/turn",
)
Trade-off — topic_key per user vs per turn:
- Same topic_key (e.g. chat/{user_id}/turn) → one rolling observation that updates in place. Cheaper storage, fewer rows, but you lose individual turn history.
- No topic_key → one observation per turn. Full history, full search recall, but the table grows linearly with traffic.
Pick per-turn when you need timeline replay, per-user when you only care about the latest state.
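To make the choice explicit in code, the save call can branch on a flag. A minimal sketch (build_save_kwargs is a hypothetical helper; only the topic_key behavior comes from the recipe):

```python
def build_save_kwargs(user_id: str, message: str, reply: str, per_turn: bool) -> dict:
    """Assemble kwargs for mem.mem_save under the two topic_key strategies."""
    kwargs = {
        "user_id": user_id,
        "type": "discovery",
        "title": f"Turn: {message[:60]}",
        "content": f"User: {message}\nAssistant: {reply}",
    }
    if not per_turn:
        # Rolling observation: the same key every turn, updated in place.
        kwargs["topic_key"] = f"chat/{user_id}/turn"
    # per_turn=True: omit topic_key entirely -> one new observation per turn.
    return kwargs

rolling = build_save_kwargs("alice", "hi", "hello", per_turn=False)
timeline = build_save_kwargs("alice", "hi", "hello", per_turn=True)
```

You would then call mem.mem_save(**build_save_kwargs(...)) at the end of the endpoint and flip the flag per deployment.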
7. Return the reply
Wrap the answer in a JSON response. The session_id is included so the client can correlate logs.
return {"session_id": session.session_id, "reply": reply}
mem_session_start and mem_save are synchronous (the underlying SQLite writes are blocking). Calling them inside an async endpoint is fine at low traffic. For high throughput, offload them with run_in_threadpool (from fastapi.concurrency) so they do not block the event loop.
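The offloading pattern can be sketched without FastAPI by using the stdlib asyncio.to_thread, which behaves like run_in_threadpool (blocking_save below is a stand-in for a synchronous call such as mem.mem_save):

```python
import asyncio
import time

def blocking_save(payload: str) -> str:
    # Stand-in for a synchronous SQLite write such as mem.mem_save(...).
    time.sleep(0.05)
    return f"saved:{payload}"

async def handler() -> str:
    # Run the blocking call on a worker thread so the event loop
    # stays free to serve other requests. Inside FastAPI you would use
    # await run_in_threadpool(blocking_save, "turn") from fastapi.concurrency.
    return await asyncio.to_thread(blocking_save, "turn")

result = asyncio.run(handler())
```

The await suspends only this request while the write runs on the thread pool; concurrent requests keep making progress.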
Never log the raw req.user_id or req.message without redacting first. Memory itself strips <private> blocks, but logs do not.
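For log lines specifically, a small helper can apply the same <private> convention before anything is written out. A sketch (redact_private is hypothetical, not part of the library):

```python
import re

# Match <private>...</private> blocks, including across newlines.
_PRIVATE = re.compile(r"<private>.*?</private>", flags=re.DOTALL)

def redact_private(text: str) -> str:
    """Replace <private>...</private> blocks before text reaches a logger."""
    return _PRIVATE.sub("[redacted]", text)

msg = "My key is <private>sk-123</private>, please remember it."
```

Route every log call through a helper like this (or a logging.Filter that applies it) so raw user input never lands in log storage.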
Full file
Complete code (copy-paste ready)
from __future__ import annotations

import os
from contextlib import asynccontextmanager
from typing import AsyncIterator

from fastapi import FastAPI
from openai import AsyncOpenAI
from pydantic import BaseModel, Field

from ilab_memory import ILabMemory

DB_PATH = os.environ.get("ILAB_MEMORY_DB_PATH", "./memory.db")
MODEL = os.environ.get("OPENAI_MODEL", "gpt-4o-mini")

@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncIterator[None]:
    app.state.mem = ILabMemory.from_path(DB_PATH)
    app.state.openai = AsyncOpenAI()  # reads OPENAI_API_KEY from env
    try:
        yield
    finally:
        app.state.mem.close()

app = FastAPI(lifespan=lifespan, title="iLAB Memory Chat")

class ChatRequest(BaseModel):
    user_id: str = Field(..., min_length=1)
    message: str = Field(..., min_length=1)

class ChatResponse(BaseModel):
    session_id: str
    reply: str

@app.post("/chat", response_model=ChatResponse)
async def chat(req: ChatRequest) -> ChatResponse:
    mem: ILabMemory = app.state.mem
    client: AsyncOpenAI = app.state.openai

    # 1. Hydrate per-user memory
    session = mem.mem_session_start(user_id=req.user_id)
    history_lines = [f"- ({m.type}) {m.title}" for m in session.memories]

    # 2. Compose system prompt with prior context
    system_prompt = (
        "You are a helpful assistant. Use prior context only if relevant.\n"
        f"User profile / past memory:\n{chr(10).join(history_lines) or '(none yet)'}"
    )

    # 3. Call the LLM
    completion = await client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": req.message},
        ],
    )
    reply = completion.choices[0].message.content or ""

    # 4. Persist the turn (one observation per turn — full timeline)
    mem.mem_save(
        user_id=req.user_id,
        type="discovery",
        title=f"Turn: {req.message[:60]}",
        content=f"User: {req.message}\nAssistant: {reply}",
    )

    return ChatResponse(session_id=session.session_id, reply=reply)

# Run with: fastapi dev chat_app.py
# Run with: fastapi dev chat_app.py
Test it from another shell:
curl -X POST http://127.0.0.1:8000/chat \
-H "Content-Type: application/json" \
-d '{"user_id":"alice","message":"Hi, my name is Alice and I love hiking."}'
The next call with the same user_id will see the saved turn (Turn: Hi, my name is Alice and I love hiking.) in session.memories automatically.
What's next?
Long-running agent
Manage context across 50+ turns with periodic summarization and type filters.
Multi-user isolation
Scope memory per tenant with stable user_id patterns.
Sessions
How mem_session_start reuses, auto-closes, and hydrates context.
Privacy
What happens to <private>...</private> before content reaches disk.