
Trace Replay

Trace replay lets you record and replay model execution sessions with playback controls. This is the core observability feature of Model Lens.

Each trace records the full execution timeline:

Run
├── Trace
│ ├── Events ← Ordered timeline events
│ ├── Metrics ← Latency, tokens/sec, memory at each step
│ └── Artifacts ← Response text, logs, errors
Timestamp   Event
00:00       Prompt sent to model
00:10       First token received (TTFT)
00:30       Reasoning complete (if chain-of-thought)
00:50       Response complete

  • TTFT — Time to first token (ms)
  • Tokens/sec — Generation throughput at each step
  • Memory — RAM usage snapshot (MB)
  • Latency — Cumulative execution time
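As a rough illustration of how these metrics relate to the timeline, the sketch below derives TTFT, cumulative latency, and throughput from an ordered list of events. The `TimelineEvent` class, its field names, and the event kinds are hypothetical and not taken from Model Lens; the sketch also assumes throughput is measured over the generation window (first token to completion), which may differ from how Model Lens computes it.

```python
from dataclasses import dataclass


@dataclass
class TimelineEvent:
    # Hypothetical event record; field names are illustrative only.
    t_ms: float  # milliseconds since the run started
    kind: str    # e.g. "prompt_sent", "first_token", "response_complete"


def summarize(events: list, total_tokens: int) -> dict:
    """Derive TTFT, cumulative latency, and throughput from a timeline."""
    by_kind = {e.kind: e.t_ms for e in events}
    start = by_kind["prompt_sent"]
    first = by_kind["first_token"]
    end = by_kind["response_complete"]
    # Assumption: throughput is tokens divided by generation time in seconds.
    generation_s = (end - first) / 1000.0
    tokens_per_second = total_tokens / generation_s if generation_s > 0 else 0.0
    return {
        "ttft_ms": first - start,
        "latency_ms": end - start,
        "tokens_per_second": tokens_per_second,
    }


timeline = [
    TimelineEvent(0, "prompt_sent"),
    TimelineEvent(210, "first_token"),
    TimelineEvent(2210, "response_complete"),
]
summary = summarize(timeline, total_tokens=145)
```

With these made-up numbers, 145 tokens over a 2-second generation window gives 72.5 tokens/sec.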

Control   Description
Play      Run the trace from start to finish
Pause     Freeze at current event
Speed     0.5×, 1×, 2×, 4× playback speed
Step      Advance one event at a time
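The controls above can be pictured as a small state machine over the ordered event list. The following is a minimal sketch, not the Model Lens player: `ReplayPlayer` and its methods are hypothetical names, and events are assumed to be `(timestamp_seconds, label)` pairs. The `sleep` argument is injectable so the example can run without actually waiting.

```python
import time
from typing import Callable, Iterator, Optional


class ReplayPlayer:
    """Hypothetical player over (timestamp_seconds, label) pairs."""

    def __init__(self, events: list, speed: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if speed <= 0:
            raise ValueError("speed must be positive")
        self.events = sorted(events)
        self.speed = speed
        self.paused = False
        self._index = 0
        self._sleep = sleep

    def step(self) -> Optional[tuple]:
        """Advance exactly one event; return None at the end of the trace."""
        if self._index >= len(self.events):
            return None
        event = self.events[self._index]
        self._index += 1
        return event

    def pause(self) -> None:
        """Freeze at the current event; play() stops before the next one."""
        self.paused = True

    def play(self) -> Iterator[tuple]:
        """Yield events until the end of the trace or until pause() is called."""
        self.paused = False
        while not self.paused:
            # Wait for the gap between the previous and next event, scaled by speed.
            if 0 < self._index < len(self.events):
                gap = self.events[self._index][0] - self.events[self._index - 1][0]
                self._sleep(gap / self.speed)
            event = self.step()
            if event is None:
                break
            yield event


waits = []  # records requested delays instead of sleeping
player = ReplayPlayer(
    [(0, "Prompt sent"), (10, "First token"),
     (30, "Reasoning complete"), (50, "Response complete")],
    speed=2.0,
    sleep=waits.append,
)
first = player.step()      # Step: advance one event
rest = list(player.play())  # Play: run the remainder at 2x
```

At 2× speed the original gaps of 10, 20, and 20 seconds become waits of 5, 10, and 10 seconds.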

The CLI can stream events live to the dashboard:

# events are served on port 9090 at /events
python apps/cli/modellens.py run --sse-port 9090

Behind the scenes, EventBusSSEServer subscribes to the event bus and streams all events as Server-Sent Events.
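For readers unfamiliar with the wire format, the sketch below encodes and decodes Server-Sent Events frames using the standard `event:` and `data:` fields. It does not reproduce `EventBusSSEServer`: the helper names, the event type `"metric"`, and the assumption that each `data:` payload is a JSON object are all illustrative.

```python
import json
from typing import Iterable, Iterator


def format_sse(event_type: str, payload: dict) -> str:
    """Encode one event as a Server-Sent Events frame."""
    return f"event: {event_type}\ndata: {json.dumps(payload)}\n\n"


def parse_sse(lines: Iterable[str]) -> Iterator[tuple]:
    """Decode frames from an iterable of text lines, such as a response body."""
    event_type, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line terminates the current frame.
            if data:
                yield event_type, json.loads("\n".join(data))
            event_type, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops a single leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)


frame = format_sse("metric", {"ttft_ms": 210})
decoded = list(parse_sse(frame.splitlines(keepends=True)))
```

A dashboard client would feed `parse_sse` the decoded lines of the HTTP response from the events endpoint instead of a locally built frame.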

Events are persisted to disk by EventBusReplayWriter:

from events.replay import EventBusReplayWriter

writer = EventBusReplayWriter(output_dir="results/replays")
writer.start()  # subscribes to default_bus and persists events
try:
    ...  # benchmarks run here
finally:
    writer.stop()  # stop persisting even if a benchmark raises

Replay files are saved as results/replays/<run_id>.json.
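Because replay files are plain JSON named after the run id, they can be listed and loaded with the standard library alone. The helpers below are a hypothetical sketch (`list_run_ids` and `load_replay` are not part of Model Lens), and they make no assumption about the schema inside each file. The demo writes a placeholder file to a temporary directory so it runs without real replays; the `{"events": []}` content is a stand-in, not the actual format.

```python
import json
import tempfile
from pathlib import Path


def list_run_ids(replay_dir: str = "results/replays") -> list:
    """Return the run ids that have a replay file, sorted by name."""
    return sorted(p.stem for p in Path(replay_dir).glob("*.json"))


def load_replay(run_id: str, replay_dir: str = "results/replays"):
    """Read <replay_dir>/<run_id>.json and return the parsed JSON as-is."""
    path = Path(replay_dir) / f"{run_id}.json"
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


# Demo against a throwaway directory with placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "demo-run.json"
    sample.write_text(json.dumps({"events": []}), encoding="utf-8")
    run_ids = list_run_ids(tmp)
    replay = load_replay(run_ids[0], tmp)
```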

Compare two model traces side by side:

  • Token stream — How did each model generate text?
  • Latency diff — Where did one model spend more time?
  • Memory diff — Which model used more RAM?
  • Output diff — How did responses diverge?
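The latency, memory, and output diffs in the list above can be approximated offline. This is a minimal sketch under stated assumptions: each trace is reduced to a dict with `model`, `metrics`, and `response` keys, and the metric names `latency_ms` and `memory_mb` are made up for the example. The output diff uses Python's `difflib`, which is a stand-in and not necessarily how the Model Lens comparison view works.

```python
import difflib


def compare_traces(a: dict, b: dict) -> dict:
    """Return per-metric deltas (b minus a) and a unified diff of the responses."""
    shared = a["metrics"].keys() & b["metrics"].keys()
    metric_diff = {key: b["metrics"][key] - a["metrics"][key] for key in shared}
    output_diff = list(difflib.unified_diff(
        a["response"].splitlines(),
        b["response"].splitlines(),
        fromfile=a["model"],
        tofile=b["model"],
        lineterm="",
    ))
    return {"metrics": metric_diff, "output": output_diff}


# Made-up traces for illustration only.
trace_a = {
    "model": "model-a",
    "metrics": {"latency_ms": 1200, "memory_mb": 512},
    "response": "Hello\nworld",
}
trace_b = {
    "model": "model-b",
    "metrics": {"latency_ms": 900, "memory_mb": 640},
    "response": "Hello\nthere",
}
result = compare_traces(trace_a, trace_b)
```

A negative delta means the second model used less of that resource; here `model-b` is 300 ms faster but uses 128 MB more memory.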

Save execution state as a shareable URL:

{
  "prompt": "...",
  "model": "qwen3.5-9b-coder",
  "provider": "lm-studio",
  "metrics": {
    "ttft_ms": 210,
    "tokens_per_second": 72.3,
    "total_tokens": 145
  },
  "response": "...",
  "trace": { "events": [...] }
}
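The source does not say how the state is turned into a URL. One common approach, sketched here purely as an assumption, is to compress the JSON and place it in the URL fragment as URL-safe base64. `BASE_URL`, the `#state=` fragment key, and both function names are hypothetical.

```python
import base64
import json
import zlib

BASE_URL = "https://example.invalid/replay"  # hypothetical; not a real Model Lens URL


def encode_state(state: dict) -> str:
    """Pack the state into a URL fragment as compressed, URL-safe base64."""
    raw = json.dumps(state, separators=(",", ":"), sort_keys=True).encode("utf-8")
    token = base64.urlsafe_b64encode(zlib.compress(raw)).decode("ascii")
    return f"{BASE_URL}#state={token}"


def decode_state(url: str) -> dict:
    """Recover the state dict from a URL produced by encode_state."""
    token = url.split("#state=", 1)[1]
    raw = zlib.decompress(base64.urlsafe_b64decode(token))
    return json.loads(raw)


state = {
    "prompt": "...",
    "model": "qwen3.5-9b-coder",
    "provider": "lm-studio",
    "metrics": {"ttft_ms": 210, "tokens_per_second": 72.3, "total_tokens": 145},
    "response": "...",
    "trace": {"events": []},
}
url = encode_state(state)
restored = decode_state(url)
```

Keeping the payload in the fragment means it is not sent to the server on page load, though long traces can exceed practical URL length limits, in which case a server-side id would be the usual alternative.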