Skip to content

Model Lens

Observability-first platform for local AI
Terminal window
pip install -r requirements.txt
python apps/cli/modellens.py run --quick

Most local AI tooling focuses on a single layer — benchmarking, chat, model serving, or agents. Very few tools help you understand what a model is actually doing on your machine.

Model Lens gives you the full picture: execution traces, latency profiles, memory footprints, and real-world workload evaluations.

🔬 Trace & Replay

Capture token-level execution timelines. Replay with play, pause, step, and speed controls.

📊 Compare Models

Side-by-side latency, memory, and output diffs. Understand why one model outperforms another.

🏗️ Real Workloads

Test models on actual codebases — React, NestJS, Python, Rust — not just synthetic benchmarks.

🔌 6 Providers

LM Studio, Ollama, Open WebUI, Jan, llama.cpp, vLLM — all through OpenAI-compatible /v1 endpoints.

📡 Live Streaming

SSE bridge streams real-time events to the dashboard. Every token, every metric, live.

🛡️ CI-Enforced Quality

Ruff lint, ruff format, mypy type checking, and strict pytest on every push.

Provider calls → Event Bus (thread-safe)
├── SSE bridge → Dashboard (real-time)
├── Replay writer → Disk (historical)
└── Trace capture → Metrics engine

Every streaming token, every completion, every metric, and every benchmark lifecycle event flows through the central event bus. Consumers — the dashboard, replay engine, and metrics system — subscribe to what they need.