Vision
Benchmarking is a feature. Observability is the product. Serving is infrastructure. Understanding is the workflow.
Why Model Lens exists
Most LLM benchmarks give you a single score. They don’t tell you:
- Why a model failed on your prompt
- Why it’s slower on your hardware
- Why it consumes more memory on your workload
- What changed between model versions
- What actually happened during execution
Model Lens exists to answer these questions.
What we are building
We are not building another leaderboard, chat interface, model server, or agent framework. Excellent projects already exist for those use cases.
Model Lens focuses on understanding model behavior after execution. Think:
- Chrome DevTools for local AI
- Datadog for local AI
- OpenTelemetry for local AI
- GitHub Actions replay for local AI
Core principles
- Observability over scores — A single number is useless. Traces, metrics, and replays are useful.
- Real hardware, real workloads — Benchmarks should reflect how developers actually use models on their machines.
- Local-first — No cloud dependency. Everything runs on your hardware.
- Extensible by design — Prompt packs, skills, and providers should be community-extendable.
Product pillars
- Observability — Capture traces, metrics, and execution details
- Replay — Record and replay model execution sessions with playback controls
- Workload Evaluation — Test models on real-world projects, not just benchmarks
- Benchmarking — Run standardized benchmarks against local models
- Community Packs — Shareable, versioned prompt collections
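As a rough illustration of the Observability pillar, the sketch below records latency and peak memory around a single model call. Everything here is hypothetical and not Model Lens's actual API: `TraceRecord`, `capture`, and `fake_generate` are names invented for this example, and `fake_generate` stands in for a real local model call. Note that `tracemalloc` only sees Python-level allocations, so it would not reflect memory used by native inference code or a GPU.

```python
import time
import tracemalloc
from dataclasses import dataclass
from typing import Callable


@dataclass
class TraceRecord:
    """One captured execution (hypothetical schema, for illustration only)."""
    model: str
    prompt: str
    output: str
    latency_s: float
    peak_python_bytes: int  # Python heap only; excludes native/GPU memory


def capture(model: str, prompt: str, generate: Callable[[str], str]) -> TraceRecord:
    """Run generate(prompt) once, recording wall-clock latency and peak Python heap."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        output = generate(prompt)
        latency = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return TraceRecord(model, prompt, output, latency, peak)


def fake_generate(prompt: str) -> str:
    # Stand-in for a real local model call (assumption for this sketch).
    return prompt.upper()


record = capture("example-model", "hello", fake_generate)
print(record.model, record.output, f"{record.latency_s:.6f}s", record.peak_python_bytes)
```

Keeping each run as a structured record, rather than folding it into a score, is what makes later replay and side-by-side comparison possible.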
Visual identity
Brand: Model Lens
Theme: Precision optics — clinical, reliable, utilitarian.
Concepts: focus, zoom, exposure, snapshots, replay, timelines
Ecosystem positioning
| Layer | Tools |
|---|---|
| Model Serving | Ollama, llama.cpp, vLLM, LM Studio |
| User Interfaces | Open WebUI, Jan, LibreChat |
| Benchmarking | OpenBench, lm-evaluation-harness |
| Observability | Model Lens |
What Model Lens will never be
- A cloud-hosted SaaS platform
- A general-purpose AI agent
- A model training or fine-tuning tool
- A replacement for academic benchmarks (MMLU, HumanEval) — we integrate them, we don’t compete
North Star
A developer should be able to ask:
“Why is Qwen better than Gemma for my codebase?”
And Model Lens should provide benchmark evidence, execution traces, replay sessions, latency metrics, memory metrics, and workload comparisons — instead of a single score.