Roadmap

V1 — Benchmark Foundation ✅ Complete

LM Studio and Ollama providers
Unified modellens CLI (run, info, leaderboard, models)
11 benchmark implementations (MMLU-Pro, GSM8K, HumanEval, etc.)
DevBench v2 (TypeScript/NestJS/React, statistical rigor)
Astro + React dashboard
Prompt packs (React, NestJS, debugging, agentic)
Hardware detection (CPU, GPU, RAM, OS)
Cloudflare Pages deployment
Community leaderboard

V2 — Local Model Observability 🚧 In Progress

Event bus — typed, thread-safe, wired into provider + benchmark flow
SSE streaming — real-time dashboard event bridge
Trace capture — token-level execution timeline
Trace replay — playback controls (play, pause, step, speed)
Event replay writer — persists all events to disk
Workload evaluation — real projects: React, NestJS, Rust, Python
Side-by-side model comparison (token stream, latency diff, memory diff)
Snapshot system (save/share execution state via URL)

V3 — Developer Observability 🚧 In Progress

Provider expansion — all 6 providers implemented (LM Studio and Ollama from V1, plus Open WebUI, Jan, llama.cpp, vLLM)
Skill system — types, registry, lockfile validation, built-in skills
MCP bridge foundation
OpenAI-compatible provider layer
Regression detection between model versions
WASM sandbox for skills
MCP server mode

V4 — OpenTelemetry for Local AI 🔮 Future

Trace schema standardization
OpenTelemetry export
VS Code extension
IDE integrations
Distributed agent traces
Team collaboration

Provider support timeline

Phase	Providers	Status
Phase 1	LM Studio, Ollama	✅ Complete
Phase 2	Open WebUI, Jan, llama.cpp, vLLM	✅ Complete
Phase 3	LocalAI, KoboldCPP, Text Generation WebUI	🔮 Future