
Architecture

Model Lens is organized as a monorepo with two top-level directories: apps/ (CLI + dashboard) and packages/ (core system modules).

apps/
├── cli/           ← Unified `modellens` CLI (Click-based)
└── dashboard/     ← Astro + React dashboard
packages/
├── logging.py     ← Structured logging (Rich console + file output)
├── events/        ← Event bus: decoupled observability events
├── core/          ← Benchmark framework + trace capture + workload evaluation
├── benchmarks/    ← 11 benchmark implementations
├── providers/     ← 6 provider adapters (all OpenAI-compatible /v1)
├── skills/        ← Extensible, lockfile-verified skill system
└── prompt_packs/  ← Versioned benchmark collections

| Package | Responsibility | Depends on |
|---|---|---|
| events | Event bus: publish/subscribe for all observability events | (self-contained) |
| core | Benchmark framework, trace capture, workload evaluation | providers (for APICallMetrics) |
| benchmarks | Individual benchmark implementations | core |
| providers | Provider adapters + framework integrations | (self-contained) |
| skills | Extensible skill system | (self-contained) |
| prompt_packs | Benchmark prompt collections | (static data) |
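
The "Depends on" column forms a directed graph, and it only works as a layering if that graph stays acyclic. A minimal sketch using Python's standard-library `graphlib` encodes the table as a dict and derives an order in which every package comes after its dependencies. The `PACKAGE_DEPS` and `import_order` names are illustrative, not part of Model Lens.

```python
from graphlib import TopologicalSorter

# Each package maps to the packages it depends on, taken from the table.
# "(self-contained)" and "(static data)" both become an empty set.
PACKAGE_DEPS: dict[str, set[str]] = {
    "events": set(),
    "providers": set(),
    "skills": set(),
    "prompt_packs": set(),
    "core": {"providers"},  # for APICallMetrics
    "benchmarks": {"core"},
}


def import_order(deps: dict[str, set[str]]) -> list[str]:
    """Return packages ordered so that each one follows all of its dependencies.

    Raises graphlib.CycleError if the declared dependencies contain a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(import_order(PACKAGE_DEPS))
```

If someone later adds a `providers → core` edge, `static_order()` raises `graphlib.CycleError`, which makes this a cheap check to run in CI.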

The event bus (packages/events/) is the central hub for observability. It decouples data producers (provider calls, benchmark runs, tool execution) from data consumers (metrics, traces, the dashboard) using typed events:

                          Event Bus
          ┌───────────────────┼───────────────────┐
          │                   │                   │
   Provider calls      Benchmark runs      Tool execution
          │                   │                   │
          ▼                   ▼                   ▼
   TokenGenerated        MetricEvent        ToolCallEvent
   CompletionEvent    RunLifecycleEvent      ErrorEvent
          │                   │                   │
          └───────────────────┼───────────────────┘
                ┌─────────────┼─────────────┐
                │             │             │
             Metrics       Traces       Dashboard
             Engine        Engine       (SSE/WS)

| Event | Source | Consumers |
|---|---|---|
| TokenGeneratedEvent | OpenAICompatibleProvider streaming | TraceCapture, Dashboard (SSE) |
| CompletionEvent | OpenAICompatibleProvider response | MetricsEngine, ResultsCollector |
| MetricEvent | BenchmarkSuite / any component | Dashboard, ReplayEngine |
| ToolCallEvent | Skill runtime | AgenticEvaluator, TraceCapture |
| ErrorEvent | Any component | Dashboard, Alerts |
| RunLifecycleEvent | CLI entry point / BenchmarkSuite | ResultsCollector, Dashboard |
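
To make the publish/subscribe flow concrete, here is a minimal in-process sketch of a typed event bus. It assumes synchronous delivery and dispatches on the event's class, including base classes, so a subscriber to `Event` sees everything. The `Event` base class, its `run_id`/`token`/`message` fields, and the `subscribe`/`publish` methods are hypothetical and may not match `packages/events/`; only the event names come from the table.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Event:
    """Hypothetical base class shared by every observability event."""
    run_id: str


@dataclass(frozen=True)
class TokenGeneratedEvent(Event):
    token: str


@dataclass(frozen=True)
class ErrorEvent(Event):
    message: str


Handler = Callable[[Any], None]


class EventBus:
    """Synchronous, in-process publish/subscribe keyed on event type."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Handler) -> None:
        """Register handler for event_type and all of its subclasses."""
        self._handlers[event_type].append(handler)

    def publish(self, event: Event) -> None:
        # Walk the MRO so a handler subscribed to a base class (e.g. Event)
        # also receives subclass instances. Producers never reference consumers.
        for cls in type(event).__mro__:
            for handler in self._handlers.get(cls, ()):
                handler(event)
```

Under this sketch, a streaming provider would call `bus.publish(TokenGeneratedEvent(...))` once per token, while `TraceCapture` and the dashboard's SSE endpoint would each `subscribe` without knowing which provider produced the tokens.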

Model Lens intentionally maintains two independent benchmark systems:

| System | File | Config | Purpose |
|---|---|---|---|
| General suite | apps/cli/benchmark.py | config.yaml (YAML) | MMLU-Pro, GSM8K, HumanEval, SWE-Bench Lite, IF-Eval |
| DevBench v2 | apps/cli/bench_apple_silicon_v2.py | config.json (JSON, deprecated) | TypeScript/NestJS/React, Apple Silicon optimized |

Both systems are first-class and equally authoritative; neither is a migration phase toward the other. They share the scoring/evaluation modules but differ in execution pipeline and config format. Do not merge them.
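
One way to picture "shared scoring, separate pipelines" is a small registry in which each system owns its entry point and config loader while pointing at the same scoring function. This is an illustrative sketch: the `BenchmarkSystem` record, the loader functions, and `score_exact_match` are hypothetical, and the YAML loader assumes PyYAML is installed. The file names are the ones in the table.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable


def score_exact_match(expected: str, actual: str) -> float:
    """Hypothetical shared scorer used by both systems."""
    return 1.0 if expected.strip() == actual.strip() else 0.0


def load_json_config(path: Path) -> dict[str, Any]:
    return json.loads(path.read_text(encoding="utf-8"))


def load_yaml_config(path: Path) -> dict[str, Any]:
    import yaml  # PyYAML (assumed installed); imported lazily so JSON-only use works without it

    return yaml.safe_load(path.read_text(encoding="utf-8"))


@dataclass(frozen=True)
class BenchmarkSystem:
    name: str
    entry_point: str
    config_path: Path
    load_config: Callable[[Path], dict[str, Any]]  # differs per system
    score: Callable[[str, str], float] = score_exact_match  # shared by both


# Two separate registrations rather than one merged pipeline.
SYSTEMS = {
    "general": BenchmarkSystem(
        "General suite", "apps/cli/benchmark.py", Path("config.yaml"), load_yaml_config
    ),
    "devbench": BenchmarkSystem(
        "DevBench v2", "apps/cli/bench_apple_silicon_v2.py", Path("config.json"), load_json_config
    ),
}
```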

All 6 providers implement the ProviderAdapter interface and use OpenAI-compatible /v1/chat/completions endpoints:

ProviderAdapter (ABC — packages/providers/base.py)
├── OllamaClient
├── OpenWebUIClient
├── JanClient
├── LlamaCppClient
├── VLLMClient
└── OpenAICompatibleClient
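
A minimal sketch of what the `ProviderAdapter` contract could look like, assuming each subclass only supplies its default base URL and the base class builds the OpenAI-style `POST {base_url}/chat/completions` request. The method names (`default_base_url`, `build_request`, `chat`) and the `urllib`-based transport are assumptions for illustration; the real `packages/providers/base.py` may differ, for example in streaming or authentication support.

```python
from __future__ import annotations

import json
import urllib.request
from abc import ABC, abstractmethod
from typing import Any


class ProviderAdapter(ABC):
    """Hypothetical base adapter: subclasses supply only their default base URL."""

    def __init__(self, base_url: str | None = None, timeout: float = 60.0) -> None:
        # Strip a trailing slash so joining paths never yields "//".
        self.base_url = (base_url or self.default_base_url()).rstrip("/")
        self.timeout = timeout

    @classmethod
    @abstractmethod
    def default_base_url(cls) -> str:
        """Base URL that already includes the OpenAI-compatible prefix, e.g. .../v1."""

    def build_request(
        self, model: str, messages: list[dict[str, str]], **params: Any
    ) -> urllib.request.Request:
        """Build (but do not send) an OpenAI-style chat completion request."""
        body = json.dumps({"model": model, "messages": messages, **params}).encode("utf-8")
        return urllib.request.Request(
            f"{self.base_url}/chat/completions",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )

    def chat(self, model: str, messages: list[dict[str, str]], **params: Any) -> dict[str, Any]:
        """Send the request and decode the JSON response body."""
        request = self.build_request(model, messages, **params)
        with urllib.request.urlopen(request, timeout=self.timeout) as response:
            return json.load(response)


class OllamaClient(ProviderAdapter):
    @classmethod
    def default_base_url(cls) -> str:
        return "http://localhost:11434/v1"  # Ollama's default OpenAI-compatible URL
```

Keeping request construction in `build_request` separate from the network call lets it be tested without a running server.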

| Provider | Default URL | Auto-detect probe |
|---|---|---|
| LM Studio | http://localhost:1234/v1 | /v1/models |
| Ollama | http://localhost:11434/v1 | /api/tags |
| llama.cpp | http://localhost:8080/v1 | /v1/models |
| vLLM | http://localhost:8000/v1 | /v1/models |
| Open WebUI | http://localhost:3000/api/v1 | /api/v1/models |
| Jan | http://localhost:1337/v1 | /v1/models |

Auto-detection probes in order: LM Studio → Ollama → llama.cpp → vLLM → Open WebUI → Jan.
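
The probe order can be expressed as an ordered list that returns the first server answering its probe. This sketch assumes a probe counts as successful on HTTP 200 and that probe paths are resolved against the server origin rather than the base URL (Ollama's `/api/tags`, for instance, sits outside `/v1`). `ProviderProbe`, `http_probe`, and `detect_provider` are hypothetical names.

```python
import http.client
import urllib.request
from typing import Callable, NamedTuple, Optional
from urllib.parse import urlsplit


class ProviderProbe(NamedTuple):
    name: str
    base_url: str
    probe_path: str

    @property
    def probe_url(self) -> str:
        # Probe paths are absolute, so resolve them against the server origin
        # rather than base_url (which already ends in /v1 or /api/v1).
        parts = urlsplit(self.base_url)
        return f"{parts.scheme}://{parts.netloc}{self.probe_path}"


# Order matters: the first provider that answers its probe wins.
PROBE_ORDER = [
    ProviderProbe("LM Studio", "http://localhost:1234/v1", "/v1/models"),
    ProviderProbe("Ollama", "http://localhost:11434/v1", "/api/tags"),
    ProviderProbe("llama.cpp", "http://localhost:8080/v1", "/v1/models"),
    ProviderProbe("vLLM", "http://localhost:8000/v1", "/v1/models"),
    ProviderProbe("Open WebUI", "http://localhost:3000/api/v1", "/api/v1/models"),
    ProviderProbe("Jan", "http://localhost:1337/v1", "/v1/models"),
]


def http_probe(url: str, timeout: float = 0.5) -> bool:
    """Return True if url answers with HTTP 200 within timeout seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def detect_provider(probe: Callable[[str], bool] = http_probe) -> Optional[ProviderProbe]:
    """Return the first provider in PROBE_ORDER whose probe succeeds, else None."""
    for candidate in PROBE_ORDER:
        if probe(candidate.probe_url):
            return candidate
    return None
```

Passing the probe function in as a parameter keeps the detection order testable without any local servers. With the default `http_probe`, a machine running only vLLM and Jan would resolve to vLLM, because it comes first in the order.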

User / Dashboard
       │
       ▼
modellens.py (CLI entry point)
├──[workload]──→ Real-project workload evaluation
├──[devbench]──→ Apple Silicon DevBench v2
├──[general]───→ BenchmarkSuite (11 benchmarks)
└──[compare]───→ Both frameworks
       │
       ▼
Results + Traces → Event Bus → Dashboard (Astro + React) → Cloudflare Pages
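
The four CLI branches map naturally onto subcommands. The real CLI is Click-based; this sketch uses the standard-library `argparse` instead so it runs without extra dependencies, and the handlers are placeholders that only return their name. `run_from_dashboard` illustrates how a dashboard backend could delegate to `modellens.py` through a subprocess; its name and the bare `python modellens.py <command>` invocation are assumptions.

```python
import argparse
import subprocess
import sys
from typing import Callable, Optional, Sequence

Handler = Callable[[argparse.Namespace], str]


def _stub(label: str) -> Handler:
    # Placeholder: a real handler would launch the corresponding framework.
    return lambda args: label


SUBCOMMANDS: dict[str, tuple[str, Handler]] = {
    "workload": ("Real-project workload evaluation", _stub("workload")),
    "devbench": ("Apple Silicon DevBench v2", _stub("devbench")),
    "general": ("BenchmarkSuite (11 benchmarks)", _stub("general")),
    "compare": ("Both frameworks", _stub("compare")),
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="modellens")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, (help_text, handler) in SUBCOMMANDS.items():
        # Attach the handler to the parsed namespace for dispatch.
        sub.add_parser(name, help=help_text).set_defaults(handler=handler)
    return parser


def main(argv: Optional[Sequence[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    return args.handler(args)


def run_from_dashboard(command: str, script: str = "modellens.py") -> subprocess.CompletedProcess[str]:
    """Delegate to the CLI in a child process instead of importing its internals."""
    return subprocess.run([sys.executable, script, command], capture_output=True, text=True, check=False)
```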

The architecture follows these design rules:

1. Events package is self-contained. No dependencies on core, providers, or benchmarks.
2. Core never imports from benchmarks. Individual benchmarks import from core, not vice versa.
3. Providers are self-contained. Each client depends only on base.py, not on core.
4. CLI is the only entry point. The dashboard delegates to modellens.py via subprocess.
5. Skills are lazy-loaded. They are registered at startup and validated against modellens.lock, but only loaded when used.
6. Prompt packs are static data: JSON/YAML prompt definitions with no code execution.
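
Rules 1 to 3 are mechanical enough to check automatically. This sketch parses source code with the standard-library `ast` module and reports imports that cross a forbidden boundary. It assumes first-party modules are imported as `packages.<name>...`; the `FORBIDDEN_IMPORTS` table is derived from the rules above, and the function names are hypothetical rather than existing Model Lens tooling.

```python
import ast

# Top-level packages each package must NOT import, per rules 1-3.
FORBIDDEN_IMPORTS: dict[str, set[str]] = {
    "events": {"core", "providers", "benchmarks"},
    "core": {"benchmarks"},
    "providers": {"core", "benchmarks"},
}


def imported_top_level(source: str, package_root: str = "packages") -> set[str]:
    """Return top-level names imported by source, stripping the package_root prefix.

    Relative imports (`from . import x`) are ignored because they stay inside
    the current package.
    """
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if node.module == package_root:
                # `from packages import core` imports the subpackage by name.
                modules = [f"{package_root}.{alias.name}" for alias in node.names]
            else:
                modules = [node.module]
        else:
            continue
        for module in modules:
            parts = module.split(".")
            if parts[0] == package_root and len(parts) > 1:
                parts = parts[1:]
            names.add(parts[0])
    return names


def violations(package: str, source: str) -> set[str]:
    """Return the forbidden packages that source (belonging to package) imports."""
    return imported_top_level(source) & FORBIDDEN_IMPORTS.get(package, set())
```

Running `violations()` over each file under `packages/<name>/` in CI would catch, for example, a `from packages.core import ...` line added to the events package.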