Providers
Model Lens supports 6 local LLM providers through a unified ProviderAdapter interface. All providers communicate via OpenAI-compatible /v1/chat/completions endpoints.
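As a rough illustration of what an OpenAI-compatible chat call looks like on the wire, the sketch below builds (but does not send) such a request using only the standard library. The helper name `build_chat_request` and the placeholder API key are assumptions made for this example; they are not part of Model Lens.

```python
import json
import urllib.request


def build_chat_request(base_url, model, messages, api_key="not-needed"):
    """Build (but do not send) a POST request for an OpenAI-compatible chat endpoint."""
    # base_url is expected to already end in the API prefix, e.g. ".../v1".
    url = base_url.rstrip("/") + "/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Local servers usually ignore the key, but the header keeps the request shape standard.
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_chat_request(
    "http://localhost:1234/v1",
    "qwen3.5-9b-coder",
    [{"role": "user", "content": "Explain TypeScript generics."}],
)
print(request.get_method(), request.full_url)
# prints "POST http://localhost:1234/v1/chat/completions"
```

To actually send it, pass the request to `urllib.request.urlopen(request, timeout=120)` while a provider is running; OpenAI-compatible servers typically return the reply text under `choices[0].message.content` in the JSON response.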
Supported providers
| Provider | Status | Default URL | Min version |
|---|---|---|---|
| LM Studio | ✅ Stable | http://localhost:1234/v1 | 0.3.x+ |
| Ollama | ✅ Stable | http://localhost:11434/v1 | 0.1.28+ |
| Open WebUI | ✅ Stable | http://localhost:3000/api/v1 | 0.5.x+ |
| Jan | ✅ Stable | http://localhost:1337/v1 | 0.5.x+ |
| llama.cpp | ✅ Stable | http://localhost:8080/v1 | b4200+ |
| vLLM | ✅ Stable | http://localhost:8000/v1 | 0.6.x+ |
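The default URLs in the table can be probed to see which providers are currently running. The following is a minimal sketch, assuming each server answers `GET <base URL>/models` with HTTP 200 when healthy; the helper names (`models_url`, `is_reachable`, `detect_running`) are hypothetical and this is not how Model Lens itself performs auto-detection.

```python
import http.client
import urllib.request

# Default base URLs copied from the table above, keyed by the CLI provider names.
DEFAULT_BASE_URLS = {
    "lm-studio": "http://localhost:1234/v1",
    "ollama": "http://localhost:11434/v1",
    "open-webui": "http://localhost:3000/api/v1",
    "jan": "http://localhost:1337/v1",
    "llama.cpp": "http://localhost:8080/v1",
    "vllm": "http://localhost:8000/v1",
}


def models_url(base_url):
    """Return the model-listing URL for a base URL, tolerating a trailing slash."""
    return base_url.rstrip("/") + "/models"


def is_reachable(base_url, timeout=2.0):
    """Return True only if the model-listing endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP errors such as 401 or 404.
        return False


def detect_running(base_urls=DEFAULT_BASE_URLS):
    """Return the names of all providers whose default URL is reachable."""
    return [name for name, url in base_urls.items() if is_reachable(url)]
```

Calling `detect_running()` returns a list such as `["ollama"]` when only Ollama is up. A server that requires authentication is reported as unreachable by this sketch, because an HTTP 401 is treated the same as a failed connection.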
LM Studio
- Download LM Studio
- Load a model (e.g., Qwen 3.5, Gemma 4)
- Go to Developer tab → enable Local API Server
- Start the server (default: http://localhost:1234)

```bash
# Auto-detect LM Studio models
python apps/cli/modellens.py run --quick

# Explicit model
python apps/cli/modellens.py run --provider lm-studio --models qwen3.5-9b-coder
```

Python API

```python
from providers.openai_compatible import OpenAICompatibleProvider

client = OpenAICompatibleProvider(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",
    model_name="qwen3.5-9b-coder",
    timeout=120,
    max_retries=3,
)

response, metrics = client.chat_completion([
    {"role": "user", "content": "Explain TypeScript generics."}
])
# metrics.ttft, metrics.tokens_per_second, etc.
```

Health check

```bash
curl http://localhost:1234/v1/models
```

Ollama
- Install Ollama
- Pull a model:

  ```bash
  ollama pull llama3.2
  # or
  ollama pull qwen2.5-coder:7b
  ```

- Ollama serves automatically at http://localhost:11434

```bash
# Auto-detect Ollama models
python apps/cli/modellens.py run --provider ollama --quick

# Specific Ollama model
python apps/cli/modellens.py run --provider ollama --models llama3.2:latest
```

Model naming
Ollama model names include a tag, for example `llama3.2:latest` or `qwen2.5-coder:7b`. When passing model names on the CLI, include the tag: `--models llama3.2:latest`
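Ollama itself falls back to the `latest` tag when none is given, so a small helper can normalize names before they reach the CLI. This is a sketch only; `with_default_tag` is a hypothetical helper, and whether Model Lens performs any such normalization is not stated here.

```python
def with_default_tag(model_name, default_tag="latest"):
    """Return the name unchanged if it already has a tag, else append the default tag."""
    # Only inspect the part after the last "/" so that a registry host with a port
    # (e.g. "localhost:5000/my-model") is not mistaken for a tagged name.
    last_segment = model_name.rsplit("/", 1)[-1]
    if ":" in last_segment:
        return model_name
    return f"{model_name}:{default_tag}"


print(with_default_tag("llama3.2"))          # prints "llama3.2:latest"
print(with_default_tag("qwen2.5-coder:7b"))  # prints "qwen2.5-coder:7b"
```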
Python API

```python
from providers.ollama import OllamaClient
from providers.base import RunRequest

ollama = OllamaClient(
    base_url="http://localhost:11434",
    api_key="ollama",
    model_name="llama3.2:latest",
)

# Health check (tries /v1/models, falls back to /api/tags)
if ollama.health_check():
    print("Ollama is running!")

# Run a prompt
result = ollama.run_prompt(RunRequest(
    prompt="Explain Kubernetes in one sentence.",
    model="llama3.2:latest",
))
print(f"TTFT: {result.ttft_ms:.0f}ms, Tokens/sec: {result.tokens_per_second:.1f}")
```

Open WebUI
- Install Open WebUI (e.g., via Docker or pip)
- Open WebUI serves an OpenAI-compatible API at http://localhost:3000/api/v1

```bash
python apps/cli/modellens.py run --provider open-webui --quick
python apps/cli/modellens.py run --provider open-webui --models my-model
```

Python API

```python
from providers.openwebui import OpenWebUIClient

client = OpenWebUIClient(
    base_url="http://localhost:3000",
    api_key="open-webui",
    model_name="my-model",
)
```

Jan
- Download Jan
- Load a model
- Jan serves an OpenAI-compatible API at http://localhost:1337/v1

```bash
python apps/cli/modellens.py run --provider jan --quick
python apps/cli/modellens.py run --provider jan --models my-model
```

Python API

```python
from providers.jan import JanClient

client = JanClient(
    base_url="http://localhost:1337",
    api_key="jan",
    model_name="my-model",
)
```

llama.cpp
- Build or download llama.cpp
- Run the server (the binary is named `llama-server` in the builds covered by the minimum version above):

  ```bash
  ./llama-server -m models/my-model.gguf --host 0.0.0.0 --port 8080
  ```

```bash
python apps/cli/modellens.py run --provider llama.cpp --quick
python apps/cli/modellens.py run --provider llama.cpp --models llama-3.2-7b
```

vLLM
- Install vLLM:

  ```bash
  pip install vllm
  ```

- Start the server:

  ```bash
  python -m vllm.entrypoints.openai.api_server --model path/to/model --port 8000
  ```

```bash
python apps/cli/modellens.py run --provider vllm --quick
python apps/cli/modellens.py run --provider vllm --models my-model
```

ProviderAdapter interface
All providers implement this interface (see `packages/providers/base.py`):

```python
class ProviderAdapter(ABC):
    name: str
    default_port: int

    def list_models(self) -> List[Model]: ...
    def health_check(self) -> bool: ...
    def run_prompt(self, request: RunRequest) -> RunResult: ...
    def chat_completion(self, messages, temperature, max_tokens, top_p, stream) -> tuple[str, object]: ...
    def collect_metrics(self) -> ProviderMetrics: ...
```

Shared data types
| Type | Fields |
|---|---|
| `Model` | `id`, `name`, `provider`, `parameters`, `quantization`, `size_bytes` |
| `RunRequest` | `prompt`, `model`, `temperature`, `max_tokens`, `top_p`, `system_prompt`, `stream` |
| `RunResult` | `response`, `model`, `provider`, `ttft_ms`, `total_time_ms`, `tokens_per_second`, `tokens` |
| `APICallMetrics` | `ttft`, `total_time`, `tokens_per_second`, `total_tokens`, `prompt_tokens`, `completion_tokens` |
| `ProviderMetrics` | `cpu_percent`, `ram_used_mb`, `ram_total_mb`, `gpu_available`, `gpu_used_mb`, `swap_used_mb` |
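To make the request and result shapes concrete, here is one way `RunRequest` and `RunResult` might look as dataclasses. The field names come from the table above; the field types, the default values, and the `to_messages` helper are assumptions for illustration and may differ from the real definitions in `packages/providers/base.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunRequest:
    prompt: str
    model: str
    temperature: float = 0.7             # assumed default
    max_tokens: int = 512                # assumed default
    top_p: float = 1.0                   # assumed default
    system_prompt: Optional[str] = None  # assumed to be optional
    stream: bool = False                 # assumed default


@dataclass
class RunResult:
    response: str
    model: str
    provider: str
    ttft_ms: float
    total_time_ms: float
    tokens_per_second: float
    tokens: int


def to_messages(request):
    """Hypothetical helper: turn a RunRequest into an OpenAI-style message list."""
    messages = []
    if request.system_prompt:
        messages.append({"role": "system", "content": request.system_prompt})
    messages.append({"role": "user", "content": request.prompt})
    return messages
```

A helper like `to_messages` shows how a prompt-style request could map onto the message list that `chat_completion` takes, with the system prompt placed first when one is set.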
Adding a new provider
See the Provider Contract for the full implementation checklist.
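For orientation before reading the contract, the sketch below shows the general shape of an adapter subclass. Everything in it is a stand-in: the reduced `ProviderAdapter`, `RunRequest`, and `RunResult` are simplified local copies so the example runs on its own, the use of `@abstractmethod` is an assumption, and `EchoProvider` is a hypothetical provider that talks to no server. A real provider would import the base types from `packages/providers/base.py` and implement the full interface.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class RunRequest:  # stand-in with only the fields this sketch uses
    prompt: str
    model: str


@dataclass
class RunResult:  # stand-in; the real type carries more fields
    response: str
    model: str
    provider: str
    total_time_ms: float


class ProviderAdapter(ABC):  # reduced stand-in for the real base class
    name: str
    default_port: int

    @abstractmethod
    def list_models(self): ...

    @abstractmethod
    def health_check(self): ...

    @abstractmethod
    def run_prompt(self, request): ...


class EchoProvider(ProviderAdapter):
    """Hypothetical provider that echoes the prompt; useful only as a template."""

    name = "echo"
    default_port = 0  # no server is involved

    def list_models(self):
        return ["echo-1"]

    def health_check(self):
        return True

    def run_prompt(self, request):
        start = time.perf_counter()
        response = f"echo: {request.prompt}"
        elapsed_ms = (time.perf_counter() - start) * 1000
        return RunResult(response, request.model, self.name, elapsed_ms)


result = EchoProvider().run_prompt(RunRequest(prompt="hello", model="echo-1"))
print(result.provider, result.response)  # prints "echo echo: hello"
```

Because the stand-in base class marks its methods as abstract, forgetting to implement one of them raises `TypeError` at instantiation time rather than failing later during a benchmark run.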
See also
- Provider Contract: formal interface specification
- Architecture: how providers fit into the system