Providers

Model Lens supports 6 local LLM providers through a unified ProviderAdapter interface. All providers communicate via OpenAI-compatible /v1/chat/completions endpoints.

| Provider | Status | Default URL | Min version |
|---|---|---|---|
| LM Studio | ✅ Stable | http://localhost:1234/v1 | 0.3.x+ |
| Ollama | ✅ Stable | http://localhost:11434/v1 | 0.1.28+ |
| Open WebUI | ✅ Stable | http://localhost:3000/api/v1 | 0.5.x+ |
| Jan | ✅ Stable | http://localhost:1337/v1 | 0.5.x+ |
| llama.cpp | ✅ Stable | http://localhost:8080/v1 | b4200+ |
| vLLM | ✅ Stable | http://localhost:8000/v1 | 0.6.x+ |
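Because every provider exposes an OpenAI-compatible API, a chat request can in principle be built against any base URL in the table using only the standard library. The sketch below is illustrative and not part of Model Lens: `DEFAULT_BASE_URLS` and `build_chat_request` are hypothetical names, the `/chat/completions` path is assumed to sit directly under each listed base URL, and the request is only constructed, not sent.

```python
import json
import urllib.request

# Default base URLs copied from the table above (keys are the CLI provider names).
DEFAULT_BASE_URLS = {
    "lm-studio": "http://localhost:1234/v1",
    "ollama": "http://localhost:11434/v1",
    "open-webui": "http://localhost:3000/api/v1",
    "jan": "http://localhost:1337/v1",
    "llama.cpp": "http://localhost:8080/v1",
    "vllm": "http://localhost:8000/v1",
}


def build_chat_request(provider: str, model: str, prompt: str,
                       api_key: str = "not-needed") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    # Assumption: the chat endpoint lives at <base_url>/chat/completions.
    url = DEFAULT_BASE_URLS[provider].rstrip("/") + "/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request("ollama", "llama3.2:latest", "Hello")
print(request.full_url)
# To actually send it: urllib.request.urlopen(request, timeout=120)
```

Keeping request construction separate from sending makes the URL and payload easy to inspect before a server is running.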

LM Studio

  1. Download LM Studio
  2. Load a model (e.g., Qwen 3.5, Gemma 4)
  3. Go to Developer tab → enable Local API Server
  4. Start the server (default: http://localhost:1234)

# Auto-detect LM Studio models
python apps/cli/modellens.py run --quick
# Explicit model
python apps/cli/modellens.py run --provider lm-studio --models qwen3.5-9b-coder

from providers.openai_compatible import OpenAICompatibleProvider

client = OpenAICompatibleProvider(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",
    model_name="qwen3.5-9b-coder",
    timeout=120,
    max_retries=3,
)
response, metrics = client.chat_completion([
    {"role": "user", "content": "Explain TypeScript generics."}
])
# metrics.ttft, metrics.tokens_per_second, etc.

curl http://localhost:1234/v1/models
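The example above mentions `metrics.ttft` and `metrics.tokens_per_second`. The following sketch shows one plausible way such figures could be derived from three timestamps. It is an assumption for illustration only: `TimingSummary` and `summarize_timings` are hypothetical names, and Model Lens may define throughput differently (for instance, dividing by total time rather than generation time).

```python
from dataclasses import dataclass


@dataclass
class TimingSummary:
    ttft: float               # seconds from request start to first token
    total_time: float         # seconds from request start to last token
    tokens_per_second: float  # completion tokens divided by generation time


def summarize_timings(start: float, first_token: float, end: float,
                      completion_tokens: int) -> TimingSummary:
    """Derive latency and throughput figures from three timestamps (seconds)."""
    ttft = first_token - start
    total_time = end - start
    generation_time = end - first_token
    # Guard against a zero-length generation window (e.g. a single-token reply).
    tps = completion_tokens / generation_time if generation_time > 0 else 0.0
    return TimingSummary(ttft, total_time, tps)


summary = summarize_timings(start=0.0, first_token=0.25, end=2.25,
                            completion_tokens=100)
print(summary.ttft, summary.total_time, summary.tokens_per_second)
# prints: 0.25 2.25 50.0
```

In a real measurement the timestamps would typically come from `time.perf_counter()` taken around a streaming response.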

Ollama

  1. Install Ollama
  2. Pull a model:
    ollama pull llama3.2
    # or
    ollama pull qwen2.5-coder:7b
  3. Ollama serves automatically at http://localhost:11434

# Auto-detect Ollama models
python apps/cli/modellens.py run --provider ollama --quick
# Specific Ollama model
python apps/cli/modellens.py run --provider ollama --models llama3.2:latest

Ollama model names include a tag, for example llama3.2:latest or qwen2.5-coder:7b. Include the tag when passing model names to the CLI: --models llama3.2:latest
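Ollama itself treats a name without a tag as `:latest`, so a small helper can normalize names before they reach the CLI. This is a minimal sketch under that assumption; `with_default_tag` is a hypothetical helper, not a Model Lens function.

```python
def with_default_tag(model: str) -> str:
    """Return the model name with ':latest' appended when no tag is present."""
    # Only the last path segment can carry a tag; an earlier colon may be a
    # registry port, as in 'localhost:5000/library/llama3.2'.
    last_segment = model.rsplit("/", 1)[-1]
    return model if ":" in last_segment else f"{model}:latest"


for name in ["llama3.2", "qwen2.5-coder:7b", "llama3.2:latest"]:
    print(with_default_tag(name))
# prints:
# llama3.2:latest
# qwen2.5-coder:7b
# llama3.2:latest
```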

from providers.ollama import OllamaClient
from providers.base import RunRequest

ollama = OllamaClient(
    base_url="http://localhost:11434",
    api_key="ollama",
    model_name="llama3.2:latest",
)

# Health check (tries /v1/models, falls back to /api/tags)
if ollama.health_check():
    print("Ollama is running!")

# Run a prompt
result = ollama.run_prompt(RunRequest(
    prompt="Explain Kubernetes in one sentence.",
    model="llama3.2:latest",
))
print(f"TTFT: {result.ttft_ms:.0f}ms, Tokens/sec: {result.tokens_per_second:.1f}")
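The health check comment above describes a fallback from `/v1/models` to `/api/tags`. The sketch below shows how such a fallback could be structured. It is not the `OllamaClient` implementation: `http_ok` and `health_check` are hypothetical names, and the probe is injectable so the logic can be exercised without a running server.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return False


def health_check(base_url: str,
                 paths: Iterable[str] = ("/v1/models", "/api/tags"),
                 probe: Callable[[str], bool] = http_ok) -> bool:
    """Try each path in order and report healthy on the first success."""
    root = base_url.rstrip("/")
    # any() stops at the first path that answers, so the fallback is only
    # contacted when the earlier endpoint fails.
    return any(probe(root + path) for path in paths)


# Simulate a server that only answers on the native Ollama endpoint.
def only_native(url: str) -> bool:
    return url.endswith("/api/tags")


print(health_check("http://localhost:11434", probe=only_native))
# prints: True
```

Against a live server the default probe applies: `health_check("http://localhost:11434")`.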

Open WebUI

  1. Install Open WebUI (e.g., via Docker or pip)
  2. Open WebUI serves an OpenAI-compatible API at http://localhost:3000/api/v1

python apps/cli/modellens.py run --provider open-webui --quick
python apps/cli/modellens.py run --provider open-webui --models my-model

from providers.openwebui import OpenWebUIClient

client = OpenWebUIClient(
    base_url="http://localhost:3000",
    api_key="open-webui",
    model_name="my-model",
)

Jan

  1. Download Jan
  2. Load a model
  3. Jan serves an OpenAI-compatible API at http://localhost:1337/v1

python apps/cli/modellens.py run --provider jan --quick
python apps/cli/modellens.py run --provider jan --models my-model

from providers.jan import JanClient

client = JanClient(
    base_url="http://localhost:1337",
    api_key="jan",
    model_name="my-model",
)

llama.cpp

  1. Build or download llama.cpp
  2. Run the server (the binary is named llama-server in builds at or above the b4200 minimum; older builds called it server):
    ./llama-server -m models/my-model.gguf --host 0.0.0.0 --port 8080

python apps/cli/modellens.py run --provider llama.cpp --quick
python apps/cli/modellens.py run --provider llama.cpp --models llama-3.2-7b

vLLM

  1. Install vLLM:
    pip install vllm
  2. Start the server:
    python -m vllm.entrypoints.openai.api_server --model path/to/model --port 8000

python apps/cli/modellens.py run --provider vllm --quick
python apps/cli/modellens.py run --provider vllm --models my-model
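The CLI invocations on this page all follow the same shape, so they can be generated programmatically when scripting runs across several providers. This is a sketch only: `build_command` is a hypothetical helper, and it uses just the `run`, `--provider`, `--quick`, and `--models` arguments shown above. It prints the commands rather than executing them.

```python
import sys
from typing import List, Optional

CLI_PATH = "apps/cli/modellens.py"  # script path used throughout this page
PROVIDERS = ["lm-studio", "ollama", "open-webui", "jan", "llama.cpp", "vllm"]


def build_command(provider: str, model: Optional[str] = None) -> List[str]:
    """Build the CLI invocation; uses --quick when no model is given."""
    command = [sys.executable, CLI_PATH, "run", "--provider", provider]
    if model:
        command += ["--models", model]
    else:
        command.append("--quick")
    return command


for provider in PROVIDERS:
    # Show everything after the interpreter path for readability.
    print(" ".join(build_command(provider)[1:]))
```

Each list can be passed to `subprocess.run(...)` from the repository root to launch the run.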

All providers implement this interface (see packages/providers/base.py):

class ProviderAdapter(ABC):
    name: str
    default_port: int

    def list_models(self) -> List[Model]: ...
    def health_check(self) -> bool: ...
    def run_prompt(self, request: RunRequest) -> RunResult: ...
    def chat_completion(self, messages, temperature, max_tokens, top_p, stream) -> tuple[str, object]: ...
    def collect_metrics(self) -> ProviderMetrics: ...

| Type | Fields |
|---|---|
| Model | id, name, provider, parameters, quantization, size_bytes |
| RunRequest | prompt, model, temperature, max_tokens, top_p, system_prompt, stream |
| RunResult | response, model, provider, ttft_ms, total_time_ms, tokens_per_second, tokens |
| APICallMetrics | ttft, total_time, tokens_per_second, total_tokens, prompt_tokens, completion_tokens |
| ProviderMetrics | cpu_percent, ram_used_mb, ram_total_mb, gpu_available, gpu_used_mb, swap_used_mb |
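To make the shape of an adapter concrete, here is a self-contained toy provider that needs no server. Everything in it is a stand-in: `MiniAdapter` is a reduced, hypothetical version of the interface, the `RunRequest` and `RunResult` dataclasses only mirror a subset of the fields listed above, and whitespace-separated words are used in place of real tokens.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class RunRequest:  # stand-in: subset of the fields listed above
    prompt: str
    model: str
    max_tokens: int = 256


@dataclass
class RunResult:  # stand-in: same field names as the table above
    response: str
    model: str
    provider: str
    ttft_ms: float
    total_time_ms: float
    tokens_per_second: float
    tokens: int


class MiniAdapter(ABC):
    """Reduced, hypothetical version of the adapter interface."""
    name: str

    @abstractmethod
    def health_check(self) -> bool: ...

    @abstractmethod
    def run_prompt(self, request: RunRequest) -> RunResult: ...


class EchoProvider(MiniAdapter):
    """Toy provider that echoes the prompt back, truncated to max_tokens words."""
    name = "echo"

    def health_check(self) -> bool:
        return True  # nothing external to check

    def run_prompt(self, request: RunRequest) -> RunResult:
        start = time.perf_counter()
        words = request.prompt.split()[: request.max_tokens]
        elapsed_ms = (time.perf_counter() - start) * 1000
        # Words stand in for tokens; avoid dividing by a zero elapsed time.
        tps = len(words) / (elapsed_ms / 1000) if elapsed_ms > 0 else 0.0
        return RunResult(
            response=" ".join(words),
            model=request.model,
            provider=self.name,
            ttft_ms=elapsed_ms,
            total_time_ms=elapsed_ms,
            tokens_per_second=tps,
            tokens=len(words),
        )


provider = EchoProvider()
result = provider.run_prompt(RunRequest(prompt="hello local models", model="echo-1"))
print(result.response, result.tokens)
# prints: hello local models 3
```

A stub like this can be handy for exercising reporting or orchestration code before any real backend is installed.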

See the Provider Contract for the full implementation checklist.