Skip to content

Provider Contract

Canonical contract for provider adapters. Every provider in Model Lens implements this interface.

Defined in packages/providers/base.py.

AttributeTypeRequiredDescription
namestrCanonical provider name (e.g. "ollama", "vllm")
default_portintDefault port for the provider’s API

Check if the provider is reachable and healthy.

  • Must not raise exceptions — all failures return False
  • Should time out quickly (≤5 seconds)
  • Catch requests.ConnectionError and requests.Timeout specifically

Return all models available on this provider.

  • Returns empty list on failure (never raises)
  • Model metadata: id, name, provider, parameters, quantization, size_bytes

run_prompt(request: RunRequest) -> RunResult

Section titled “run_prompt(request: RunRequest) -> RunResult”

Run a single prompt and return structured results.

chat_completion(messages, temperature, max_tokens, top_p, stream) -> Tuple[str, APICallMetrics]

Section titled “chat_completion(messages, temperature, max_tokens, top_p, stream) -> Tuple[str, APICallMetrics]”

OpenAI-compatible chat completion. The core execution method.

Collect hardware/performance metrics from the provider process.

  • Has a default implementation in the ABC (psutil-based)
  • Override for provider-specific metrics (GPU, etc.)

FieldTypeDescription
idstrUnique model identifier
namestrHuman-readable name
providerstrProvider serving this model
parametersstrParameter count / tag (e.g. "7B", "latest")
quantizationstrQuantization level (e.g. "Q4_K_M")
size_bytesintModel file size in bytes
FieldTypeDefaultDescription
promptstr(required)The user prompt
modelstr(required)Model to use
temperaturefloat0.0Sampling temperature
max_tokensint4096Max completion tokens
top_pfloat1.0Nucleus sampling
system_promptstrNoneSystem prompt
streamboolFalseEnable streaming
FieldTypeDescription
responsestrFull response text
modelstrModel used
providerstrProvider used
ttft_msfloatTime to first token (ms)
total_time_msfloatTotal execution time (ms)
tokens_per_secondfloatGeneration throughput
prompt_tokensintPrompt token count
completion_tokensintCompletion token count
total_tokensintTotal tokens used
FieldTypeDescription
ttftfloatTime to first token (seconds)
total_timefloatTotal generation time (seconds)
tokens_per_secondfloatThroughput
total_tokensintTotal tokens
prompt_tokensintPrompt tokens
completion_tokensintCompletion tokens
FieldTypeDescription
cpu_percentfloatCPU usage percentage
ram_used_mbfloatRAM used (MB)
ram_total_mbfloatTotal system RAM (MB)
gpu_availableboolWhether GPU is detected
gpu_used_mbfloatGPU memory used (MB)
swap_used_mbfloatSwap used (MB)

To add a new provider:

  1. Create packages/providers/<your_provider>.py
  2. Implement ProviderAdapter:
    from .base import ProviderAdapter, Model, RunRequest, RunResult, APICallMetrics
    class YourProvider(ProviderAdapter):
    name = "your-provider"
    default_port = 8080
    def health_check(self) -> bool: ...
    def list_models(self) -> List[Model]: ...
    def run_prompt(self, request: RunRequest) -> RunResult: ...
    def chat_completion(self, messages, ...) -> tuple[str, APICallMetrics]: ...
  3. Register in packages/providers/__init__.py
  4. Add provider entry in apps/cli/commands/utils.py (PROVIDER_CONFIG)
  5. Add --provider choice in apps/cli/commands/run.py
  6. Add auto-detection probe in _resolve_provider()
  7. Add tests in tests/test_provider_clients.py

All providers MUST use these helpers from packages/providers/base.py:

FunctionPurposeExample
normalize_base_url(url)Strip trailing / for storage"http://host:8000/v1/""http://host:8000/v1"
get_root_url(url)Extract scheme://netloc"http://host:8000/v1""http://host:8000"
url_join(base, path)Safe URL path joiningurl_join("http://host:8000/", "health")"http://host:8000/health"

Banned patterns: .rstrip("/"), .removesuffix("/v1"), f"{base}/{path}" string concatenation.


Providers that extend OpenAICompatibleProvider automatically emit:

  • TokenGeneratedEvent per streaming token
  • CompletionEvent after success/failure
  • ErrorEvent on failure

Providers must accept event_bus and event_source in their constructor (via OpenAICompatibleProvider.__init__).