Provider adapter matrix.
The table below is the reference matrix for the OpenCode custom API. Streaming and native tool calls depend on the upstream server; the adapter layer normalizes the wire format but cannot invent a capability the model does not support. Where a capability is absent, OpenCode transparently falls back — streaming off means the CLI waits for the full response, native tool calls off means the JSON fallback activates.
Zero-click summary. Seven adapter targets cover the common landscape. Any OpenAI-compatible server is supported without a new adapter. Anthropic ships as a native adapter.
| Provider | Adapter | Streaming | Tool calls | Notes |
|---|---|---|---|---|
| OpenAI-compatible hosted | openai | Yes | Native | Default reference provider |
| Anthropic | anthropic | Yes | Native | Messages API adapter |
| Azure OpenAI | azure_openai | Yes | Native | Deployment name required |
| vLLM | openai | Yes | Native or fallback | OpenAI-compatible server |
| TGI | openai | Yes | Fallback | OpenAI-compatible router |
| LM Studio | openai | Yes | Fallback | Local desktop server |
| In-house gateway | openai | Depends | Depends | Any chat-completions proxy |
A minimal custom API configuration.
The smallest working custom API config is a single provider block plus a matching agent block. Below is an example wiring OpenCode to a self-hosted vLLM server through the OpenAI-compatible adapter. The bearer token is read from an environment variable, so the config can be committed to version control without leaking secrets.
```toml
# ~/.config/opencode/config.toml
[provider.vllm]
kind = "openai"
base_url = "https://vllm.internal.example.net/v1"
api_key_env = "OPENCODE_VLLM_TOKEN"
default_model = "qwen2-coder-32b"
context_window = 32768

[agent]
provider = "vllm"
stream = true
tool_call_format = "auto"
max_retries = 4
retry_backoff_ms = 500
```
The kind key picks the adapter implementation. Use openai for anything that speaks the chat-completions schema: OpenAI itself, vLLM, TGI, LM Studio, and any in-house gateway that proxies those semantics. Use anthropic for the Anthropic messages API. Use azure_openai for Azure, where a deployment name needs to be mapped into the URL path.
The api_key_env key points to an environment variable, which OpenCode reads at startup. The CLI will not store credentials in the config file itself, and it will not echo a token in the transcript. For automated deployments, teams usually inject the token from a secret store via the same mechanism they already use for CI credentials.
Zero-click summary. Adapter, base URL, token, model. Four keys wire up any OpenAI-compatible server. Tool-call format and retries tune the agent loop.
Tool-call schema negotiation.
OpenCode asks the model server which tool-call format it supports before the first prompt. When the server exposes native function calls, the adapter uses them. When the server does not, OpenCode asks the model to emit a structured JSON block in its response; the adapter parses the block and routes the call through the same executor the native format uses. The result is that the agent loop does not care which mode is active — tool calls succeed, the transcript records them, and a rollback is a single command.
The JSON fallback schema is deliberately simple: opening marker, JSON object with a tool key and an args map, closing marker. Any model that can follow a short system prompt can produce it, and the same fallback covers local Ollama models without native function calling.
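To make the fallback concrete, here is a minimal sketch of a fallback block and a parser for it. The `<<TOOL>>`/`<</TOOL>>` marker strings and the exact field names are hypothetical illustrations, not OpenCode's actual wire format; only the shape (marker, object with a tool key and an args map, marker) comes from the description above.

```python
import json
import re

# Hypothetical marker strings; OpenCode's real fallback markers may differ.
FALLBACK_RE = re.compile(r"<<TOOL>>\s*(\{.*?\})\s*<</TOOL>>", re.DOTALL)

def parse_fallback(text):
    """Extract (tool, args) pairs from a model response that uses the
    JSON fallback shape: marker, object with "tool" and "args", marker."""
    calls = []
    for match in FALLBACK_RE.finditer(text):
        obj = json.loads(match.group(1))
        calls.append((obj["tool"], obj.get("args", {})))
    return calls

response = 'Let me check.\n<<TOOL>>{"tool": "read_file", "args": {"path": "main.go"}}<</TOOL>>'
print(parse_fallback(response))  # [('read_file', {'path': 'main.go'})]
```

Whatever the real markers are, the point stands: the parsed call is handed to the same executor the native format uses, so the agent loop is indifferent to the mode.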
Routing through an in-house LLM gateway.
Most enterprise teams deploy OpenCode behind a gateway. A gateway centralizes authentication, rate limiting, and observability; it also lets a platform team swap upstream providers without touching every developer workstation. OpenCode fits this pattern with one config change: set the base URL to the gateway, pick the OpenAI-compatible adapter, and pass a token that the gateway recognizes. The agent does not know the upstream provider — that is the point.
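The one config change might look like the following; the hostname, token variable, and model name are placeholders for whatever the gateway exposes.

```toml
# Hypothetical gateway endpoint; the gateway decides the upstream provider.
[provider.gateway]
kind = "openai"
base_url = "https://llm-gateway.internal.example.net/v1"
api_key_env = "OPENCODE_GATEWAY_TOKEN"
default_model = "team-default"

[agent]
provider = "gateway"
```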
A gateway also gives a team a single place to enforce prompt redaction, content filters, or audit logging. Because every OpenCode tool call flows through the same HTTP path, a gateway can see the full agent transcript without instrumenting the developer laptop. Security teams often pair a gateway deployment with the controls recommended by NIST's SSDF 1.1 (SP 800-218) for supply-chain security on developer tooling.
Authentication and bearer tokens.
The custom API adapter supports three authentication modes. The first is a bearer token read from an environment variable, which is the default. The second is a token loaded from a file path, which is useful when a secret manager writes short-lived tokens to disk. The third is a token fetched from an external command, which lets teams integrate with a corporate credential helper or SSO-aware CLI; the sign-in guide describes the broader credential flow.
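Side by side, the three modes might be configured as below. Only api_key_env appears in the earlier example; the api_key_file and api_key_cmd key names are illustrative guesses at the other two modes, so check the OpenCode reference for the exact keys.

```toml
# Mode 1: bearer token from an environment variable (the default, shown earlier).
api_key_env = "OPENCODE_VLLM_TOKEN"

# Mode 2: token read from a file a secret manager keeps fresh.
# (illustrative key name)
api_key_file = "/run/secrets/opencode-token"

# Mode 3: token fetched from an external credential helper or SSO-aware CLI.
# (illustrative key name and command)
api_key_cmd = "corp-sso print-token --audience llm-gateway"
```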
The adapter layer never logs the token and redacts any authorization header from the transcript. When OpenCode retries a failed request, the retry carries the same token; when the token is rotated, OpenCode picks up the new value on the next request. There is no long-lived session that survives a token rotation, which keeps the security posture simple.
Retries, backoff, and rate-limit posture.
Custom API providers vary in their rate-limit posture. Some return HTTP 429 with a Retry-After header; some return an in-band error; some silently slow down. OpenCode's custom API adapter implements an exponential backoff with jitter and respects the standard rate-limit headers when the upstream emits them. The defaults are conservative — four retries with 500 ms initial backoff doubling each attempt — and every retry is visible in the transcript so a slow session never looks like a stall.
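The default schedule (four retries, 500 ms initial backoff, doubling each attempt, with jitter) can be sketched as follows. The function name is illustrative, not an OpenCode internal, and the full-jitter strategy (sleep drawn uniformly from zero up to the doubling cap) is one common choice; the actual jitter shape OpenCode uses is not specified above.

```python
import random

def backoff_schedule(max_retries=4, initial_ms=500, seed=None):
    """Exponential backoff with full jitter: the cap doubles each attempt,
    and the actual delay is drawn uniformly from [0, cap]."""
    rng = random.Random(seed)
    delays = []
    cap_ms = initial_ms
    for _ in range(max_retries):
        delays.append(rng.uniform(0, cap_ms))
        cap_ms *= 2  # caps: 500 -> 1000 -> 2000 -> 4000 ms
    return delays

print([round(d, 1) for d in backoff_schedule(seed=42)])
```

A real retry loop would additionally honor a Retry-After header when the upstream sends one, overriding the computed delay, which matches the adapter's stated behavior of respecting standard rate-limit headers.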
For long-running tasks over a rate-limited provider, teams often bump max_retries and retry_backoff_ms. For tightly latency-bound interactive work, dropping max_retries to 1 or 2 keeps the feedback loop snappy and surfaces provider flakes quickly. The documentation lists recommended values per provider class; the defaults follow the well-established backoff-with-jitter guidance from distributed-systems practice.
Zero-click summary. Exponential backoff with jitter. Respects rate-limit headers. Every retry appears in the transcript.
Anthropic adapter specifics.
The native Anthropic adapter maps OpenCode tool calls onto the Claude messages API without going through an OpenAI-compatible proxy. That means tool-call fidelity is higher — the Claude tool schema is expressive enough that OpenCode does not need the JSON fallback — and the adapter passes through extra Claude features like long-context caching hints when the upstream supports them. To switch from an OpenAI-compatible provider to Anthropic, change the kind key from openai to anthropic and update the base URL if you are proxying through an internal gateway.
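The switch described above is small; a sketch, with placeholder hostname, token variable, and model name (and base_url only needed when proxying through an internal gateway):

```toml
[provider.claude]
kind = "anthropic"
# Only set base_url if routing through an internal gateway.
base_url = "https://anthropic-proxy.internal.example.net"
api_key_env = "OPENCODE_ANTHROPIC_TOKEN"
default_model = "claude-sonnet"  # placeholder model name
```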
Azure OpenAI specifics.
Azure OpenAI deployments map a logical model name to a deployment name in the URL path. The azure_openai adapter handles that mapping: you set the base URL to your Azure resource, the adapter inserts the deployment name into the path, and authentication uses the Azure API key or a bearer token from a managed identity. Region failover and availability zones are handled by Azure's own mechanisms; OpenCode sees a single base URL and does not try to second-guess the cloud's routing.
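A configuration sketch for the Azure case follows. The resource hostname and key names are placeholders, and the deployment key in particular is an illustrative guess at how the adapter receives the deployment name it inserts into the URL path; consult the OpenCode reference for the exact key.

```toml
[provider.azure]
kind = "azure_openai"
base_url = "https://my-resource.openai.azure.com"  # your Azure OpenAI resource
api_key_env = "OPENCODE_AZURE_KEY"
deployment = "gpt-4o-prod"  # illustrative key name; mapped into the URL path
```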
Self-hosted vLLM, TGI, and LM Studio.
vLLM and TGI are the two most common self-hosted inference servers OpenCode teams deploy behind a gateway. Both speak the OpenAI-compatible chat completions schema, so the openai adapter works unmodified. vLLM supports native function calls for models that expose them; TGI historically relies on the JSON fallback, which works reliably with OpenCode's tool-call executor. LM Studio is a desktop application that runs a local OpenAI-compatible server on 127.0.0.1 — for laptop experimentation with larger models than Ollama can serve, pointing OpenCode at LM Studio takes one config change.
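That one config change for LM Studio might look like this; 1234 is LM Studio's usual default port, and the model name is whatever is loaded in the desktop app (many local servers accept any token).

```toml
[provider.lmstudio]
kind = "openai"
base_url = "http://127.0.0.1:1234/v1"  # LM Studio's local server, default port
api_key_env = "OPENCODE_LMSTUDIO_TOKEN"
default_model = "qwen2-coder-32b"  # placeholder: use the model loaded in LM Studio
```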
We route every OpenCode request through our gateway. One token, one audit trail, one place to swap providers. The custom API adapter made that trivial — we pointed OpenCode at an internal base URL and forgot about it.
Our vLLM cluster and our hosted fallback share one OpenCode config block. Swapping between them for a particular task is a config reload.
The Anthropic adapter preserving tool-call fidelity without a fallback mattered. Long-running plans stayed legible across hundreds of tool calls.