OpenCode custom API adapter

OpenCode custom API: bring your own model server to the agent.

The OpenCode custom API adapter connects the agent to any OpenAI-compatible endpoint, the Anthropic messages API, Azure OpenAI, vLLM, TGI, LM Studio, or an in-house gateway. One config block tells OpenCode where to send prompts, how to authenticate, and which tool-call schema to negotiate — the rest of the agent loop does not change.

Provider adapter matrix.

The table below is the reference matrix for the OpenCode custom API. Streaming and native tool calls depend on the upstream server; the adapter layer normalizes the wire format but cannot invent a capability the model does not support. Where a capability is absent, OpenCode transparently falls back — streaming off means the CLI waits for the full response, native tool calls off means the JSON fallback activates.

Zero-click summary. Seven adapter targets cover the common landscape. Any OpenAI-compatible server is supported without a new adapter. Anthropic ships as a native adapter.

| Provider | Adapter | Streaming | Tool calls | Notes |
| --- | --- | --- | --- | --- |
| OpenAI-compatible hosted | openai | Yes | Native | Default reference provider |
| Anthropic | anthropic | Yes | Native | Messages API adapter |
| Azure OpenAI | azure_openai | Yes | Native | Deployment name required |
| vLLM | openai | Yes | Native or fallback | OpenAI-compatible server |
| TGI | openai | Yes | Fallback | OpenAI-compatible router |
| LM Studio | openai | Yes | Fallback | Local desktop server |
| In-house gateway | openai | Depends | Depends | Any chat-completions proxy |

A minimal custom API configuration.

The smallest working custom API config is a single provider block plus a matching agent block. Below is an example wiring OpenCode to a self-hosted vLLM server through the OpenAI-compatible adapter. The bearer token is read from an environment variable, so the config can be committed to version control without leaking secrets.

```toml
# ~/.config/opencode/config.toml
[provider.vllm]
kind = "openai"
base_url = "https://vllm.internal.example.net/v1"
api_key_env = "OPENCODE_VLLM_TOKEN"
default_model = "qwen2-coder-32b"
context_window = 32768

[agent]
provider = "vllm"
stream = true
tool_call_format = "auto"
max_retries = 4
retry_backoff_ms = 500
```

The kind key picks the adapter implementation. Use openai for anything that speaks the chat-completions schema; that covers OpenAI itself, vLLM, TGI, LM Studio, and any in-house gateway that proxies those semantics. Use anthropic for the Anthropic messages API. Use azure_openai for Azure OpenAI, which needs a deployment name mapped into the URL path.

The api_key_env key points to an environment variable, which OpenCode reads at startup. The CLI will not store credentials in the config file itself, and it will not echo a token in the transcript. For automated deployments, teams usually inject the token from a secret store via the same mechanism they already use for CI credentials.

Zero-click summary. Adapter, base URL, token, model. Four keys wire up any OpenAI-compatible server. Tool-call format and retries tune the agent loop.

Tool-call schema negotiation.

OpenCode asks the model server which tool-call format it supports before the first prompt. When the server exposes native function calls, the adapter uses them. When the server does not, OpenCode asks the model to emit a structured JSON block in its response; the adapter parses the block and routes the call through the same executor the native format uses. The result is that the agent loop does not care which mode is active — tool calls succeed, the transcript records them, and a rollback is a single command.

The JSON fallback schema is deliberately simple: an opening marker, a JSON object with a tool key and an args map, and a closing marker. Any model that can follow a short system prompt can produce it. The schema is specified in the OpenCode reference, and the same fallback covers local Ollama models without native function calling.
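As an illustration, a minimal parser for this fallback shape might look like the Python sketch below. The marker strings, the key names, and the function itself are stand-ins for this page, not the adapter's actual constants or API.

```python
import json
import re

# Hypothetical markers -- the real marker strings are defined in the
# OpenCode reference; these are placeholders for illustration.
OPEN_MARKER = "<<tool_call>>"
CLOSE_MARKER = "<</tool_call>>"


def parse_fallback_tool_call(text):
    """Extract a {tool, args} block emitted between the fallback markers.

    Returns (tool_name, args_dict), or None when no block is present.
    """
    pattern = re.escape(OPEN_MARKER) + r"(.*?)" + re.escape(CLOSE_MARKER)
    match = re.search(pattern, text, flags=re.DOTALL)
    if match is None:
        return None
    payload = json.loads(match.group(1))
    return payload["tool"], dict(payload["args"])


# A model reply carrying one fallback block alongside ordinary prose.
reply = (
    "I will read the file first.\n"
    '<<tool_call>>{"tool": "read_file", "args": {"path": "src/main.rs"}}<</tool_call>>'
)
```

Because the parsed call is handed to the same executor as a native function call, code like this is all the fallback path needs on the receiving side.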

Routing through an in-house LLM gateway.

Most enterprise teams deploy OpenCode behind a gateway. A gateway centralizes authentication, rate limiting, and observability; it also lets a platform team swap upstream providers without touching every developer workstation. OpenCode fits this pattern with one config change: set the base URL to the gateway, pick the OpenAI-compatible adapter, and pass a token that the gateway recognizes. The agent does not know the upstream provider — that is the point.
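Under the same assumptions as the vLLM example earlier, a gateway-routed provider block differs only in the base URL and the token scope; the hostname below is a placeholder.

```toml
[provider.gateway]
kind = "openai"
base_url = "https://llm-gateway.internal.example.net/v1"
api_key_env = "OPENCODE_GATEWAY_TOKEN"
# The gateway decides which upstream model serves this name.
default_model = "default"

[agent]
provider = "gateway"
```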

A gateway also gives a team a single place to enforce prompt redaction, content filters, or audit logging. Because every OpenCode tool call flows through the same HTTP path, a gateway can see the full agent transcript without instrumenting the developer laptop. Security teams often pair a gateway deployment with the controls recommended by the NIST SSDF 1.1 for supply-chain security on developer tooling.

Authentication and bearer tokens.

The custom API adapter supports three authentication modes. The first is a bearer token read from an environment variable, which is the default. The second is a token loaded from a file path, which is useful when a secret manager writes short-lived tokens to disk. The third is a token fetched from an external command, which lets teams integrate with a corporate credential helper or SSO-aware CLI — useful when the sign-in guide describes the broader credential flow.
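A sketch of the three modes as config keys. The environment-variable key matches the example earlier on this page; the file and command key names are illustrative guesses, so check the OpenCode reference for the exact spellings.

```toml
[provider.hosted]
kind = "openai"
base_url = "https://llm.example.net/v1"

# Mode 1: bearer token from an environment variable (the default).
api_key_env = "OPENCODE_HOSTED_TOKEN"

# Mode 2: token read from a file a secret manager refreshes.
# Key name is illustrative.
# api_key_file = "/run/secrets/opencode-token"

# Mode 3: token fetched from an external credential helper.
# Key name is illustrative.
# api_key_cmd = "corp-sso-cli token --scope llm"
```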

The adapter layer never logs the token and redacts any authorization header from the transcript. When OpenCode retries a failed request, the retry carries the same token; when the token is rotated, OpenCode picks up the new value on the next request. There is no long-lived session that survives a token rotation, which keeps the security posture simple.

Retries, backoff, and rate-limit posture.

Custom API providers vary in their rate-limit posture. Some return HTTP 429 with a Retry-After header; some return an in-band error; some silently slow down. OpenCode's custom API adapter implements an exponential backoff with jitter and respects the standard rate-limit headers when the upstream emits them. The defaults are conservative — four retries with 500 ms initial backoff doubling each attempt — and every retry is visible in the transcript so a slow session never looks like a stall.
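The schedule those defaults produce can be sketched in a few lines of Python. The full-jitter strategy shown here is an assumption about the adapter's internals, not a confirmed detail; the doubling ceilings match the documented defaults.

```python
import random


def retry_delays(max_retries=4, initial_backoff_ms=500, rng=None):
    """Delays (in ms) for an exponential backoff with full jitter.

    The ceiling doubles each attempt -- 500, 1000, 2000, 4000 ms with
    the defaults -- and each delay is drawn uniformly below its ceiling.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        ceiling = initial_backoff_ms * (2 ** attempt)
        delays.append(rng.uniform(0, ceiling))  # full jitter
    return delays
```

Jitter matters here because a fleet of agents retrying a rate-limited gateway on identical schedules would otherwise synchronize into repeated thundering herds.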

For long-running tasks over a rate-limited provider, teams often bump max_retries and retry_backoff_ms. For tightly latency-bound interactive work, dropping max_retries to 1 or 2 keeps the feedback loop snappy and surfaces provider flakes quickly. The documentation has recommended values per provider class. For broader resilience patterns we reference research from Stanford CS on distributed retry design when tuning defaults.

Zero-click summary. Exponential backoff with jitter. Respects rate-limit headers. Every retry appears in the transcript.

Anthropic adapter specifics.

The native Anthropic adapter maps OpenCode tool calls onto the Claude messages API without going through an OpenAI-compatible proxy. That means tool-call fidelity is higher — the Claude tool schema is expressive enough that OpenCode does not need the JSON fallback — and the adapter passes through extra Claude features like long-context caching hints when the upstream supports them. To switch from an OpenAI-compatible provider to Anthropic, change the kind key from openai to anthropic and update the base URL if you are proxying through an internal gateway.
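That switch might look like the following, reusing the key names from the earlier config; the base URL and model name are placeholders for this sketch.

```toml
[provider.claude]
kind = "anthropic"
base_url = "https://api.anthropic.com"   # or your internal gateway
api_key_env = "OPENCODE_ANTHROPIC_TOKEN"
default_model = "claude-sonnet-4"        # model name illustrative
```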

Azure OpenAI specifics.

Azure OpenAI deployments map a logical model name to a deployment name in the URL path. The azure_openai adapter handles that mapping: you set the base URL to your Azure resource, the adapter inserts the deployment name into the path, and authentication uses the Azure API key or a bearer token from a managed identity. Region failover and availability zones are handled by Azure's own mechanisms; OpenCode sees a single base URL and does not try to second-guess the cloud's routing.
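A sketch of an Azure provider block under those assumptions. The deployment key name below is illustrative, not the adapter's confirmed spelling; the resource hostname is a placeholder.

```toml
[provider.azure]
kind = "azure_openai"
base_url = "https://my-resource.openai.azure.com"
api_key_env = "OPENCODE_AZURE_TOKEN"
# Key name is illustrative; the adapter splices the deployment
# name into the request path.
deployment = "gpt-4o-prod"
```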

Self-hosted vLLM, TGI, and LM Studio.

vLLM and TGI are the two most common self-hosted inference servers OpenCode teams deploy behind a gateway. Both speak the OpenAI-compatible chat completions schema, so the openai adapter works unmodified. vLLM supports native function calls for models that expose them; TGI historically relies on the JSON fallback, which works reliably with OpenCode's tool-call executor. LM Studio is a desktop application that runs a local OpenAI-compatible server on 127.0.0.1 — for laptop experimentation with larger models than Ollama can serve, pointing OpenCode at LM Studio takes one config change.
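That one config change might look like this. LM Studio's local server listens on port 1234 by default, and the token is a placeholder since a loopback server typically does not check credentials.

```toml
[provider.lmstudio]
kind = "openai"
base_url = "http://127.0.0.1:1234/v1"     # LM Studio's default local port
api_key_env = "OPENCODE_LMSTUDIO_TOKEN"   # placeholder; local server ignores it
default_model = "qwen2-coder-32b"         # whichever model is loaded in LM Studio
```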

Our vLLM cluster and our hosted fallback share one OpenCode config block. Swapping between them for a particular task is a config reload.

— Petrina U. Moldovanu, Staff Engineer, Nivaria Labs

The Anthropic adapter preserving tool-call fidelity without a fallback mattered. Long-running plans stayed legible across hundreds of tool calls.

— Thandiwe R. Mabaso, Senior SWE, Jacaranda Signals


Frequently asked

OpenCode custom API questions developers ask.

Five answers covering adapter choice, authentication, tool calls, gateway routing, and retries. Follow inline links for deeper detail.

What model servers does the OpenCode custom API support?
Any OpenAI-compatible endpoint plus native adapters for Anthropic and Azure OpenAI. In practice that covers OpenAI itself, vLLM, TGI, LM Studio, in-house gateways, and most provider proxies. The adapter matrix on this page is the canonical list, and the integrations registry links to each configuration block.
How does OpenCode handle authentication?
The custom API adapter reads a bearer token from an environment variable, a file path, or an external command. The token is attached as an Authorization header on every request and is redacted from transcripts and logs. The sign-in guide covers integrating with corporate credential helpers.
Can OpenCode use tool calls with any model?
Yes. OpenCode negotiates tool-call capabilities at startup. Native function calls are used when the model supports them; the JSON fallback activates when not. Both paths route through the same tool executor, so the agent loop behaves the same way. The Ollama guide has an example of the fallback in action.
How do I route OpenCode through an in-house LLM gateway?
Set base_url to your gateway, pick the openai adapter, and pass a gateway-scoped token. OpenCode does not need to know the upstream provider. The gateway gives your platform team one place to enforce auth, rate limits, prompt redaction, and audit logging. The trust and safety page documents the audit-log posture.
Does OpenCode retry failed custom API requests?
Yes. The adapter uses exponential backoff with jitter and respects standard rate-limit headers. Defaults are four retries with 500 ms initial backoff. Every retry is visible in the transcript, so a slow session is obvious rather than silent. Tune max_retries and retry_backoff_ms to match your provider's posture.