OpenCode custom API adapter

OpenCode custom API: bring your own model server to the agent.

The OpenCode custom API adapter connects the agent to any OpenAI-compatible endpoint, the Anthropic messages API, Azure OpenAI, vLLM, TGI, LM Studio, or an in-house gateway. One config block tells OpenCode where to send prompts, how to authenticate, and which tool-call schema to negotiate — the rest of the agent loop does not change.

Provider adapter matrix.

The table below is the reference matrix for the OpenCode custom API. Streaming and native tool calls depend on the upstream server; the adapter layer normalizes the wire format but cannot invent a capability the model does not support. Where a capability is absent, OpenCode transparently falls back — streaming off means the CLI waits for the full response, native tool calls off means the JSON fallback activates.

Zero-click summary. Seven adapter targets cover the common landscape. Any OpenAI-compatible server is supported without a new adapter. Anthropic ships as a native adapter.

| Provider | Adapter | Streaming | Tool calls | Notes |
| --- | --- | --- | --- | --- |
| OpenAI-compatible hosted | openai | Yes | Native | Default reference provider |
| Anthropic | anthropic | Yes | Native | Messages API adapter |
| Azure OpenAI | azure_openai | Yes | Native | Deployment name required |
| vLLM | openai | Yes | Native or fallback | OpenAI-compatible server |
| TGI | openai | Yes | Fallback | OpenAI-compatible router |
| LM Studio | openai | Yes | Fallback | Local desktop server |
| In-house gateway | openai | Depends | Depends | Any chat-completions proxy |

A minimal custom API configuration.

The smallest working custom API config is a single provider block plus a matching agent block. Below is an example wiring OpenCode to a self-hosted vLLM server through the OpenAI-compatible adapter. The bearer token is read from an environment variable, so the config can be committed to version control without leaking secrets.

```toml
# ~/.config/opencode/config.toml
[provider.vllm]
kind = "openai"
base_url = "https://vllm.internal.example.net/v1"
api_key_env = "OPENCODE_VLLM_TOKEN"
default_model = "qwen2-coder-32b"
context_window = 32768

[agent]
provider = "vllm"
stream = true
tool_call_format = "auto"
max_retries = 4
retry_backoff_ms = 500
```

The kind key picks the adapter implementation. Use openai for anything that speaks the chat-completions schema; that covers OpenAI itself, vLLM, TGI, LM Studio, and any in-house gateway that proxies those semantics. Use anthropic for the Anthropic messages API. Use azure_openai for Azure OpenAI, which needs a deployment name mapped into the URL path.

The api_key_env key points to an environment variable, which OpenCode reads at startup. The CLI will not store credentials in the config file itself, and it will not echo a token in the transcript. For automated deployments, teams usually inject the token from a secret store via the same mechanism they already use for CI credentials.

Zero-click summary. Adapter, base URL, token, model. Four keys wire up any OpenAI-compatible server. Tool-call format and retries tune the agent loop.

Tool-call schema negotiation.

OpenCode asks the model server which tool-call format it supports before the first prompt. When the server exposes native function calls, the adapter uses them. When the server does not, OpenCode asks the model to emit a structured JSON block in its response; the adapter parses the block and routes the call through the same executor the native format uses. The result is that the agent loop does not care which mode is active — tool calls succeed, the transcript records them, and a rollback is a single command.

The JSON fallback schema is deliberately simple: an opening marker, a JSON object with a tool key and an args map, and a closing marker. Any model that can follow a short system prompt can produce it. The schema is specified in the OpenCode reference, and the same fallback covers local Ollama models without native function calling.
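As an illustration, a minimal parser for this fallback shape might look like the Python sketch below. The marker strings, the key names, and the function itself are stand-ins for this page, not the adapter's actual constants or API.

```python
import json
import re

# Hypothetical markers -- the real marker strings are defined in the
# OpenCode reference; these are placeholders for illustration.
OPEN_MARKER = "<<tool_call>>"
CLOSE_MARKER = "<</tool_call>>"


def parse_fallback_tool_call(text):
    """Extract a {tool, args} block emitted between the fallback markers.

    Returns (tool_name, args_dict), or None when no block is present.
    """
    pattern = re.escape(OPEN_MARKER) + r"(.*?)" + re.escape(CLOSE_MARKER)
    match = re.search(pattern, text, flags=re.DOTALL)
    if match is None:
        return None
    payload = json.loads(match.group(1))
    return payload["tool"], dict(payload["args"])


# A model reply carrying one fallback block alongside ordinary prose.
reply = (
    "I will read the file first.\n"
    '<<tool_call>>{"tool": "read_file", "args": {"path": "src/main.rs"}}<</tool_call>>'
)
```

Because the parsed call is handed to the same executor as a native function call, code like this is all the fallback path needs on the receiving side.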

Routing through an in-house LLM gateway.

Most enterprise teams deploy OpenCode behind a gateway. A gateway centralizes authentication, rate limiting, and observability; it also lets a platform team swap upstream providers without touching every developer workstation. OpenCode fits this pattern with one config change: set the base URL to the gateway, pick the OpenAI-compatible adapter, and pass a token that the gateway recognizes. The agent does not know the upstream provider — that is the point.
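Under the same assumptions as the vLLM example earlier, a gateway-routed provider block differs only in the base URL and the token scope; the hostname below is a placeholder.

```toml
[provider.gateway]
kind = "openai"
base_url = "https://llm-gateway.internal.example.net/v1"
api_key_env = "OPENCODE_GATEWAY_TOKEN"
# The gateway decides which upstream model serves this name.
default_model = "default"

[agent]
provider = "gateway"
```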

A gateway also gives a team a single place to enforce prompt redaction, content filters, or audit logging. Because every OpenCode tool call flows through the same HTTP path, a gateway can see the full agent transcript without instrumenting the developer laptop. Security teams often pair a gateway deployment with the controls recommended by the NIST SSDF 1.1 for supply-chain security on developer tooling.

Authentication and bearer tokens.

The custom API adapter supports three authentication modes. The first is a bearer token read from an environment variable, which is the default. The second is a token loaded from a file path, which is useful when a secret manager writes short-lived tokens to disk. The third is a token fetched from an external command, which lets teams integrate with a corporate credential helper or SSO-aware CLI — useful when the sign-in guide describes the broader credential flow.
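A sketch of the three modes as config keys. The environment-variable key matches the example earlier on this page; the file and command key names are illustrative guesses, so check the OpenCode reference for the exact spellings.

```toml
[provider.hosted]
kind = "openai"
base_url = "https://llm.example.net/v1"

# Mode 1: bearer token from an environment variable (the default).
api_key_env = "OPENCODE_HOSTED_TOKEN"

# Mode 2: token read from a file a secret manager refreshes.
# Key name is illustrative.
# api_key_file = "/run/secrets/opencode-token"

# Mode 3: token fetched from an external credential helper.
# Key name is illustrative.
# api_key_cmd = "corp-sso-cli token --scope llm"
```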

The adapter layer never logs the token and redacts any authorization header from the transcript. When OpenCode retries a failed request, the retry carries the same token; when the token is rotated, OpenCode picks up the new value on the next request. There is no long-lived session that survives a token rotation, which keeps the security posture simple.

Retries, backoff, and rate-limit posture.

Custom API providers vary in their rate-limit posture. Some return HTTP 429 with a Retry-After header; some return an in-band error; some silently slow down. OpenCode's custom API adapter implements an exponential backoff with jitter and respects the standard rate-limit headers when the upstream emits them. The defaults are conservative — four retries with 500 ms initial backoff doubling each attempt — and every retry is visible in the transcript so a slow session never looks like a stall.
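The schedule those defaults produce can be sketched in a few lines of Python. The full-jitter strategy shown here is an assumption about the adapter's internals, not a confirmed detail; the doubling ceilings match the documented defaults.

```python
import random


def retry_delays(max_retries=4, initial_backoff_ms=500, rng=None):
    """Delays (in ms) for an exponential backoff with full jitter.

    The ceiling doubles each attempt -- 500, 1000, 2000, 4000 ms with
    the defaults -- and each delay is drawn uniformly below its ceiling.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        ceiling = initial_backoff_ms * (2 ** attempt)
        delays.append(rng.uniform(0, ceiling))  # full jitter
    return delays
```

Jitter matters here because a fleet of agents retrying a rate-limited gateway on identical schedules would otherwise synchronize into repeated thundering herds.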

For long-running tasks over a rate-limited provider, teams often bump max_retries and retry_backoff_ms. For tightly latency-bound interactive work, dropping max_retries to 1 or 2 keeps the feedback loop snappy and surfaces provider flakes quickly. The documentation has recommended values per provider class. For broader resilience patterns we reference research from Stanford CS on distributed retry design when tuning defaults.

Zero-click summary. Exponential backoff with jitter. Respects rate-limit headers. Every retry appears in the transcript.

Anthropic adapter specifics.

The native Anthropic adapter maps OpenCode tool calls onto the Claude messages API without going through an OpenAI-compatible proxy. That means tool-call fidelity is higher — the Claude tool schema is expressive enough that OpenCode does not need the JSON fallback — and the adapter passes through extra Claude features like long-context caching hints when the upstream supports them. To switch from an OpenAI-compatible provider to Anthropic, change the kind key from openai to anthropic and update the base URL if you are proxying through an internal gateway.
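That switch might look like the following, reusing the key names from the earlier config; the base URL and model name are placeholders for this sketch.

```toml
[provider.claude]
kind = "anthropic"
base_url = "https://api.anthropic.com"   # or your internal gateway
api_key_env = "OPENCODE_ANTHROPIC_TOKEN"
default_model = "claude-sonnet-4"        # model name illustrative
```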

Azure OpenAI specifics.

Azure OpenAI deployments map a logical model name to a deployment name in the URL path. The azure_openai adapter handles that mapping: you set the base URL to your Azure resource, the adapter inserts the deployment name into the path, and authentication uses the Azure API key or a bearer token from a managed identity. Region failover and availability zones are handled by Azure's own mechanisms; OpenCode sees a single base URL and does not try to second-guess the cloud's routing.
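A sketch of an Azure provider block under those assumptions. The deployment key name below is illustrative, not the adapter's confirmed spelling; the resource hostname is a placeholder.

```toml
[provider.azure]
kind = "azure_openai"
base_url = "https://my-resource.openai.azure.com"
api_key_env = "OPENCODE_AZURE_TOKEN"
# Key name is illustrative; the adapter splices the deployment
# name into the request path.
deployment = "gpt-4o-prod"
```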

Self-hosted vLLM, TGI, and LM Studio.

vLLM and TGI are the two most common self-hosted inference servers OpenCode teams deploy behind a gateway. Both speak the OpenAI-compatible chat completions schema, so the openai adapter works unmodified. vLLM supports native function calls for models that expose them; TGI historically relies on the JSON fallback, which works reliably with OpenCode's tool-call executor. LM Studio is a desktop application that runs a local OpenAI-compatible server on 127.0.0.1 — for laptop experimentation with larger models than Ollama can serve, pointing OpenCode at LM Studio takes one config change.
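That one config change might look like this. LM Studio's local server listens on port 1234 by default, and the token is a placeholder since a loopback server typically does not check credentials.

```toml
[provider.lmstudio]
kind = "openai"
base_url = "http://127.0.0.1:1234/v1"     # LM Studio's default local port
api_key_env = "OPENCODE_LMSTUDIO_TOKEN"   # placeholder; local server ignores it
default_model = "qwen2-coder-32b"         # whichever model is loaded in LM Studio
```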

Our vLLM cluster and our hosted fallback share one OpenCode config block. Swapping between them for a particular task is a config reload.

— Petrina U. Moldovanu, Staff Engineer, Nivaria Labs

The Anthropic adapter preserving tool-call fidelity without a fallback mattered. Long-running plans stayed legible across hundreds of tool calls.

— Thandiwe R. Mabaso, Senior SWE, Jacaranda Signals


Frequently asked

OpenCode custom API questions developers ask.

Five answers covering adapter choice, authentication, tool calls, gateway routing, and retries. Follow inline links for deeper detail.

What model servers does the OpenCode custom API support?
Any OpenAI-compatible endpoint plus native adapters for Anthropic and Azure OpenAI. In practice that covers OpenAI itself, vLLM, TGI, LM Studio, in-house gateways, and most provider proxies. The adapter matrix on this page is the canonical list, and the integrations registry links to each configuration block.
How does OpenCode handle authentication?
The custom API adapter reads a bearer token from an environment variable, a file path, or an external command. The token is attached as an Authorization header on every request and is redacted from transcripts and logs. The sign-in guide covers integrating with corporate credential helpers.
Can OpenCode use tool calls with any model?
Yes. OpenCode negotiates tool-call capabilities at startup. Native function calls are used when the model supports them; the JSON fallback activates when not. Both paths route through the same tool executor, so the agent loop behaves the same way. The Ollama guide has an example of the fallback in action.
How do I route OpenCode through an in-house LLM gateway?
Set base_url to your gateway, pick the openai adapter, and pass a gateway-scoped token. OpenCode does not need to know the upstream provider. The gateway gives your platform team one place to enforce auth, rate limits, prompt redaction, and audit logging. The trust and safety page documents the audit-log posture.
Does OpenCode retry failed custom API requests?
Yes. The adapter uses exponential backoff with jitter and respects standard rate-limit headers. Defaults are four retries with 500 ms initial backoff. Every retry is visible in the transcript, so a slow session is obvious rather than silent. Tune max_retries and retry_backoff_ms to match your provider's posture.