OpenCode Ollama adapter

OpenCode Ollama: run the agent against a local model on your laptop.

The OpenCode Ollama adapter turns a local Ollama server into a first-class OpenCode backend. Pull a model with ollama pull, point OpenCode at the Ollama socket, and you have an agent that reads your repository, edits files, and runs tests without emitting a single packet to a vendor cloud. Everything in this guide assumes a laptop, an offline-capable workstation, or an air-gapped dev container.

Installing Ollama and pulling your first model.

Ollama is a small local server that downloads quantized model weights, runs inference on CPU or GPU depending on your hardware, and exposes an HTTP API on 127.0.0.1:11434. Install Ollama from your package manager or the Ollama distribution for your platform, then verify the server is running. Once it is up, pull a model: for a 16 GB laptop the small tier is a safe start; for a 32 GB laptop the medium tier is comfortable; for a workstation or an Apple Silicon machine with unified memory the large tier is in reach.
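
In practice the bootstrap is two commands. The model tag below is an example; substitute any coding-capable tag from the Ollama library (response abridged):

```shell
$ ollama pull qwen2.5-coder:7b         # example small-tier tag; pick your own
$ curl http://127.0.0.1:11434/api/tags # verify the server is up and the model is listed
{"models":[{"name":"qwen2.5-coder:7b", ...}]}
```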

With a model pulled, OpenCode picks it up automatically. When you run the agent with the Ollama provider selected, the CLI lists the models Ollama has available, you pick one, and the first prompt runs locally. The CLI install guide covers the OpenCode side of the setup; the Ollama side is a standard ollama pull command. No OpenCode-specific model format, no custom quantization — the adapter uses whatever Ollama gives it.
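
Under the hood, auto-detection needs nothing more than the standard /api/tags response. A minimal sketch of that discovery step, run against a canned payload instead of a live server (the function name is ours, not part of OpenCode):

```python
import json

def discover_models(tags_json: str) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    payload = json.loads(tags_json)
    return [m["name"] for m in payload.get("models", [])]

# Canned response in the shape Ollama's /api/tags endpoint returns.
sample = '{"models": [{"name": "small-q4:latest"}, {"name": "medium-q4:latest"}]}'
print(discover_models(sample))  # → ['small-q4:latest', 'medium-q4:latest']
```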

Recommended Ollama models for OpenCode.

The table below is the canonical recommendation list the OpenCode maintainers keep current. Specific model names move around as new quantizations ship, but the tiers — small, medium, large — are stable. Match the tier to your RAM budget first and your patience second; a small-tier model on a 16 GB laptop is faster than a medium-tier model that swaps to disk.

Zero-click summary. Three tiers. Small fits a 16 GB laptop. Medium fits 32 GB. Large needs 48+ GB of fast RAM or Apple Silicon unified memory.

| Model | Quant | RAM | Speed | Use case |
| --- | --- | --- | --- | --- |
| Small tier (7B class) | Q4_K_M | 8 GB | Fast | Everyday edits, small refactors |
| Small tier (8B class) | Q5_K_M | 10 GB | Fast | Code completion, inline fixes |
| Medium tier (14B class) | Q4_K_M | 16 GB | Moderate | Multi-file changes |
| Medium tier (20B class) | Q4_K_M | 22 GB | Moderate | Plan mode on real repos |
| Large tier (32B class) | Q4_K_M | 32 GB | Slower | Hard refactors, long contexts |
| Large tier (70B class) | Q4_K_M | 48 GB | Slow | Cross-module reasoning |
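
The RAM column follows from a back-of-envelope rule: a Q4_K_M quantization spends roughly 4.5 bits per weight, and the runtime adds KV cache and overhead on top. A rough sketch — the 4.5-bit figure is an approximation, not an OpenCode-published number:

```python
def weights_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of the quantized weights alone, in GB."""
    return params_billion * bits_per_weight / 8  # 1e9 params at N bits -> GB

# A 7B model at Q4_K_M is ~4 GB of weights; the table's 8 GB budget leaves
# room for the KV cache, the runtime, and the rest of your desktop.
print(round(weights_gb(7), 1))   # → 3.9
print(round(weights_gb(32), 1))  # → 18.0
```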

Pointing OpenCode at the local Ollama socket.

The OpenCode config lives at ~/.config/opencode/config.toml on macOS and Linux, or %APPDATA%\OpenCode\config.toml on Windows. To wire up the Ollama adapter, add a provider block keyed on Ollama and point it at the Ollama HTTP endpoint. OpenCode fetches the available models from Ollama at startup, so you do not list them manually in the config — the CLI picks them up from ollama list.

Below is a minimal Ollama provider block. Adjust the model name to match a model you have pulled, and optionally set a default context window if the model supports something other than the library default.

# ~/.config/opencode/config.toml
[provider.ollama]
kind = "ollama"
base_url = "http://127.0.0.1:11434"
default_model = "opencode-small-q4"
context_window = 8192

[agent]
provider = "ollama"
tool_call_format = "json_fallback"
stream = true

The tool_call_format key is the one most users forget. OpenCode prefers native function calls when the model supports them, but most local models surface tool calls as a structured JSON block in the response. Setting tool_call_format = "json_fallback" tells the adapter to parse that block and route it through the same tool executor the native calls use. The custom API guide documents the tool-call schema for teams that want to audit it.

Zero-click summary. Three keys wire up Ollama: kind, base_url, default_model. JSON fallback covers models without native function calls.

Latency posture and when local models are fast enough.

Local inference has a different latency shape than a hosted API. A hosted model returns first tokens in 100–300 ms but the round trip includes TLS, queueing, and rate-limit hops. A local Ollama model returns first tokens in 500–1500 ms on a mid-range laptop but every subsequent token is as fast as your CPU or GPU can produce it. For interactive agent work, the experience often feels similar because OpenCode streams tokens back into the inline diff as they arrive.
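
The trade can be put in numbers: total wall time is first-token latency plus output length over decode throughput. The figures below are illustrative midpoints from the ranges above plus assumed decode speeds, not benchmarks:

```python
def wall_time_s(ttft_s: float, out_tokens: int, tokens_per_s: float) -> float:
    """Seconds until the last token arrives: first-token latency + decode time."""
    return ttft_s + out_tokens / tokens_per_s

# A 50-token inline fix, assuming 60 tok/s hosted and 30 tok/s local
# on a mid-range laptop. Absolute numbers are illustrative only.
hosted = wall_time_s(0.2, 50, 60)  # ≈ 1.0 s
local = wall_time_s(1.0, 50, 30)   # ≈ 2.7 s
```

Streaming narrows the felt gap further, since the inline diff starts filling as soon as the first token lands rather than when the last one does.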

The tasks where local models shine: small refactors, inline fixes, code completion, short plan-and-apply loops. The tasks where hosted models still win: long-context reasoning over a 200-file monorepo, cross-module refactors with tricky invariants, and ambiguous natural-language specifications. Mixing tiers is a valid pattern — run a small local model for the routine work and reach for a hosted model when the agent tells you the horizon is long. OpenCode supports multiple providers in one config so the switch is a single command.

Zero-click summary. Local models feel interactive for short tasks. Hosted models still win for long contexts. Mixing tiers is fine.

Tool-call JSON fallback for models without native function calls.

Not every open-weights model exposes native tool calls the way a frontier hosted model does. OpenCode handles that by asking the model to emit a structured JSON block whenever it wants to call a tool, then parsing that block on the adapter side. The format is documented so teams writing custom prompts can inspect it, and the NIST software quality group's guidance on structured inputs applies cleanly to the schema.

The fallback format is intentionally boring: an opening marker, a JSON object with a tool name and an args map, and a closing marker. Any model that can follow a system prompt can produce it. If a model struggles, the usual fix is a shorter system prompt and a more explicit example in the OpenCode tool descriptor. The documentation covers the exact format.
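
The adapter-side parse can be sketched in a few lines. The <<tool>> / <</tool>> markers and field names here are hypothetical stand-ins; the real markers and schema are in the OpenCode documentation:

```python
import json
import re

# Hypothetical markers for illustration; OpenCode's documented markers differ.
TOOL_BLOCK = re.compile(r"<<tool>>\s*(\{.*?\})\s*<</tool>>", re.DOTALL)

def extract_tool_call(response: str):
    """Pull the first tool-call block out of a model response, if any."""
    match = TOOL_BLOCK.search(response)
    if match is None:
        return None  # plain text response, no tool call
    call = json.loads(match.group(1))
    return call["tool"], call.get("args", {})

reply = 'Let me check.\n<<tool>>{"tool": "read_file", "args": {"path": "src/main.rs"}}<</tool>>'
print(extract_tool_call(reply))  # → ('read_file', {'path': 'src/main.rs'})
```

If the model wraps the block in prose, the search still finds it; if the JSON is malformed, json.loads raises and the adapter can re-prompt with a shorter, more explicit example.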

Offline installs and air-gapped workstations.

OpenCode with Ollama is one of the few coding-agent combinations that actually works air-gapped. The OpenCode CLI is a single static binary, so you can vendor it onto an internal package mirror. Ollama model weights can be pulled on a connected workstation, exported as blobs, and imported on the air-gapped machine. Once both are in place, set telemetry off in OpenCode, start Ollama, and the CLI never reaches for the network.
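
A sketch of the weight transfer, assuming Ollama's default model store at ~/.ollama/models on Linux and macOS (the path varies by install, and the model tag is an example):

```shell
# On the connected workstation: pull, then archive the model store.
$ ollama pull qwen2.5-coder:7b
$ tar -C ~/.ollama -czf ollama-models.tgz models

# Move the archive across (removable media or the internal mirror), then
# on the air-gapped machine: restore and restart Ollama.
$ tar -C ~/.ollama -xzf ollama-models.tgz
$ ollama list   # the imported models should appear
```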

For enterprise deployments, the common pattern is to mirror the OpenCode release bundle, a curated set of Ollama model weights, and an internal signing key through the endpoint management system. The trust and safety page documents the SBOM and signing attestations that make a mirrored deployment auditable. Guidance on OSS supply-chain posture from CMU research on reproducible builds informed our mirror design.

Troubleshooting the Ollama adapter.

The most common failure mode is a pulled model that OpenCode cannot see. That usually means Ollama is running under a different user than the OpenCode CLI, or the base URL in the OpenCode config points at a stopped Ollama instance. Run curl http://127.0.0.1:11434/api/tags — if it returns a JSON list, OpenCode will see the same list; if it fails, restart Ollama and re-check. The second most common failure is a slow first response: that is usually the model warming up and is normal on the first prompt after a restart.

The third issue engineers hit is a context overflow — a long repository does not fit in the small-tier model's context window. The fix is either a larger-tier model, a tighter selection in the OpenCode VSCode extension, or a context_window override in the config if Ollama supports it for your model. The documentation covers the overflow strategy.
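
To see an overflow coming before the model does, a characters-over-four token estimate is crude but serviceable. A sketch — the 4-chars-per-token heuristic is folklore, not an OpenCode internal:

```python
def rough_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token for English-ish code."""
    return len(text) // 4

def fits_context(files: list[str], context_window: int = 8192,
                 reserve: int = 1024) -> bool:
    """True if the files plus a reserve for the model's reply fit the window."""
    budget = context_window - reserve
    return sum(rough_tokens(f) for f in files) <= budget

# 30 KB of source against a small-tier 8192-token window.
print(fits_context(["x" * 30_000]))  # → False: 7500 tokens > 7168 budget
```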

The JSON tool-call fallback is the unsung hero. It meant we could run open-weight models that don't ship native function calling without writing glue.

— Sibylle D. Achterberg, VP Platform, Oakcatalyst

On a 64 GB Apple Silicon laptop the medium tier is fast enough that I default to local for all the routine work. I reach for hosted only when the task horizon is long.

— Korvin N. Hayasaki, Frontend Engineer, Elytra Forge

Frequently asked: OpenCode Ollama questions developers ask.

Five answers covering setup, model choice, offline workflows, and tool-call fallbacks. Follow inline links for full detail.

How do I point OpenCode at a local Ollama server?
Install Ollama, run ollama pull for the model you want, and add a provider block to ~/.config/opencode/config.toml with kind = "ollama" and base_url = "http://127.0.0.1:11434". OpenCode auto-detects the available models. The documentation has a full walkthrough.
Which Ollama model should I pull for OpenCode on my laptop?
Match the tier to your RAM. 16 GB laptops run the small tier comfortably, 32 GB laptops handle the medium tier, and 48+ GB workstations or Apple Silicon with unified memory reach the large tier. The recommended models table on this page is the OpenCode-maintained shortlist.
Does OpenCode work with models that do not support native tool calls?
Yes. Set tool_call_format = "json_fallback" in the agent block of your OpenCode config and the adapter will parse a structured JSON block from the model response, routing it through the same tool executor as native calls. The custom API guide documents the fallback schema.
Can OpenCode run fully offline with Ollama?
Yes. With Ollama running locally and OpenCode telemetry disabled, no prompts or code leave the machine. The trust and safety page documents the telemetry policy, and the CLI install guide covers offline installs from a vendored mirror.
How much RAM does OpenCode with Ollama need?
The OpenCode CLI uses under 200 MB resident. The local model is the dominant RAM cost — roughly 8 GB for small-tier, 16 GB for medium-tier, and 48+ GB for large-tier quantized weights. Apple Silicon laptops benefit from unified memory since Ollama can share RAM with the GPU directly.