`@llm-ports/adapter-ollama`

Native adapter for Ollama, the local LLM daemon. Implements LLMPort, EmbeddingsPort, and adapter-level ModelManagement (list / pull / delete / health). All Ollama models default to zero-cost, unlimited budget.

Why this and not adapter-openai with `baseURL`?

Ollama exposes an OpenAI-compatible endpoint at http://localhost:11434/v1, so technically @llm-ports/adapter-openai works. The native adapter unlocks features the compatibility layer hides:

Model management: listModels, pullModel, deleteModel, checkHealth
Auto-pull on first use (optional)
Keep-alive control (VRAM retention)
Ollama-specific sampling (num_predict, num_ctx, etc.) via the SDK
Zero-cost pricing defaults (every Ollama model priced $0/1M)
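
To make the keep-alive and sampling items concrete, here is a minimal sketch of the request body that Ollama's native `/api/generate` endpoint accepts. The `keep_alive` field and the `options` object (`num_ctx`, `num_predict`) belong to Ollama's native API. The `buildGenerateBody` helper and its `tuning` parameter are hypothetical names for illustration and are not part of this adapter.

```typescript
// Hypothetical helper: builds a body for Ollama's native POST /api/generate.
// The adapter's real internals may differ; this only illustrates the fields.
interface NativeGenerateBody {
  model: string;
  prompt: string;
  stream: boolean;
  keep_alive?: string; // how long the model stays loaded, e.g. "5m"
  options?: { num_ctx?: number; num_predict?: number };
}

function buildGenerateBody(
  model: string,
  prompt: string,
  tuning: { keepAlive?: string; numCtx?: number; numPredict?: number } = {},
): NativeGenerateBody {
  const body: NativeGenerateBody = { model, prompt, stream: false };
  if (tuning.keepAlive !== undefined) body.keep_alive = tuning.keepAlive;

  // Only attach `options` when at least one sampling knob was set.
  const options: { num_ctx?: number; num_predict?: number } = {};
  if (tuning.numCtx !== undefined) options.num_ctx = tuning.numCtx;
  if (tuning.numPredict !== undefined) options.num_predict = tuning.numPredict;
  if (Object.keys(options).length > 0) body.options = options;

  return body;
}

const body = buildGenerateBody("llama3.3", "Say hi", { keepAlive: "5m", numCtx: 8192 });
console.log(JSON.stringify(body));
```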

Install

pnpm add @llm-ports/core @llm-ports/adapter-ollama ollama

Configure

import { createRegistryFromEnv } from "@llm-ports/core";
import { createOllamaAdapter } from "@llm-ports/adapter-ollama";

const registry = createRegistryFromEnv({
  adapters: {
    ollama: createOllamaAdapter({
      baseURL: "http://localhost:11434",   // default
      autoPull: true,                       // pull missing models on first use
      keepAlive: "5m",                      // VRAM retention (default 5m)
    }),
  },
});

export const llm = registry.getPort();

.env:

LLM_PROVIDER_LOCAL=ollama|llama3.3|unlimited
LLM_TASK_ROUTE_DRAFT=local
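
The provider value above reads as three pipe-separated fields. The sketch below shows one way such a value could be split. It assumes the format is `adapter|model|budget`, inferred only from this example, and that a numeric budget is also allowed. `parseProviderValue` and `ProviderEntry` are hypothetical names, and the real parser in `@llm-ports/core` may accept other forms.

```typescript
// Assumption: the value format is "adapter|model|budget", inferred from the
// single example in this README. Not the library's actual parser.
interface ProviderEntry {
  adapter: string;
  model: string;
  budget: number; // Infinity when the budget is "unlimited"
}

function parseProviderValue(value: string): ProviderEntry {
  const parts = value.split("|").map((part) => part.trim());
  if (parts.length !== 3 || parts.some((part) => part === "")) {
    throw new Error(`Expected "adapter|model|budget", got "${value}"`);
  }
  const [adapter, model, rawBudget] = parts as [string, string, string];

  // "unlimited" maps to Infinity; anything else must be a number (assumed).
  const budget = rawBudget === "unlimited" ? Infinity : Number(rawBudget);
  if (Number.isNaN(budget)) {
    throw new Error(`Invalid budget "${rawBudget}"`);
  }
  return { adapter, model, budget };
}

console.log(parseProviderValue("ollama|llama3.3|unlimited"));
```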

Adapter options

interface OllamaAdapterOptions {
  baseURL?: string;                          // default "http://localhost:11434"
  autoPull?: boolean;                        // default false
  keepAlive?: string;                        // default "5m"
  validationStrategy?: ValidationStrategy;
  pricingOverrides?: Record<string, ModelPricing>;
  imageSizeLimitBytes?: number;              // default: unset (model-dependent)
  onRetry?: OnRetry;                         // alpha.17+
}

`onRetry` observability hook (alpha.17)

The adapter fires onRetry whenever it retries a generateStructured call after a Zod validation failure. The hook may be sync or async. It is called fire-and-forget: the adapter does not await it, and throwing from the hook does NOT cancel the retry. Pipe events into any tracing or metrics stack.

import { createOllamaAdapter } from "@llm-ports/adapter-ollama";

// Assumes `span` is an active span from your own tracing library
// (for example, an OpenTelemetry span); it is not provided by the adapter.
const adapter = createOllamaAdapter({
  baseURL: "http://localhost:11434",
  onRetry: (event) => {
    span.addEvent("llm.retry", {
      reason: event.reason,          // "validation-feedback" for Ollama
      attempt: event.attempt,        // 0-indexed retry number
      modelId: event.modelId,
      providerAlias: event.providerAlias,
      delayMs: event.delayMs,
    });
  },
});

Ollama only fires the validation-feedback reason. Local daemons don't have the cloud retry patterns (no 429 burst protection, no transient 401 auth, no hidden reasoning starvation), so structured-output schema misses are the only retry trigger. The event shape matches all other adapters.
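
The fire-and-forget behavior described above can be sketched as follows. This is an illustration of the pattern under stated assumptions, not the adapter's actual code: `RetryEvent` mirrors the fields used in the example, while `fireAndForget` is a hypothetical helper name.

```typescript
// Local stand-ins for the event and hook types (shapes taken from the example above).
type RetryEvent = {
  reason: string;
  attempt: number;
  modelId: string;
  providerAlias: string;
  delayMs: number;
};
type OnRetry = (event: RetryEvent) => void | Promise<void>;

// Call the hook without awaiting it. Sync throws and async rejections are
// both swallowed so the retry itself is never affected.
function fireAndForget(hook: OnRetry | undefined, event: RetryEvent): void {
  if (!hook) return;
  try {
    const result = hook(event);
    void Promise.resolve(result).catch(() => {
      /* hook rejected: ignored on purpose */
    });
  } catch {
    /* hook threw synchronously: ignored on purpose */
  }
}

const sample: RetryEvent = {
  reason: "validation-feedback",
  attempt: 0,
  modelId: "llama3.3",
  providerAlias: "local",
  delayMs: 0,
};
fireAndForget(() => {
  throw new Error("hook failed");
}, sample);
console.log("retry continues");
```

Because errors are swallowed, a hook that needs failure visibility should log inside its own try/catch.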

Model management

const ollama = createOllamaAdapter({ autoPull: false });

// List installed models
const models = await ollama.listModels();
// [{ name: "llama3.3", size: 4_000_000_000, family: "llama", parameterSize: "8B", ... }]

// Pull a model with progress callback
await ollama.pullModel("qwen2.5:32b", (pct) => console.log(`${pct}%`));

// Delete a model
await ollama.deleteModel("old-model");

// Health check (returns ok: false if daemon unreachable)
const health = await ollama.checkHealth();
console.log(health);  // { ok: true, latencyMs: 8 }
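
When autoPull is off, a caller may want to check the listModels result before deciding to pull. The sketch below is one way to do that comparison. It relies on Ollama's convention that a name without a tag means `:latest`. `normalizeModelName` and `isInstalled` are hypothetical helpers, and `InstalledModel` only models the `name` field shown above.

```typescript
// Minimal stand-in for the entries returned by listModels (only `name` is used).
interface InstalledModel {
  name: string;
}

// Ollama treats "llama3.3" and "llama3.3:latest" as the same model.
// Only the last path segment is checked so registry ports are not mistaken for tags.
function normalizeModelName(name: string): string {
  const lastSegment = name.slice(name.lastIndexOf("/") + 1);
  return lastSegment.includes(":") ? name : `${name}:latest`;
}

function isInstalled(models: InstalledModel[], wanted: string): boolean {
  const target = normalizeModelName(wanted);
  return models.some((model) => normalizeModelName(model.name) === target);
}

const installed: InstalledModel[] = [{ name: "llama3.3" }, { name: "qwen2.5:32b" }];
console.log(isInstalled(installed, "llama3.3:latest"));
console.log(isInstalled(installed, "qwen2.5"));
```

A guard such as `if (!isInstalled(await ollama.listModels(), "qwen2.5:32b")) await ollama.pullModel("qwen2.5:32b")` would then keep pulls explicit.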

Pricing (zero-cost defaults)

const OLLAMA_DEFAULT_PRICING = {
  inputPer1M: 0,
  outputPer1M: 0,
  embeddingPer1M: 0,
};

Every model id resolves to zero cost by default. The catch-all applies to any model id without an explicit pricing entry, so you don't have to maintain pricing entries for every Ollama model you pull.

To track GPU time as an internal cost, override:

createOllamaAdapter({
  pricingOverrides: {
    "llama3.3:70b": { inputPer1M: 0.05, outputPer1M: 0.05 },  // synthetic "cost"
  },
});

An override like this makes cost gating meaningful for local models. Otherwise leave the defaults: with every price at zero, no call ever counts against a budget, so gating is correctly disabled.
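
The interaction between the catch-all default and an override can be sketched as a lookup followed by a per-million-token calculation. This is an illustration only: `resolvePricing` and `estimateCost` are hypothetical helpers, and the `ModelPricing` shape here is a local stand-in based on the fields shown above.

```typescript
// Local stand-in for the pricing shape (embeddingPer1M treated as optional here).
interface ModelPricing {
  inputPer1M: number;
  outputPer1M: number;
  embeddingPer1M?: number;
}

const ZERO_PRICING: ModelPricing = { inputPer1M: 0, outputPer1M: 0, embeddingPer1M: 0 };

// An explicit override wins; every other model id falls back to zero cost.
function resolvePricing(
  modelId: string,
  overrides: Record<string, ModelPricing> = {},
): ModelPricing {
  const hasOverride = Object.prototype.hasOwnProperty.call(overrides, modelId);
  return hasOverride ? (overrides[modelId] as ModelPricing) : ZERO_PRICING;
}

// Prices are per one million tokens.
function estimateCost(pricing: ModelPricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens * pricing.inputPer1M + outputTokens * pricing.outputPer1M) / 1_000_000;
}

const overrides = { "llama3.3:70b": { inputPer1M: 0.05, outputPer1M: 0.05 } };
console.log(estimateCost(resolvePricing("llama3.3:70b", overrides), 200_000, 50_000));
console.log(estimateCost(resolvePricing("qwen2.5:32b", overrides), 200_000, 50_000));
```

With the synthetic price, 200,000 input tokens plus 50,000 output tokens come to roughly 0.0125 in whatever unit the override uses, while the model without an override stays at zero.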

Supported features

| Feature | Status |
| --- | --- |
| `generateText` | ✓ |
| `generateStructured` (Zod schemas) | ✓ (uses `format: "json"` + retry-with-feedback) |
| `streamText` | ✓ |
| `streamStructured` | ✓ |
| `runAgent` (multi-turn tool use) | ✓ (model-dependent; needs Llama 3.3+ or Qwen 2.5+) |
| `generateEmbedding` / `generateEmbeddings` | ✓ (nomic-embed-text, mxbai-embed-large) |
| Vision input (base64 images) | ✓ (model-dependent; LLaVA, etc.) |
| Vision input (URL images) | ✗ (Ollama doesn't fetch URLs) |
| Audio input | ✗ |
| Model management | ✓ |
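
The retry-with-feedback approach noted for `generateStructured` can be sketched as a loop that parses the reply, validates it, and feeds the validation error back into the next prompt. This is a dependency-free illustration, not the adapter's implementation: `generateWithFeedback`, `CallModel`, and `Validator` are hypothetical names, and the hand-written validator stands in for the role a Zod schema's `safeParse` would play.

```typescript
type ValidationResult<T> = { ok: true; data: T } | { ok: false; error: string };
type Validator<T> = (value: unknown) => ValidationResult<T>;
type CallModel = (prompt: string) => Promise<string>; // stand-in for the model call

// Parse the raw reply as JSON, then run the caller's validator on it.
function parseAndValidate<T>(raw: string, validate: Validator<T>): ValidationResult<T> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, error: "response was not valid JSON" };
  }
  return validate(parsed);
}

// One initial attempt plus up to `maxRetries` retries, each carrying the
// previous validation error back to the model as feedback.
async function generateWithFeedback<T>(
  callModel: CallModel,
  prompt: string,
  validate: Validator<T>,
  maxRetries = 2,
): Promise<T> {
  let currentPrompt = prompt;
  let lastError = "";
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const raw = await callModel(currentPrompt);
    const result = parseAndValidate(raw, validate);
    if (result.ok) return result.data;
    lastError = result.error;
    currentPrompt =
      `${prompt}\n\nYour previous reply was rejected: ${lastError}. ` +
      "Reply with corrected JSON only.";
  }
  throw new Error(
    `Structured output still invalid after ${maxRetries + 1} attempts: ${lastError}`,
  );
}
```

Smaller local models miss schemas more often than hosted ones, which is why the feedback text restates the concrete error rather than repeating the original prompt unchanged.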

Content blocks supported

Supported blocks: text, image (base64 only), tool_use, and tool_result. The adapter throws ContentBlockUnsupportedError for audio blocks and URL-form images.
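
A pre-flight check for these rules might look like the sketch below. The block shapes and the `assertSupported` helper are hypothetical, and the error class is a local stand-in with the same name as the library's error. The real block types are described in the Content blocks reference.

```typescript
// Hypothetical block shapes for illustration; the library's real types may differ.
type ImageSource =
  | { kind: "base64"; data: string; mediaType: string }
  | { kind: "url"; url: string };

type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; source: ImageSource }
  | { type: "audio"; data: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; toolUseId: string; content: string };

// Local stand-in for the library's error class.
class ContentBlockUnsupportedError extends Error {
  readonly blockType: string;
  constructor(blockType: string, detail: string) {
    super(`Unsupported content block "${blockType}": ${detail}`);
    this.name = "ContentBlockUnsupportedError";
    this.blockType = blockType;
  }
}

// Reject audio blocks and URL-form images before anything is sent to the daemon.
function assertSupported(blocks: ContentBlock[]): void {
  for (const block of blocks) {
    if (block.type === "audio") {
      throw new ContentBlockUnsupportedError("audio", "audio input is not supported");
    }
    if (block.type === "image" && block.source.kind === "url") {
      throw new ContentBlockUnsupportedError("image", "Ollama does not fetch image URLs");
    }
  }
}
```

Callers holding only a URL would need to download the image and base64-encode it themselves before building the block.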

Cancellation (limited)

Entry-time abort support shipped in 0.1.0-alpha.6: if options.signal.aborted is already true at entry, the call throws without invoking the daemon. Mid-flight cancellation is NOT supported because ollama-js v0.5 doesn't expose a per-call signal; its client.abort() method cancels ALL in-flight requests on the client, which is too coarse for per-call use. Mid-flight support will land once ollama-js v0.7+ exposes a per-call signal. See the Cancellation guide.
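
The entry-time check amounts to inspecting the signal once, before any work starts. The sketch below illustrates that behavior with a hypothetical `callWithEntryAbort` wrapper. The error thrown here is an assumption; the adapter's actual error type is not specified in this section.

```typescript
// Hypothetical wrapper: checks the signal once at entry, then runs the call.
// After `invoke` has started, aborting the signal has no effect in this sketch,
// which mirrors the "no mid-flight cancellation" limitation described above.
async function callWithEntryAbort<T>(
  signal: AbortSignal | undefined,
  invoke: () => Promise<T>,
): Promise<T> {
  if (signal?.aborted) {
    const error = new Error("Call aborted before it started");
    error.name = "AbortError"; // assumed name, for illustration only
    throw error;
  }
  return invoke();
}

const controller = new AbortController();
controller.abort();
callWithEntryAbort(controller.signal, async () => "never runs").catch((error: Error) => {
  console.log(error.name);
});
```

For long generations that must be interruptible today, a per-call timeout on the caller's side is the practical workaround, with the caveat that the daemon keeps generating until it finishes.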

Reading next

Tool-use security guide — runAgent code patterns, the destructive / requiresConfirmation / maxOutputBytes flags, the approval-gate wrapper
Content blocks reference — tool_use and tool_result block shapes
Local-to-cloud flip
Ollama documentation
