@llm-ports/adapter-openai
Direct adapter for the OpenAI SDK, implementing both `LLMPort` and `EmbeddingsPort`. Because it accepts a `baseURL` option, the same adapter serves OpenAI plus 10+ OpenAI-compatible providers.
Install
```bash
pnpm add @llm-ports/core @llm-ports/adapter-openai openai zod
```
Configure (OpenAI default)
```typescript
import { createRegistryFromEnv } from "@llm-ports/core";
import { createOpenAIAdapter } from "@llm-ports/adapter-openai";

const registry = createRegistryFromEnv({
  adapters: {
    openai: createOpenAIAdapter({
      apiKey: process.env.OPENAI_API_KEY!,
    }),
  },
});

export const llm = registry.getPort();
```
Configure (compat providers via baseURL)
| Provider | baseURL | Notes |
|---|---|---|
| OpenAI | (none) | Default |
| Azure OpenAI | https://<resource>.openai.azure.com/openai/deployments/<deployment> | Needs `api-version` query parameter |
| Groq | https://api.groq.com/openai/v1 | Fast inference |
| Together AI | https://api.together.xyz/v1 | Open models |
| Fireworks AI | https://api.fireworks.ai/inference/v1 | Open models |
| DeepInfra | https://api.deepinfra.com/v1/openai | Open models |
| Perplexity | https://api.perplexity.ai | Online models with citations |
| Cerebras | https://api.cerebras.ai/v1 | Fast inference |
| Clarifai | https://api.clarifai.com/v2/ext/openai/v1 | Personal Access Token (PAT) as apiKey; hosts Qwen3.6 + others |
| SambaNova | https://api.sambanova.ai/v1 | Bearer token as apiKey; hosts MiniMax-M2.7 + others |
| LiteLLM proxy | self-hosted, e.g. http://localhost:4000 | Self-hosted proxy |
| Ollama (compat mode) | http://localhost:11434/v1 | Prefer adapter-ollama for native API + management |
Each compatible provider sets its own prices; supply them via `pricingOverrides` (keyed by model ID, in USD per 1M tokens):
```typescript
import { createOpenAIAdapter } from "@llm-ports/adapter-openai";

const groq = createOpenAIAdapter({
  apiKey: process.env.GROQ_API_KEY!,
  baseURL: "https://api.groq.com/openai/v1",
  pricingOverrides: {
    // USD per 1M tokens for this model on Groq
    "llama-3.3-70b-versatile": { inputPer1M: 0.59, outputPer1M: 0.79 },
  },
});
```
Worked example: Clarifai (Qwen3.6 35B A3B FP8)
Clarifai exposes an OpenAI-compatible surface at `/v2/ext/openai/v1`. Authenticate with a Personal Access Token (PAT) passed as `apiKey` and use the model ID exactly as Clarifai publishes it (`Qwen3_6-35B-A3B-FP8`); no other Clarifai-specific configuration is needed. Qwen3.6 is a reasoning model and is listed in `KNOWN_REASONING_MODELS`, so even the first call applies the reasoning-headroom multiplier instead of wasting a round-trip to discover it.
```typescript
import { createOpenAIAdapter } from "@llm-ports/adapter-openai";

const clarifai = createOpenAIAdapter({
  apiKey: process.env.CLARIFAI_PAT!,
  baseURL: "https://api.clarifai.com/v2/ext/openai/v1",
  displayName: "clarifai",
  pricingOverrides: {
    "Qwen3_6-35B-A3B-FP8": {
      inputPer1M: 0.76,
      outputPer1M: 0.43,
      // Blended ~$0.72/1M; 262k context window.
      // capabilities.reasoningModel: true is auto-seeded via KNOWN_REASONING_MODELS;
      // set it here only if you ever need to override the catalog.
    },
  },
});
```
Pricing note: Clarifai prices Qwen3.6 FP8 output below input ($0.43 vs $0.76 per 1M). That is not a typo, but it is unusual: most providers charge more for output than for input (the FP8 deployment may be a factor). Verify the rates on Clarifai's pricing page before locking them in.
Worked example: SambaNova (MiniMax M2.7)
SambaNova exposes an OpenAI-compatible surface at https://api.sambanova.ai/v1. Pass your SambaNova bearer token as `apiKey` and use the published model ID (`MiniMax-M2.7`). Like Qwen3.6, MiniMax-M2.7 is pre-seeded as a reasoning model.
```typescript
import { createOpenAIAdapter } from "@llm-ports/adapter-openai";

const sambanova = createOpenAIAdapter({
  apiKey: process.env.SAMBANOVA_API_KEY!,
  baseURL: "https://api.sambanova.ai/v1",
  displayName: "sambanova",
  pricingOverrides: {
    "MiniMax-M2.7": {
      inputPer1M: 0.60,
      outputPer1M: 2.40,
      // Blended ~$0.78/1M; 197k context window.
    },
  },
});
```
Reasoning models need budget. Both Qwen3.6 and MiniMax-M2.7 spend tokens on hidden reasoning before producing visible output. Always supply `maxOutputTokens` (8k+ recommended) so the auto-retry headroom multiplier has a number to expand; calls without `maxOutputTokens` skip that safety net.
Cost shape: at a blended ~$0.72/1M (Clarifai Qwen3.6) and ~$0.78/1M (SambaNova MiniMax-M2.7), both are comparable to Cerebras gpt-oss-120b ($0.65 in / $0.85 out per 1M) and substantially cheaper than Claude Sonnet 4.5 ($3 in / $15 out per 1M). MiniMax-M2.7's output rate is 4× its input rate, so reasoning-heavy workloads (long internal chain-of-thought) will cost more than the blended number suggests. Budget on output tokens, not on the blend.
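To see why the blend can mislead, price individual calls from their token counts. This is a minimal sketch, assuming a pricing shape with only the `inputPer1M` and `outputPer1M` fields used in the examples above; `costUsd` and `blendedPer1M` are hypothetical helpers, not package exports, and the default 3:1 input:output weighting is an assumption (the blended figures quoted above may use a different ratio).

```typescript
// Minimal pricing shape in USD per 1M tokens (assumed subset of ModelPricing).
interface Pricing {
  inputPer1M: number;
  outputPer1M: number;
}

// Cost in USD of a single call, given its input and output token counts.
function costUsd(p: Pricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens * p.inputPer1M + outputTokens * p.outputPer1M) / 1_000_000;
}

// Blended USD per 1M tokens for a workload with the given input:output ratio.
// The default 3:1 weighting is an assumption, not the adapter's convention.
function blendedPer1M(p: Pricing, inputParts = 3, outputParts = 1): number {
  return (inputParts * p.inputPer1M + outputParts * p.outputPer1M) / (inputParts + outputParts);
}

const minimax: Pricing = { inputPer1M: 0.6, outputPer1M: 2.4 };
const qwen: Pricing = { inputPer1M: 0.76, outputPer1M: 0.43 };

// A reasoning-heavy call: 2k prompt tokens, 12k output tokens (mostly hidden reasoning).
console.log(costUsd(minimax, 2_000, 12_000).toFixed(4)); // "0.0300"
console.log(costUsd(qwen, 2_000, 12_000).toFixed(4)); // "0.0067"
```

At this output-heavy mix, a MiniMax-M2.7 call costs roughly 4.5× a Qwen3.6 call, even though their blended rates are within a few cents of each other.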
Adapter options
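```typescript
interface OpenAIAdapterOptions {
  apiKey: string;
  baseURL?: string; // omit for OpenAI itself; set for compat providers
  fetch?: typeof fetch; // optional custom fetch implementation
  validationStrategy?: ValidationStrategy;
  pricingOverrides?: Record<string, ModelPricing>; // keyed by model ID
  displayName?: string; // for error messages when pointed at a non-OpenAI baseURL
  // Also accepts transientAuthRetries, transientAuthBackoffMs, and an OnRetry hook (see below).
}
```
Bundled pricing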
| Model | Input/1M | Output/1M | Cached input/1M |
|---|---|---|---|
| gpt-5 | $2.50 | $10.00 | $0.25 |
| gpt-5-mini | $0.15 | $0.60 | $0.075 |
| gpt-5-nano | $0.05 | $0.20 | $0.025 |
| gpt-4o | $2.50 | $10.00 | $1.25 |
| gpt-4o-mini | $0.15 | $0.60 | $0.075 |
| o3 | $15.00 | $60.00 | $7.50 |
| o3-mini | $1.10 | $4.40 | $0.55 |
| text-embedding-3-small | $0.02 | n/a | n/a |
| text-embedding-3-large | $0.13 | n/a | n/a |
Source: openai.com/pricing. Verified 2026-04-10.
Supported features
| Feature | Status |
|---|---|
| generateText | ✓ |
| generateStructured (Zod schemas) | ✓ (native `response_format: json_object` + retry-with-feedback) |
| streamText | ✓ |
| streamStructured | ✓ |
| runAgent (multi-turn tool use) | ✓ |
| generateEmbedding / generateEmbeddings | ✓ |
| Vision input: base64 images | ✓ (sent as data URI) |
| Vision input: URL images | ✓ |
| Audio input: base64 wav, mp3 | ✓ |
| Audio input: base64 ogg | ✗ (OpenAI doesn't support ogg) |
| Audio input: URL audio | ✗ (OpenAI requires base64) |
| Prompt caching | partial (`cached_tokens` reported in usage) |
Content blocks supported
text, image (base64 → data URI; URL passthrough), audio (base64 wav/mp3 only), tool_use, tool_result. The adapter throws ContentBlockUnsupportedError for unsupported variants.
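The mapping rules above can be sketched as a pure function from content blocks to OpenAI Chat Completions content parts. This is a simplified illustration, not the adapter's source: the local `Block` union is a hypothetical subset (the `kind: "url"` variant, the audio block shape, and the media-type table are assumptions), `ContentBlockUnsupportedError` is redefined locally as a stand-in, and `tool_use` / `tool_result` are omitted because they map to message-level fields rather than content parts. The output shapes (`image_url` with an optional `detail`, `input_audio` with `format: "wav" | "mp3"`) follow OpenAI's Chat Completions API.

```typescript
// Hypothetical subset of the package's content-block union (assumed shapes).
type Detail = "auto" | "low" | "high";
type ImageSource =
  | { kind: "base64"; mediaType: string; data: string; detail?: Detail }
  | { kind: "url"; url: string; detail?: Detail };
type AudioSource = { kind: "base64"; mediaType: string; data: string } | { kind: "url"; url: string };
type Block =
  | { type: "text"; text: string }
  | { type: "image"; source: ImageSource }
  | { type: "audio"; source: AudioSource };

// OpenAI Chat Completions user-content parts.
type OpenAIPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string; detail?: Detail } }
  | { type: "input_audio"; input_audio: { data: string; format: "wav" | "mp3" } };

// Local stand-in for the adapter's error class.
class ContentBlockUnsupportedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ContentBlockUnsupportedError";
  }
}

// Assumed media-type mapping; anything else (e.g. audio/ogg) is rejected.
const AUDIO_FORMATS: Record<string, "wav" | "mp3"> = {
  "audio/wav": "wav",
  "audio/x-wav": "wav",
  "audio/mpeg": "mp3",
  "audio/mp3": "mp3",
};

function toOpenAIPart(block: Block): OpenAIPart {
  switch (block.type) {
    case "text":
      return { type: "text", text: block.text };
    case "image": {
      const s = block.source;
      // base64 images become data URIs; URL images pass through unchanged.
      const url = s.kind === "base64" ? `data:${s.mediaType};base64,${s.data}` : s.url;
      // Forward detail only when set, so OpenAI's own default applies otherwise.
      return s.detail
        ? { type: "image_url", image_url: { url, detail: s.detail } }
        : { type: "image_url", image_url: { url } };
    }
    case "audio": {
      const s = block.source;
      if (s.kind === "url") throw new ContentBlockUnsupportedError("OpenAI requires base64 audio");
      const format = AUDIO_FORMATS[s.mediaType];
      if (!format) throw new ContentBlockUnsupportedError(`Unsupported audio type: ${s.mediaType}`);
      return { type: "input_audio", input_audio: { data: s.data, format } };
    }
  }
}
```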
Image cost-vs-fidelity: the detail hint
OpenAI's vision pipeline accepts a detail hint per image: "auto" (default), "low", or "high".
| `detail` | Token cost | Use case |
|---|---|---|
| "low" | ~85 tokens regardless of image size | Triage, broad classification, "is this a screenshot of X?" |
| "high" | ~85 base tokens + 170 per 512×512 tile after downscaling (a 1024×1024 image is scaled to 768×768: 4 tiles, ~765 tokens) | OCR, small-text reading, fine-grained reasoning |
| "auto" (default) | OpenAI picks based on image size | Sensible default for mixed workloads |
The field lives on ImageSource and is forwarded to image_url.detail when set:
```typescript
// Assumes `llm` from the configuration above and a base64-encoded PNG in `screenshotBase64`.
const result = await llm.generateText({
  taskType: "screenshot_triage",
  prompt: [
    { type: "text", text: "Is this a login form or a settings page?" },
    {
      type: "image",
      source: {
        kind: "base64",
        mediaType: "image/png",
        data: screenshotBase64,
        detail: "low", // ~85 tokens vs ~765 at "high" for a 1024×1024 image: ~9× cheaper for triage
      },
    },
  ],
});
```
Other adapters ignore the field; Anthropic and Ollama have no equivalent knob.
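The arithmetic behind the `"high"` row follows OpenAI's GPT-4o-style tile formula: fit the image within 2048×2048, scale it so the shortest side is 768px, count 512×512 tiles, and charge 85 base tokens plus 170 per tile. The sketch below is an estimate under those assumptions (that images are only ever scaled down, never up, is also an assumption). Some models, gpt-4o-mini for example, use different constants, so treat it as a budgeting aid rather than a billing oracle. `estimateImageTokens` is a hypothetical helper, not part of the package.

```typescript
type ImageDetail = "low" | "high";

// Estimate image input tokens with the GPT-4o-style tile formula.
// Assumptions: 85 base tokens, 170 per 512x512 tile, images only scaled down.
function estimateImageTokens(width: number, height: number, detail: ImageDetail): number {
  const BASE = 85;
  const PER_TILE = 170;
  if (detail === "low") return BASE; // flat cost regardless of size

  // Step 1: fit within a 2048x2048 square, preserving aspect ratio.
  const fit = Math.min(1, 2048 / Math.max(width, height));
  let w = width * fit;
  let h = height * fit;

  // Step 2: scale so the shortest side is 768px (downscale only).
  const shrink = Math.min(1, 768 / Math.min(w, h));
  w *= shrink;
  h *= shrink;

  // Step 3: count the 512x512 tiles that cover the scaled image.
  const tiles = Math.ceil(w / 512) * Math.ceil(h / 512);
  return BASE + PER_TILE * tiles;
}

console.log(estimateImageTokens(1024, 1024, "high")); // 765 (768x768 -> 4 tiles)
console.log(estimateImageTokens(1024, 1024, "low")); // 85
```

Dividing the two printed values (765 / 85 = 9) gives the roughly 9× saving of `"low"` for a 1024×1024 screenshot.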
Reasoning models (auto-handled)
Reasoning models (OpenAI's o3, o3-mini, and gpt-5-nano, plus compat-provider reasoning models such as Cerebras gpt-oss-120b) spend tokens on internal chain-of-thought before producing visible output. A naive call with `maxOutputTokens: 20` against gpt-5-nano reliably returns empty text and `finish_reason=length`, because reasoning consumed the entire budget.
The OpenAI adapter handles this automatically, with no configuration:
- Detection. The adapter inspects each response for two reasoning signals: `usage.completion_tokens_details.reasoning_tokens > 0` (OpenAI o-series and gpt-5-nano shape) or a populated `message.reasoning` string field (Cerebras gpt-oss shape). Either signal marks the model as a reasoning model in a process-wide cache.
- Auto-retry on starvation. If a response shows the starvation signature (`text === ""` + `finish_reason === "length"` + a reasoning signal), the adapter retries the call once with `max_completion_tokens` multiplied by a headroom factor (default 10×). The retry typically succeeds with visible output.
- Subsequent calls skip discovery. Once a model is marked as reasoning in the cache, every later call to that model applies the multiplier up front, with no wasted first-attempt round-trip.
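The detection and retry rules in the list above reduce to a few predicates over the raw Chat Completions response. This is a simplified illustration, not the adapter's implementation: `ChatResponse` models only the fields the rules mention, the in-memory `Set` stands in for the process-wide cache, and `planMaxTokens`, `observe`, and the model name `example-reasoner` are hypothetical.

```typescript
// Only the response fields the rules inspect (simplified Chat Completions shape;
// `reasoning` on the message is the Cerebras gpt-oss extension, not standard OpenAI).
interface ChatResponse {
  choices: Array<{
    finish_reason: string;
    message: { content: string | null; reasoning?: string };
  }>;
  usage?: { completion_tokens_details?: { reasoning_tokens?: number } };
}

const DEFAULT_HEADROOM = 10;

// Stand-in for the adapter's process-wide "this model reasons" cache.
const reasoningModels = new Set<string>();

// Either signal marks a reasoning model.
function hasReasoningSignal(res: ChatResponse): boolean {
  const counted = (res.usage?.completion_tokens_details?.reasoning_tokens ?? 0) > 0;
  const inline = (res.choices[0]?.message.reasoning ?? "").length > 0;
  return counted || inline;
}

// Starvation signature: empty visible text + length cutoff + a reasoning signal.
function isStarved(res: ChatResponse): boolean {
  const choice = res.choices[0];
  if (!choice) return false;
  const emptyText = (choice.message.content ?? "") === "";
  return emptyText && choice.finish_reason === "length" && hasReasoningSignal(res);
}

// Known reasoning models get the multiplier up front; others get the caller's budget.
function planMaxTokens(model: string, maxOutputTokens: number, headroom = DEFAULT_HEADROOM): number {
  return reasoningModels.has(model) ? maxOutputTokens * headroom : maxOutputTokens;
}

// Called after each response: learn from the signal, then report whether to retry once.
function observe(model: string, res: ChatResponse): { retry: boolean } {
  if (hasReasoningSignal(res)) reasoningModels.add(model);
  return { retry: isStarved(res) };
}

// Usage: a starved response from a model the cache has not seen yet.
const starved: ChatResponse = {
  choices: [{ finish_reason: "length", message: { content: "" } }],
  usage: { completion_tokens_details: { reasoning_tokens: 20 } },
};
console.log(planMaxTokens("example-reasoner", 20)); // 20: model not yet known
console.log(observe("example-reasoner", starved)); // { retry: true }
console.log(planMaxTokens("example-reasoner", 20)); // 200: used for the retry and later calls
```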
The default headroom multiplier (10×) is calibrated against o-series reasoning intensity. Override it per model via `pricingOverrides[modelId].capabilities.reasoningHeadroomMultiplier`.
First-call cost. The first call to an unknown reasoning model in a given process pays for one wasted round-trip (the starved attempt) before the cache learns the constraint. To skip that round-trip for well-known reasoning lineups, the adapter ships a static `KNOWN_REASONING_MODELS` catalog that pre-seeds the cache. As of `0.1.0-alpha.4`, the catalog covers:

- OpenAI o-series (`o1*`, `o3*`, `o4*`)
- OpenAI `gpt-5-nano*`
- Cerebras `gpt-oss-*` (via `baseURL=https://api.cerebras.ai/v1`)
- Clarifai `Qwen3_6-*` (via `baseURL=https://api.clarifai.com/v2/ext/openai/v1`)
- SambaNova `MiniMax-M2.7` (via `baseURL=https://api.sambanova.ai/v1`)

For reasoning models the catalog doesn't know yet, runtime learning still catches the constraint on the first call. To skip even that one wasted round-trip, set `pricingOverrides[modelId].capabilities.reasoningModel = true`. Tracked at TD-LLMP-03.
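A catalog of trailing-`*` patterns like the one listed above can be matched with a simple prefix check. This sketch assumes the patterns behave as prefix globs and ignores which `baseURL` serves the model; the real `KNOWN_REASONING_MODELS` format and matching rules may differ. `isKnownReasoningModel` and its `overrides` parameter are hypothetical, modeling the documented rule that an explicit `capabilities.reasoningModel` override wins over the catalog.

```typescript
// Catalog entries as listed above; a trailing "*" means "any suffix" (assumed semantics).
const KNOWN_PATTERNS = ["o1*", "o3*", "o4*", "gpt-5-nano*", "gpt-oss-*", "Qwen3_6-*", "MiniMax-M2.7"];

// Assumed subset of a pricingOverrides entry.
interface CapabilityOverride {
  capabilities?: { reasoningModel?: boolean };
}

function matchesPattern(model: string, pattern: string): boolean {
  return pattern.endsWith("*") ? model.startsWith(pattern.slice(0, -1)) : model === pattern;
}

// An explicit per-model override wins; otherwise consult the static catalog.
function isKnownReasoningModel(model: string, overrides: Record<string, CapabilityOverride> = {}): boolean {
  const explicit = overrides[model]?.capabilities?.reasoningModel;
  if (explicit !== undefined) return explicit;
  return KNOWN_PATTERNS.some((pattern) => matchesPattern(model, pattern));
}

console.log(isKnownReasoningModel("o3-mini")); // true
console.log(isKnownReasoningModel("gpt-5-mini")); // false
console.log(isKnownReasoningModel("my-reasoner", { "my-reasoner": { capabilities: { reasoningModel: true } } })); // true
```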
The adapter also handles two other OpenAI quirks transparently:

- Capability rejection. Some models reject a custom `temperature`, `response_format: { type: "json_object" }`, or a separate `system` message. The adapter catches the `unsupported_value` error, learns the constraint, retries with the offending parameter dropped, and remembers the constraint for the rest of the process.
- Project-key burst protection (`sk-proj-*` keys). New OpenAI project keys briefly return 401 "Incorrect API key" under burst protection, even when the key is valid. The adapter retries with exponential backoff (default 500 ms / 1500 ms / 4500 ms), but only if a prior request on the same client succeeded, so a genuinely bad key isn't masked. Configure this with the `transientAuthRetries` and `transientAuthBackoffMs` options.
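The burst-protection rule (retry a 401 only after the same client has already succeeded once) can be sketched as a small wrapper. Everything here is an assumption made for illustration: `TransientAuthGuard` and `HttpError` are hypothetical names, the `500 ms × 3ⁿ` schedule is chosen to reproduce the documented 500 / 1500 / 4500 ms defaults, and `sleep` is injectable so the logic can be exercised without real delays. It does not model the real `transientAuthRetries` / `transientAuthBackoffMs` option types.

```typescript
type Sleep = (ms: number) => Promise<void>;
const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical error carrying an HTTP status, standing in for the SDK's error type.
class HttpError extends Error {
  constructor(public readonly status: number, message: string) {
    super(message);
    this.name = "HttpError";
  }
}

function isAuthError(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: unknown }).status === 401;
}

// Exponential schedule: baseMs * factor^i, i.e. 500 / 1500 / 4500 ms by default.
function backoffSchedule(retries = 3, baseMs = 500, factor = 3): number[] {
  const delays: number[] = [];
  for (let i = 0; i < retries; i++) delays.push(baseMs * factor ** i);
  return delays;
}

class TransientAuthGuard {
  // Flips to true after the first successful request on this client.
  private hasSucceeded = false;

  constructor(
    private readonly delays: number[] = backoffSchedule(),
    private readonly sleep: Sleep = realSleep,
  ) {}

  async call<T>(request: () => Promise<T>): Promise<T> {
    for (let attempt = 0; ; attempt++) {
      try {
        const result = await request();
        this.hasSucceeded = true;
        return result;
      } catch (err) {
        // Retry a 401 only once the key has been proven valid, so a bad key fails
        // fast instead of being masked; give up when the schedule runs out.
        if (!isAuthError(err) || !this.hasSucceeded || attempt >= this.delays.length) throw err;
        await this.sleep(this.delays[attempt]);
      }
    }
  }
}

// Usage (hypothetical `client` is an OpenAI SDK instance):
// const guard = new TransientAuthGuard();
// const completion = await guard.call(() => client.chat.completions.create(params));
```

Gating on a prior success is the design choice that matters: without it, a mistyped key would sit through the whole backoff schedule before surfacing.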
All three retry kinds (plus validation-feedback retries inside `generateStructured`) fire the `onRetry` hook shipped in 0.1.0-alpha.1. Pass an `OnRetry` callback at adapter construction time to observe them; see `examples/with-onretry/` for a worked example that wires the hook to a console logger and a metrics sink.
Reading next
- Tool-use security guide: `runAgent` code patterns, the `destructive` / `requiresConfirmation` / `maxOutputBytes` flags, and the approval-gate wrapper
- Content blocks reference: `tool_use` and `tool_result` block shapes
- Multi-provider routing: wire multiple compat providers as separate aliases
- OpenAI pricing: verify the bundled table
Compat-provider test coverage. Compat providers (Cerebras, Groq, Together AI, Fireworks AI, DeepInfra, Perplexity, Azure OpenAI, LiteLLM proxy) are currently exercised by basic `generateText` live tests. Structured-output, streaming, agent, and embeddings coverage for compat providers is only one test deep; for example, a regression in Cerebras's `message.reasoning` parsing wouldn't be caught by the existing live suite. Tracked at TD-LLMP-02; full compat-provider matrix coverage ships with v0.2.