
v0.1 status

A single canonical inventory of what's stable in llm-ports v0.1, what's still being hardened, and what's deferred to v0.2. Other docs pages link here when a caveat is in play; this page is the authoritative source.

This is the page to share when someone asks "what works in alpha?" or "what should I expect to break?"


What's stable in v0.1

These are load-bearing today, with comprehensive test coverage. Not "experimental"; not "planned." If you build on these, the contract will not change without a deprecation cycle.

| Surface | Coverage |
| --- | --- |
| LLMPort interface (5 methods) | 211 offline tests + cross-adapter contract suite |
| EmbeddingsPort interface | covered by OpenAI + Ollama live tests; mocked-SDK regression tests |
| Registry with task-route walking + selectModel budget gating | offline registry tests in core, plus end-to-end via examples |
| USD cost gating (per-hour / per-day / per-month) | offline + Phase 2 live verification; precision verified at 10 decimals |
| Anthropic adapter (full feature set: prompt caching, vision, tool use) | full live + contract suites |
| OpenAI adapter (chat + embeddings + 10 compat providers via baseURL) | full live + contract; runtime capability discovery; reasoning-model auto-handling; transient-401 burst-protection retry |
| Ollama adapter (chat + embeddings + model management) | offline + Phase 2 live |
| Capability factories (createClassifier, createScorer, createDrafter, createSummarizer, createExtractor, createPlanner, createAnalyzer) | offline + Phase 3 live (via Cerebras/Anthropic) |
| Validation strategies (throw, retry-with-feedback, fallback-to-next-provider, custom) | offline tests + Phase 2 live exercise |
| ContentBlock[] discriminated union (text, image, audio, tool_use, tool_result) | offline tests across adapters |
| Latency overhead | mean p50 0.04 ms, max p99 0.47 ms (10× under the 5 ms target) |

The Anthropic + OpenAI + Ollama adapters and the capability factories are the BEPA-extracted core, in production at BEPA for 5+ months across millions of LLM calls.
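
As a rough illustration of the budget-gated route walking listed in the table, the sketch below parses a route such as fast,backup and returns the first alias whose hourly, daily, and monthly USD limits would still hold after the next call. The ProviderBudget shape, the helper names, and the choice to round to 10 decimals are assumptions made for this example; none of them are the llm-ports API.

```typescript
// Hypothetical shapes for illustration only; not the llm-ports API.
type BudgetWindow = "hour" | "day" | "month";

interface ProviderBudget {
  alias: string;
  // USD limit per window; a missing entry means "no limit" for that window.
  limits: Partial<Record<BudgetWindow, number>>;
  // USD already spent in each window.
  spent: Record<BudgetWindow, number>;
}

// Parse a route string such as "fast,backup" into an ordered alias list.
function parseRoute(route: string): string[] {
  return route
    .split(",")
    .map((alias) => alias.trim())
    .filter((alias) => alias.length > 0);
}

// Round to 10 decimals so float noise (0.1 + 0.2) does not trip a limit.
function roundUsd(amount: number): number {
  return Math.round(amount * 1e10) / 1e10;
}

// True when adding estimatedCost keeps every configured window within its limit.
function withinBudget(provider: ProviderBudget, estimatedCost: number): boolean {
  const windows: BudgetWindow[] = ["hour", "day", "month"];
  return windows.every((w) => {
    const limit = provider.limits[w];
    return limit === undefined || roundUsd(provider.spent[w] + estimatedCost) <= limit;
  });
}

// Walk the chain in order and return the first alias that is under budget.
function selectAlias(
  route: string,
  providers: Map<string, ProviderBudget>,
  estimatedCost: number,
): string | undefined {
  for (const alias of parseRoute(route)) {
    const provider = providers.get(alias);
    if (provider !== undefined && withinBudget(provider, estimatedCost)) {
      return alias;
    }
  }
  return undefined; // every alias in the chain is over budget
}
```

With a fast provider that has spent 0.95 USD of a 1 USD hourly limit, a call estimated at 0.01 USD would stay on fast, while one estimated at 0.10 USD would move to backup.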


Known limitations in v0.1

These are tracked publicly. Each row links to the GitHub issue with the full reproduction, workaround, and resolution path. Filter on the known-limitation label for the live list.

Recently closed in 0.1.0-alpha.1 (2026-05-11)

These items shipped fixes in the 0.1.0-alpha.1 patch. Listed here for context — they no longer apply on @llm-ports/*@alpha.

| Was | Closed by |
| --- | --- |
| runAgent tool input schemas passed as {} to the model | #1: both adapters now wire zod-to-json-schema. |
| No onRetry observability hook | #3: OnRetry / RetryEvent exported from @llm-ports/core; threaded through all four adapter-openai retry sites plus the Vercel adapter's starvation + validation-feedback retries. |
| Vercel adapter starved reasoning models | #4: Vercel adapter now retries once with a 4× budget when finish=length, empty text, and tokens were consumed. |
| Vercel generateStructured threw SyntaxError: Unexpected end of JSON input on empty responses | #5: adapter now throws typed EmptyResponseError carrying alias + modelId. |
| Capability factories' default taskType values were not documented | #6: getting-started shows the LLM_TASK_ROUTE_GENERAL catch-all; the task-routing concept page documents per-capability defaults. |
| adapter-anthropic forwarded temperature to models that reject it (Claude 4.5+ reasoning) | #12: adapter now learns the constraint at runtime, strips temperature, retries automatically. Includes static catalog for known rejectors, onRetry plumbing (parity with adapter-openai / adapter-vercel), and click-to-file GitHub URL when new constraints are learned. See known-quirks.md. |
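
The starvation retry (#4) and typed empty-response error (#5) described above can be pictured with the sketch below. It is a minimal illustration under stated assumptions, not the adapter's source: the GenResult shape, the Generate callback, and the function names are hypothetical, and the EmptyResponseError class here is a local stand-in that only mirrors the alias + modelId fields mentioned in the table.

```typescript
// Hypothetical result shape; the real adapter types may differ.
interface GenResult {
  text: string;
  finishReason: "stop" | "length";
  outputTokens: number;
}

// Hypothetical callback that performs one model call with a token budget.
type Generate = (maxTokens: number) => Promise<GenResult>;

// Local stand-in for the typed error named in the table (alias + modelId).
class EmptyResponseError extends Error {
  alias: string;
  modelId: string;
  constructor(alias: string, modelId: string) {
    super(`Empty response from ${alias} (${modelId})`);
    this.name = "EmptyResponseError";
    this.alias = alias;
    this.modelId = modelId;
  }
}

// "Starved": the model hit the token limit, consumed tokens, and returned no text.
function isStarved(result: GenResult): boolean {
  return (
    result.finishReason === "length" &&
    result.text.trim() === "" &&
    result.outputTokens > 0
  );
}

async function generateWithStarvationRetry(
  generate: Generate,
  maxTokens: number,
  alias: string,
  modelId: string,
): Promise<GenResult> {
  let result = await generate(maxTokens);
  if (isStarved(result)) {
    // Retry exactly once with four times the original budget.
    result = await generate(maxTokens * 4);
  }
  if (result.text.trim() === "") {
    // Surface a typed error instead of letting a JSON parse fail downstream.
    throw new EmptyResponseError(alias, modelId);
  }
  return result;
}
```

A call site can then branch on err instanceof EmptyResponseError and use alias and modelId to decide whether to route elsewhere or report the failure.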

Medium-impact (still open in v0.1)

No medium-impact items are currently open. Five medium-impact issues from 0.1.0-alpha.0 were resolved in 0.1.0-alpha.1 (see the table above). New ones will land here as users report them.

Lower-impact (real but rarely surfaced)

| Limitation | Surface | Notes |
| --- | --- | --- |
| Registry walks the chain on budget gating but does not retry the next provider on runtime errors (network 5xx, 429, etc.) | Registry behavior | The LLM_TASK_ROUTE_X=fast,backup chain switches when fast is over its USD/request budget. If fast returns a 5xx mid-call, the call fails; it doesn't auto-retry on backup. Catch ProviderUnavailableError in your call site for the v0.1 path. |
| First call to an unknown reasoning model in a fresh process pays one wasted round-trip | OpenAI adapter | The adapter's per-process cache learns the constraint after the first starved attempt. To skip the discovery round-trip, set pricingOverrides[modelId].capabilities.reasoningModel = true for known reasoning models. |
| Compat-provider live coverage is one-test-deep (basic generateText only) | OpenAI adapter via baseURL (Cerebras, Groq, Together AI, Fireworks, etc.) | Structured-output / streaming / agent / embeddings are not regression-tested for compat providers in v0.1. A compat-provider regression in message.reasoning parsing wouldn't be caught by the current live suite. |
| Vercel adapter's runAgent is single-turn only | Vercel adapter | Multi-step tool use through Vercel's own agent loop ships in v0.2. For multi-turn agents today, prefer the direct OpenAI / Anthropic / Ollama adapters. |
| Vercel adapter multimodal inputs pass as [image content] placeholder strings | Vercel adapter | Image and audio content blocks downgrade to text. Direct adapters support full multimodal. |
| Vercel adapter has no bundled pricing table | Vercel adapter | Bring your own pricing map at createVercelAdapter({ pricing: { ... } }). The OpenAI / Anthropic / Ollama adapters ship pricing tables. |
| Some compat-provider models require a pricingOverrides entry | Registry pricing-validation | Cerebras gpt-oss-120b, Groq's Llama variants, etc. need an explicit pricing override before the registry will admit them. |
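
For the first row above (no runtime fallback in v0.1), a call-site workaround could look roughly like the sketch below: try each alias in order and move on only when the failure is a provider outage. The ProviderUnavailableError class here is a local stand-in so the example is self-contained; in real code it would come from the library, and its constructor shape is an assumption. The callWithRuntimeFallback helper and the Call type are hypothetical names.

```typescript
// Local stand-in for the library's ProviderUnavailableError (shape assumed).
class ProviderUnavailableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ProviderUnavailableError";
  }
}

// Hypothetical callback: performs the LLM call against one provider alias.
type Call<T> = (alias: string) => Promise<T>;

// Try each alias in order; only provider outages advance to the next alias.
async function callWithRuntimeFallback<T>(
  aliases: string[],
  call: Call<T>,
): Promise<T> {
  let lastError: unknown;
  for (const alias of aliases) {
    try {
      return await call(alias);
    } catch (err) {
      // Validation errors, bugs, and so on should not be masked by a fallback.
      if (!(err instanceof ProviderUnavailableError)) throw err;
      lastError = err;
    }
  }
  // Every alias was unavailable (or the list was empty).
  throw lastError ?? new Error("no aliases configured");
}
```

Keeping the instanceof check narrow is deliberate: only outages trigger the walk, so a schema or prompt bug still fails fast on the first provider instead of being retried against backup.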

Adapter-specific model quirks (observed 2026-05-12 in live alpha bake)

These aren't adapter bugs — they're model-behavior quirks worth knowing if you target one model in particular. The typed error surface catches them; the call site decides whether to retry, route to a fallback, or surface to the user.

| Model | Quirk | Where it surfaces | Workaround |
| --- | --- | --- | --- |
| claude-haiku-4-5 | Occasionally omits a z.string().min(N)-constrained field entirely on first attempt. The model produces JSON missing the field rather than producing a too-short string. Retry-with-feedback sometimes recovers but not always when the prompt is generic. | generateStructured with constrained string fields | (a) Add explicit "ALWAYS include the <field> field" instruction in the prompt; (b) loosen the .min(N) constraint if the validator was being pedantic anyway; (c) catch ValidationError and route to a fallback model with LLM_TASK_ROUTE_X=claude-haiku,gpt-4o-mini. The typed-error surface works as designed; this is information, not failure. |
| gpt-4o-mini | Occasionally returns extra fields not in the Zod schema. Zod ignores them by default. | generateStructured against a Zod object without .strict() | Add .strict() to the Zod object if you care about exact-shape, OR ignore (default Zod behavior is permissive). |

These are observations, not regressions. The plumbing handles both cases predictably; only the user-facing prompt strategy needs awareness.
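
To make the two quirks concrete without pulling in a dependency, the sketch below hand-rolls a validator roughly equivalent to a strict Zod object with one min-length string field, then retries once with feedback before moving to the next model. Everything here is an assumption for illustration: the Ticket shape, the ModelCall callback, the issues field on the local ValidationError stand-in, and the feedback wording are hypothetical and do not reproduce the library's validation strategies.

```typescript
// Local stand-in for the library's ValidationError; the issues field is assumed.
class ValidationError extends Error {
  issues: string[];
  constructor(issues: string[]) {
    super(`Validation failed: ${issues.join("; ")}`);
    this.name = "ValidationError";
    this.issues = issues;
  }
}

interface Ticket {
  summary: string;
}

// Hand-rolled stand-in for a strict object schema with summary: string, min length.
function parseTicket(raw: unknown, minLength: number): Ticket {
  if (typeof raw !== "object" || raw === null) {
    throw new ValidationError(["expected an object"]);
  }
  const record = raw as Record<string, unknown>;
  const issues: string[] = [];
  const summary = record["summary"];
  if (typeof summary !== "string") {
    issues.push("summary: required"); // the omitted-field quirk lands here
  } else if (summary.length < minLength) {
    issues.push(`summary: must be at least ${minLength} characters`);
  }
  for (const key of Object.keys(record)) {
    if (key !== "summary") issues.push(`${key}: unrecognized key`); // strict mode
  }
  if (issues.length > 0 || typeof summary !== "string") {
    throw new ValidationError(issues);
  }
  return { summary };
}

// Hypothetical callback: sends a prompt to one model and returns parsed JSON.
type ModelCall = (prompt: string) => Promise<unknown>;

async function generateTicket(
  models: ModelCall[],
  prompt: string,
  minLength: number,
): Promise<Ticket> {
  let lastError: ValidationError | undefined;
  for (const model of models) {
    let attemptPrompt = prompt;
    // Two attempts per model: the original prompt, then one retry with feedback.
    for (let attempt = 0; attempt < 2; attempt++) {
      try {
        return parseTicket(await model(attemptPrompt), minLength);
      } catch (err) {
        if (!(err instanceof ValidationError)) throw err;
        lastError = err;
        attemptPrompt =
          `${prompt}\n\nYour previous answer was rejected: ` +
          `${err.issues.join("; ")}. ALWAYS include the summary field.`;
      }
    }
  }
  throw lastError ?? new Error("no models configured");
}
```

Dropping the unrecognized-key loop gives the permissive behavior instead, where extra fields from the model are simply not copied into the result.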


What v0.2 adds

Roadmap targets: not promises, but the work queue. Order is approximate; what ships first is whatever has the clearest user need.

| Surface | What ships |
| --- | --- |
| Vercel adapter feature parity | Multi-turn runAgent through Vercel's own agent loop. (Reasoning-model handling and EmptyResponseError already landed in 0.1.0-alpha.1: #4, #5.) |
| Registry runtime fallback | Retry-on-ProviderUnavailableError with chain walk. Catch-class configurable. |
| Compat-provider test depth | Structured / streaming / agent / embeddings live tests across Cerebras, Groq, Together, Fireworks. |
| createAgent capability factory | Higher-level ergonomics matching createClassifier / createDrafter. Bundles wrapWithApprovalGate + tool/message plumbing into one configure-once factory. The v0.1 path (runAgent directly) keeps working. |
| @llm-ports/observability | Quality tracking hooks, sinks, deterministic edit-diff helpers. The pieces of BEPA that learn from production traffic, extracted into a separate package so users opt in. |
| Expanded capabilities | Targeted: redact, route, decide, answer, rerank. Prioritized by user requests in the capability-request issues. |

What v0.3+ adds

Further out. Subject to change based on v0.1 + v0.2 user signal.

  • @llm-ports/adapter-google for Gemini (different API shape than OpenAI/Anthropic; handled separately)
  • @llm-ports/adapter-mistral if the Mistral API stops fitting under the OpenAI compat shape
  • A portable skill / capability format (Markdown-with-YAML-frontmatter) — being evaluated; not a commitment
  • Native streaming for runAgent (currently you can stream tool-use steps via the lower-level adapter, but not from the agent loop)

How to track new limitations

If you hit something not on this page, please open a bug report. The template captures the version + repro shape needed to triage. New known-limitation items get the known-limitation label and land on this page within a few days.

For open-ended discussion (design feedback, "is this how I should do X?", show-and-tell), GitHub Discussions is a better surface than an issue.

MIT License