Skip to content

Why this exists

Most TypeScript LLM code imports provider SDKs directly. generateText() calls scatter across dozens of files. Every SDK upgrade breaks multiple files. Every provider switch is a refactor. Cost control becomes per-call tracking glue you have to write yourself.

llm-ports fixes this. Two files import the LLM SDK; everything else talks to a provider-agnostic interface that supports multi-provider routing, USD cost gating, fallback chains, and reusable capability factories.

This is the library that assumes you're running LLMs in production at cost, not in a demo.

Position in the ecosystem

ToolWhat it doesWhere llm-ports differs
Vercel AI SDKUnifies provider calls behind one TS APIllm-ports adds registry, fallback chains, USD cost gating, validation recovery, capability factories on top. Use @llm-ports/adapter-vercel to keep your existing setup.
LiteLLMPython-first HTTP proxy that fronts every providerllm-ports is TypeScript client-side — zero network hop, zero extra service to deploy. Talks to LiteLLM via the OpenAI adapter with a baseURL.
PortkeyCommercial hosted gateway with analytics UIllm-ports is MIT, in-process, no vendor dependency. Tradeoff: Portkey ships features llm-ports does not (hosted UI, semantic caching).
LangChain.jsFull agent / chain frameworkllm-ports is a utility, not a framework. Wrap LangChain's LLM calls with a port for budget gating without adopting the whole framework.
LlamaIndex.TSRetrieval-first frameworkllm-ports handles LLM invocation; bring your own retrieval. They compose cleanly.
MastraOpinionated agent-first with built-in memory and workflowsllm-ports is unopinionated primitives beneath that layer.

The positioning in one line: llm-ports is the smallest opinionated TypeScript library for LLMs in production, built around cost control, fallback chains, validation recovery, and tool-use security as primitives.

Nothing above that, nothing below it.

Production track record

llm-ports is extracted from BEPA, a private 24/7 AI executive assistant the author has been running in production for 5+ months across 4 LLM providers (Anthropic, OpenAI, Cerebras, DeepInfra), processing millions of LLM calls.

The extracted core has 5 months of production runtime:

  • Single Hetzner server, Docker, 24/7 uptime
  • 4 LLM providers, automatic fallback chains
  • One Vercel AI SDK upgrade (v4 to v6) handled in an afternoon: 28 of 30 LLM-calling files were untouched
  • Migration commit stats: 192 insertions, 688 deletions (codebase shrunk by 496 lines)
  • Latency overhead added by the framework: mean p50 0.04 ms, max p99 0.47 ms (10x under the 5 ms target)

v0.1 also extends that pattern with features the 2026 ecosystem requires even though BEPA hasn't adopted them yet: multimodal content blocks, USD-denominated cost gating, split EmbeddingsPort, streamStructured. BEPA will absorb these back over time.

The track record above is for the extracted core, not the extensions. Credibility cuts both ways: overclaiming "all of v0.1 is production-tested" is wrong; underclaiming undersells the part that genuinely is. The above is the precise truth — the BEPA-extracted core has the runtime; the 2026-only additions (multimodal, USD gating, split EmbeddingsPort, streamStructured) ship in 0.1.0-alpha.* as the test bake before they earn the same claim.

For the per-surface picture — what's stable, what's still being hardened, what ships in v0.2 — see the v0.1 status page. It's the canonical inventory.

When NOT to use this

  • You have 1-2 LLM call sites and a single provider — the abstraction overhead isn't worth it.
  • You're prototyping and not committed to an architecture.
  • You need a provider-specific feature that doesn't generalize (e.g. Anthropic's prompt caching surface — though you can wire that into the adapter).
  • You're building a library that wraps an LLM SDK — you ARE the adapter; just use LLMPort directly.

MIT License