
ai-feedback-middleware

Composable performance feedback for AI agents

Plug-and-play, event-sourced middleware for capturing, interpreting, and operationalizing feedback to improve LLM and agent performance. Multi-axis polarity, locked action vocabulary, lifecycle worker for silent-feedback coverage, ports-and-adapters all the way down.
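
Two of those ideas, a locked action vocabulary and multi-axis polarity, can be sketched in plain TypeScript. This is a minimal illustration under stated assumptions, not the package's code: the action names other than `approved`, the axes `quality`/`relevance`/`tone`, and the `ACTION_POLARITY` and `evaluate` names are all hypothetical. The real 13-action `DEFAULT_ACTIONS` registry is not reproduced here.

```ts
// Hypothetical sketch: NOT the real DEFAULT_ACTIONS registry.
// A closed ("locked") vocabulary expressed as a readonly tuple plus a union type.
const ACTIONS = ["approved", "edited", "rejected", "ignored"] as const;
type Action = (typeof ACTIONS)[number];

// Hypothetical evaluation axes; the framework's real axes may differ.
type Axis = "quality" | "relevance" | "tone";
type Polarity = -1 | 0 | 1; // negative, neutral/unknown, positive

// Each action carries a polarity per axis instead of one thumbs-up/down score.
const ACTION_POLARITY: Record<Action, Record<Axis, Polarity>> = {
  approved: { quality: 1, relevance: 1, tone: 1 },
  edited: { quality: -1, relevance: 1, tone: 0 }, // useful, but needed fixing
  rejected: { quality: -1, relevance: -1, tone: 0 },
  ignored: { quality: 0, relevance: -1, tone: 0 }, // silence says little about quality
};

// Runtime guard for input that bypassed the type system (e.g. a webhook body).
// Unknown actions are refused, never coerced into the vocabulary.
function isAction(value: string): value is Action {
  return (ACTIONS as readonly string[]).includes(value);
}

function evaluate(action: string): Record<Axis, Polarity> {
  if (!isAction(action)) {
    throw new Error(`Unknown action "${action}": vocabulary is locked`);
  }
  return ACTION_POLARITY[action];
}

console.log(evaluate("edited")); // { quality: -1, relevance: 1, tone: 0 }
```

With the closed union, a typo such as `"aproved"` is a compile error in typed code, and the runtime guard catches the same mistake in untyped input, so it never turns into a new, unanalyzable action.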

60 seconds

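1. Install (this works once the packages are published to npm; until then, install from the repo as described under Status):

```bash
pnpm add @ai-feedback-middleware/core @ai-feedback-middleware/in-memory
```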

2. Compose at startup:

```ts
import {
  createFeedback,
  DEFAULT_ACTIONS,
  rejectByDefault,
} from "@ai-feedback-middleware/core";
import {
  createInMemoryEventStore,
  createInMemoryProjectionStore,
  createInMemoryTrackedArtifactsStore,
} from "@ai-feedback-middleware/in-memory";

export const feedback = createFeedback({
  eventStore: createInMemoryEventStore(),
  projectionStore: createInMemoryProjectionStore(),
  trackedArtifacts: createInMemoryTrackedArtifactsStore(),
  actions: DEFAULT_ACTIONS,
  artifactTypes: [rejectByDefault("draft_email")],
});
```

3. Capture an artifact when the agent produces something reviewable:

```ts
const { artifact_id } = await feedback.captureArtifact({
  artifact_type: "draft_email",
  artifact_version: 1,
  producer: "secretary-agent",
  task_type: "outbound:warm_intro",
  payload: { recipient: "alice@example.com", body: "..." },
  expires_at: new Date(Date.now() + 24 * 60 * 60 * 1000).toISOString(),
});
```

4. Record a reaction when the user acts:

```ts
await feedback.recordReaction({
  artifact_id,
  action: "approved",
  payload: { actor_id: "babak" },
});
```

That's it. Across those two calls, the framework writes the capture event and the reaction event (with the per-axis evaluation embedded), optionally runs Layer 4 inference rules in the same transaction, and transitions the lifecycle row to its terminal state. It also publishes per-axis topics so downstream subscribers (prompt tuning, labeling, alerting) can pick up only what they care about.
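
To make that sequence concrete, here is a minimal in-memory, synchronous model of it. Every name in it (`captureArtifactSketch`, `recordReactionSketch`, the `resolved` terminal state, the `feedback.reaction.<axis>` topic shape, the two-entry evaluation table) is hypothetical. The real framework routes these steps through its event, projection, and tracked-artifact ports, and the description above has the inference step sharing a transaction with the writes, which plain arrays and maps cannot model.

```ts
// Hypothetical in-memory model of capture + reaction; not the package's implementation.
type Axis = "quality" | "relevance";
type Polarity = -1 | 0 | 1;

type FeedbackEvent =
  | { kind: "artifact.captured"; artifact_id: string; artifact_type: string }
  | { kind: "reaction.recorded"; artifact_id: string; action: string;
      evaluation: Record<Axis, Polarity> };

type LifecycleState = "waiting" | "resolved";

const events: FeedbackEvent[] = []; // append-only log (event sourcing)
const tracked = new Map<string, LifecycleState>(); // lifecycle rows
const published: Array<{ topic: string; polarity: Polarity }> = [];

// Hypothetical per-action evaluation table.
const EVALUATION: Record<string, Record<Axis, Polarity>> = {
  approved: { quality: 1, relevance: 1 },
  rejected: { quality: -1, relevance: -1 },
};

function captureArtifactSketch(artifact_id: string, artifact_type: string): void {
  events.push({ kind: "artifact.captured", artifact_id, artifact_type });
  tracked.set(artifact_id, "waiting");
}

function recordReactionSketch(artifact_id: string, action: string): void {
  // This sketch refuses reactions on non-waiting artifacts; the real policy may differ.
  if (tracked.get(artifact_id) !== "waiting") {
    throw new Error(`Artifact ${artifact_id} is not awaiting a reaction`);
  }
  const evaluation = EVALUATION[action];
  if (!evaluation) throw new Error(`Unknown action "${action}"`);

  // 1. Append the reaction event with the per-axis evaluation embedded.
  events.push({ kind: "reaction.recorded", artifact_id, action, evaluation });
  // 2. Optional inference rules would run here (same transaction in the real framework).
  // 3. Move the lifecycle row to its terminal state.
  tracked.set(artifact_id, "resolved");
  // 4. Publish one message per axis so subscribers can filter by topic.
  for (const axis of Object.keys(evaluation) as Axis[]) {
    published.push({ topic: `feedback.reaction.${axis}`, polarity: evaluation[axis] });
  }
}

captureArtifactSketch("a1", "draft_email");
recordReactionSketch("a1", "approved");
console.log(events.length, tracked.get("a1"), published.map((p) => p.topic));
```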

How it relates to other tools

| Tool | How ai-feedback-middleware relates |
|---|---|
| Argilla, Label Studio | Annotation queues for humans. Argilla's "Records pending annotation" map to our `tracked_artifacts` rows in the waiting state. Use them as a queue UI on top of this framework. |
| Humanloop, LangSmith, Langfuse | LLM observability and offline evaluation. Their "Score" / "Feedback" rows map to our `captured_evaluated_reactions`. Subscribe to `feedback.reaction.*` and pipe events to whichever tool fits your stack. |
| Helicone, PromptLayer | Telemetry. A different concern; we trust and ignore them at the feedback boundary. |
| Temporal, Inngest | Workflow orchestration. A different concern; use them around the framework, not inside it. The Lifecycle Worker is narrow: deadline + policy → action emission. |
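
The "deadline + policy → action emission" description of the Lifecycle Worker can be sketched as a single tick function. This assumes a hypothetical `TrackedRow` shape, a hypothetical `SilencePolicy` loosely modeled on the quickstart's `rejectByDefault("draft_email")`, and an `expired` terminal state; the real worker's scheduling, storage queries, and emitted event shape may all differ.

```ts
// Hypothetical model of one lifecycle-worker tick; names are illustrative.
interface TrackedRow {
  artifact_id: string;
  artifact_type: string;
  state: "waiting" | "resolved" | "expired";
  expires_at: string; // ISO-8601, as passed to captureArtifact
}

// Hypothetical per-type policy: what silence means for this artifact type.
interface SilencePolicy {
  artifact_type: string;
  default_action: string;
}

// Loosely mirrors the idea behind rejectByDefault(...) in the quickstart.
const rejectOnSilence = (artifact_type: string): SilencePolicy => ({
  artifact_type,
  default_action: "rejected",
});

interface EmittedAction {
  artifact_id: string;
  action: string;
  source: "lifecycle_worker";
}

function lifecycleTick(rows: TrackedRow[], policies: SilencePolicy[], now: Date): EmittedAction[] {
  const byType = new Map(policies.map((p) => [p.artifact_type, p] as const));
  const emitted: EmittedAction[] = [];
  for (const row of rows) {
    if (row.state !== "waiting") continue; // a real reaction already arrived
    if (Date.parse(row.expires_at) > now.getTime()) continue; // deadline not reached
    const policy = byType.get(row.artifact_type);
    if (!policy) continue; // no policy: this sketch leaves the row alone
    emitted.push({
      artifact_id: row.artifact_id,
      action: policy.default_action,
      source: "lifecycle_worker",
    });
    row.state = "expired"; // terminal state, so the next tick skips it
  }
  return emitted;
}

const now = new Date("2024-01-02T00:00:00Z");
const rows: TrackedRow[] = [
  { artifact_id: "a1", artifact_type: "draft_email", state: "waiting", expires_at: "2024-01-01T12:00:00Z" },
  { artifact_id: "a2", artifact_type: "draft_email", state: "waiting", expires_at: "2024-01-03T00:00:00Z" },
  { artifact_id: "a3", artifact_type: "draft_email", state: "resolved", expires_at: "2024-01-01T00:00:00Z" },
];
console.log(lifecycleTick(rows, [rejectOnSilence("draft_email")], now));
```

Marking the row terminal in the same pass is what makes repeated ticks idempotent in this sketch: a second run over the same rows at the same time emits nothing.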

The framework is adjacent, not competitive. Typical adoption: this framework captures and evaluates; downstream subscribers ship events to your existing tools.
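
Subscribing to `feedback.reaction.*` implies topic-pattern routing. The in-process bus below is a hypothetical stand-in for that idea, not the API of `@ai-feedback-middleware/redis-pubsub` or `@ai-feedback-middleware/streams`. It assumes `*` matches exactly one dot-separated segment (a common convention the framework may or may not share) and uses made-up per-axis topic names.

```ts
// Hypothetical in-process bus; not the redis-pubsub or streams API.
type Handler = (topic: string, payload: unknown) => void;

// Assumption: "*" matches exactly one dot-separated segment.
function topicMatches(pattern: string, topic: string): boolean {
  const p = pattern.split(".");
  const t = topic.split(".");
  if (p.length !== t.length) return false;
  return p.every((seg, i) => seg === "*" || seg === t[i]);
}

class TinyBus {
  private subs: Array<{ pattern: string; handler: Handler }> = [];

  // Returns an unsubscribe function.
  subscribe(pattern: string, handler: Handler): () => void {
    const entry = { pattern, handler };
    this.subs.push(entry);
    return () => {
      this.subs = this.subs.filter((s) => s !== entry);
    };
  }

  // Delivers to every matching subscriber; returns how many received it.
  publish(topic: string, payload: unknown): number {
    let delivered = 0;
    for (const s of this.subs) {
      if (topicMatches(s.pattern, topic)) {
        s.handler(topic, payload);
        delivered++;
      }
    }
    return delivered;
  }
}

// Example downstream subscriber: forward negative reactions to an external tool.
const bus = new TinyBus();
const forwarded: string[] = [];
bus.subscribe("feedback.reaction.*", (topic, payload) => {
  const { polarity } = payload as { polarity: number };
  if (polarity < 0) forwarded.push(topic); // e.g. call your eval tool's client here
});

bus.publish("feedback.reaction.quality", { polarity: -1 });
bus.publish("feedback.reaction.relevance", { polarity: 1 });
bus.publish("feedback.capture.draft_email", { polarity: 0 });
console.log(forwarded); // [ 'feedback.reaction.quality' ]
```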

What you get

  • @ai-feedback-middleware/core — interfaces, classifier, ports, 13-action registry, schema upcasters
  • @ai-feedback-middleware/in-memory — single-process testing adapters
  • @ai-feedback-middleware/postgres — production adapters with 8 SQL migrations
  • @ai-feedback-middleware/redis-pubsub — at-most-once pub/sub bus
  • @ai-feedback-middleware/streams — RxJS wrapper for composable subscribers
  • @ai-feedback-middleware/reference — reference projections + capture adapters
  • @ai-feedback-middleware/adapter-conformance — write your own adapters with the same contract
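
The adapter-conformance package points at the usual ports-and-adapters testing pattern: one shared suite per port, run unchanged against every adapter (in-memory, Postgres, or your own). A minimal sketch follows, in which the `EventStorePort` interface, its `append`/`readStream` methods, and `runEventStoreConformance` are hypothetical and do not reproduce the package's actual contract.

```ts
// Hypothetical port; the real EventStore contract lives in @ai-feedback-middleware/core.
interface StoredEvent {
  stream_id: string;
  type: string;
  data: unknown;
}

interface EventStorePort {
  append(event: StoredEvent): void;
  readStream(stream_id: string): StoredEvent[];
}

// A trivial adapter under test.
class ArrayEventStore implements EventStorePort {
  private log: StoredEvent[] = [];
  append(event: StoredEvent): void {
    this.log.push({ ...event }); // shallow copy so callers can't rewrite stored history
  }
  readStream(stream_id: string): StoredEvent[] {
    return this.log.filter((e) => e.stream_id === stream_id).map((e) => ({ ...e }));
  }
}

// Conformance suite: every adapter must pass the same checks.
function runEventStoreConformance(name: string, make: () => EventStorePort): string[] {
  const failures: string[] = [];
  const check = (label: string, ok: boolean) => {
    if (!ok) failures.push(`${name}: ${label}`);
  };

  const store = make();
  store.append({ stream_id: "a1", type: "captured", data: {} });
  store.append({ stream_id: "a2", type: "captured", data: {} });
  store.append({ stream_id: "a1", type: "reacted", data: { action: "approved" } });

  const a1 = store.readStream("a1");
  check("preserves append order within a stream", a1.map((e) => e.type).join(",") === "captured,reacted");
  check("isolates streams", store.readStream("a2").length === 1);
  check("returns empty for unknown streams", store.readStream("nope").length === 0);

  a1[0].type = "tampered";
  check("reads cannot mutate history", store.readStream("a1")[0].type === "captured");

  return failures;
}

console.log(runEventStoreConformance("array", () => new ArrayEventStore())); // []
```

An adapter passes when the function returns an empty list; a store that hands out references to its internal log, for example, would fail the last check.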

Status

v0.3.0-alpha.0 — schema v2.1 lands. CI green on Node 20 + 22 against real Postgres 16 and Redis 7 via Docker service containers. 325 tests across 7 packages plus 12 runnable example apps. Not yet on npm (scope registration pending); install from the repo until then.


Apache 2.0 License