ai-feedback-middleware
Composable performance feedback for AI agents
Plug-and-play, event-sourced middleware for capturing, interpreting,
and operationalizing feedback to improve LLM and agent performance.
Multi-axis polarity, locked action vocabulary, lifecycle worker for
silent-feedback coverage, ports-and-adapters all the way down.
Every reaction is scored independently on four axes: detection,
content, timing, and channel. No more collapsing "user disliked this"
into one bit and losing the why. The four axes map directly to four
remediation paths (trigger rules, prompts, scheduling, delivery routing).
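As a rough illustration of the axis-to-remediation mapping, here is a minimal sketch. It assumes the four axes pair with the four remediation paths in the order listed; the type names, the `positive` / `negative` / `neutral` values, and `remediationPaths` are hypothetical and not taken from the framework's API.

```typescript
// Hypothetical types: the source names the axes and remediation paths,
// but not these identifiers or the per-axis polarity values.
type Axis = "detection" | "content" | "timing" | "channel";
type Polarity = "positive" | "negative" | "neutral";
type AxisScores = Record<Axis, Polarity>;

// Assumed pairing: axes and remediation paths matched in listed order.
const REMEDIATION: Record<Axis, string> = {
  detection: "trigger rules",
  content: "prompts",
  timing: "scheduling",
  channel: "delivery routing",
};

// Return the remediation path for every axis that scored negative.
function remediationPaths(scores: AxisScores): string[] {
  return (Object.keys(REMEDIATION) as Axis[])
    .filter((axis) => scores[axis] === "negative")
    .map((axis) => REMEDIATION[axis]);
}

// Example: right moment, wrong content. Only the prompt needs attention.
const example: AxisScores = {
  detection: "positive",
  content: "negative",
  timing: "positive",
  channel: "neutral",
};
console.log(remediationPaths(example));
```

Keeping one score per axis is what lets a single reaction point at one remediation path without implicating the other three.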
Every governed artifact has a required expiresAt. The Lifecycle Worker
fires silently_accepted or silently_rejected_expired on deadline per
the artifact type's policy. No more "we never knew if the user agreed."
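The deadline behaviour can be sketched roughly as below. Only `expiresAt`, the `waiting` state, and the two event names come from the text; the policy names, the `TrackedArtifact` shape, and `sweepExpired` are assumptions made for illustration.

```typescript
// Hypothetical policy names; the source only says each artifact type has a policy.
type ExpiryPolicy = "accept_on_expiry" | "reject_on_expiry";
type LifecycleEvent = "silently_accepted" | "silently_rejected_expired";

interface TrackedArtifact {
  id: string;
  artifactType: string;
  expiresAt: Date; // required on every governed artifact
  state: "waiting" | "resolved"; // "resolved" is an assumed terminal state
}

interface Emission {
  artifactId: string;
  event: LifecycleEvent;
}

// Emit one lifecycle event for each waiting artifact whose deadline has passed.
function sweepExpired(
  artifacts: readonly TrackedArtifact[],
  policies: ReadonlyMap<string, ExpiryPolicy>,
  now: Date,
): Emission[] {
  const emitted: Emission[] = [];
  for (const artifact of artifacts) {
    if (artifact.state !== "waiting") continue; // already has an explicit reaction
    if (artifact.expiresAt.getTime() > now.getTime()) continue; // not due yet
    const policy = policies.get(artifact.artifactType);
    if (policy === undefined) {
      throw new Error(`no expiry policy for artifact type ${artifact.artifactType}`);
    }
    emitted.push({
      artifactId: artifact.id,
      event: policy === "accept_on_expiry" ? "silently_accepted" : "silently_rejected_expired",
    });
  }
  return emitted;
}
```

Passing `now` in as a parameter keeps the sweep deterministic, which makes deadline logic straightforward to test.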
🧬
Inline Layer 4 actionability inference
Threshold rules over recent reactions promote per-axis polarity from
continue_to_observe to actionable_positive or actionable_negative.
Runs in the same transaction as the triggering reaction. Tombstones
filter direction-symmetrically.
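A threshold rule of this kind might look like the sketch below. The three actionability values come from the text; the rule shape (`windowSize`, `minCount`), the majority tie-break, and the reading of "direction-symmetrically" as "tombstoned reactions are dropped whatever their polarity" are assumptions. The transactional aspect is not modelled here.

```typescript
type Actionability = "continue_to_observe" | "actionable_positive" | "actionable_negative";

// Hypothetical shapes for one axis's recent reactions and its rule.
interface AxisReaction {
  polarity: "positive" | "negative";
  tombstoned: boolean;
}

interface ThresholdRule {
  windowSize: number; // how many of the most recent live reactions to consider (>= 1)
  minCount: number; // how many same-direction reactions are needed to promote
}

function inferActionability(recent: readonly AxisReaction[], rule: ThresholdRule): Actionability {
  // Tombstoned reactions are excluded regardless of direction (assumed meaning
  // of "direction-symmetrically"), then the window is taken from the tail.
  const live = recent.filter((r) => !r.tombstoned).slice(-rule.windowSize);
  const positives = live.filter((r) => r.polarity === "positive").length;
  const negatives = live.length - positives;

  if (negatives >= rule.minCount && negatives > positives) return "actionable_negative";
  if (positives >= rule.minCount && positives > negatives) return "actionable_positive";
  return "continue_to_observe";
}
```

In this sketch a tombstoned reaction can pull an axis back below the threshold, which is why the filter runs before the counts.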
🔌
Provider-agnostic adapters
Postgres for production, in-memory for tests, Redis pub/sub bus, RxJS
streams wrapper. Adapter conformance suite is the contract — write
your own and it just works. SQLite shipping next.
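To show the idea of a conformance suite acting as the contract, here is a small sketch. The port interface, the in-memory adapter, and the two checks are all hypothetical and kept synchronous for brevity; the framework's real ports and suite are not shown in this text, and a Postgres-backed adapter would presumably be asynchronous.

```typescript
// Hypothetical port: an append-only event store.
interface FeedbackEvent {
  id: string;
  kind: string;
}

interface EventStorePort {
  append(event: FeedbackEvent): void;
  list(): readonly FeedbackEvent[];
}

// In-memory adapter, the kind used in tests.
class InMemoryEventStore implements EventStorePort {
  private readonly events: FeedbackEvent[] = [];

  append(event: FeedbackEvent): void {
    if (this.events.some((e) => e.id === event.id)) {
      throw new Error(`duplicate event id: ${event.id}`);
    }
    this.events.push({ ...event }); // copy so callers cannot mutate stored events
  }

  list(): readonly FeedbackEvent[] {
    return this.events.map((e) => ({ ...e }));
  }
}

// Tiny conformance check: any adapter passed in must satisfy the same rules.
// Returns a list of failure messages; an empty list means the adapter conforms.
function runConformance(makeStore: () => EventStorePort): string[] {
  const failures: string[] = [];
  const store = makeStore();
  store.append({ id: "e1", kind: "capture" });
  store.append({ id: "e2", kind: "reaction" });

  if (store.list().map((e) => e.id).join(",") !== "e1,e2") {
    failures.push("append order not preserved");
  }

  let rejected = false;
  try {
    store.append({ id: "e1", kind: "capture" });
  } catch {
    rejected = true;
  }
  if (!rejected) failures.push("duplicate id was accepted");

  return failures;
}

console.log(runConformance(() => new InMemoryEventStore()));
```

Because the suite takes a factory rather than a concrete class, the same checks can run against any adapter that implements the port.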
🛡️
Event-sourced + immutable
captured_artifacts and captured_evaluated_reactions are append-only.
Tombstones (cancelled / corrected / superseded_by) reference earlier
events; they never mutate. Full audit, perfect replay.
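The append-only pattern can be sketched as follows. The tombstone kinds come from the text; the `LogEvent` shape and `liveReactions` are hypothetical. The point of the sketch is that a correction is a new event pointing at an old one, and the current view is derived by replaying the log.

```typescript
type TombstoneKind = "cancelled" | "corrected" | "superseded_by";

// Hypothetical event shapes for an append-only log.
type LogEvent =
  | { type: "reaction"; id: string; payload: string }
  | { type: "tombstone"; id: string; kind: TombstoneKind; targetId: string };

type ReactionEvent = Extract<LogEvent, { type: "reaction" }>;

// Derive the live reactions by replaying the log. The log itself is never edited.
function liveReactions(log: readonly LogEvent[]): ReactionEvent[] {
  const tombstoned = new Set<string>();
  for (const event of log) {
    if (event.type === "tombstone") tombstoned.add(event.targetId);
  }
  return log.filter(
    (event): event is ReactionEvent => event.type === "reaction" && !tombstoned.has(event.id),
  );
}

const log: LogEvent[] = [
  { type: "reaction", id: "r1", payload: "thumbs down" },
  { type: "reaction", id: "r2", payload: "thumbs up" },
  // r1 was a mis-click: record that fact as a new event instead of deleting r1.
  { type: "tombstone", id: "t1", kind: "cancelled", targetId: "r1" },
];
console.log(liveReactions(log).map((r) => r.id));
```

All three events remain in the log, so an audit can still see that `r1` happened and when it was cancelled.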
That's it. The framework writes the capture event, the reaction event with per-axis evaluation embedded, optionally runs Layer 4 inference rules in the same transaction, transitions the lifecycle row to its terminal state, and publishes per-axis topics so downstream subscribers (prompt tuning, labeling, alerting) can pick up what they care about.
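For the per-axis topics, a downstream subscriber might filter events along these lines. This is an in-process stand-in only: the `feedback.reaction.<axis>.<polarity>` naming, `topicFor`, `matchesTopic`, and `TopicBus` are assumptions for illustration, not the framework's bus API or its actual topic layout.

```typescript
type Axis = "detection" | "content" | "timing" | "channel";
type Handler = (topic: string, payload: unknown) => void;

// Assumed topic layout: feedback.reaction.<axis>.<polarity>
function topicFor(axis: Axis, polarity: string): string {
  return `feedback.reaction.${axis}.${polarity}`;
}

// A trailing ".*" matches any suffix; anything else must match exactly.
function matchesTopic(pattern: string, topic: string): boolean {
  if (pattern.endsWith(".*")) {
    return topic.startsWith(pattern.slice(0, -1)); // keep the trailing dot
  }
  return pattern === topic;
}

// Minimal synchronous bus, standing in for a real pub/sub adapter.
class TopicBus {
  private readonly subscriptions: Array<{ pattern: string; handler: Handler }> = [];

  subscribe(pattern: string, handler: Handler): void {
    this.subscriptions.push({ pattern, handler });
  }

  publish(topic: string, payload: unknown): void {
    for (const sub of this.subscriptions) {
      if (matchesTopic(sub.pattern, topic)) sub.handler(topic, payload);
    }
  }
}

const bus = new TopicBus();
// A prompt-tuning subscriber that only cares about the content axis.
bus.subscribe("feedback.reaction.content.*", (topic) => console.log("prompt tuning:", topic));
bus.publish(topicFor("content", "actionable_negative"), { artifactId: "a1" });
bus.publish(topicFor("timing", "actionable_negative"), { artifactId: "a1" });
```

Only the first publish reaches the subscriber here, which is the practical benefit of per-axis topics: each consumer receives just the axis it can act on.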
Annotation queues for humans. Argilla's "Records pending annotation" map to our tracked_artifacts rows in waiting state. Use them as a queue UI on top of this framework.
LLM observability + offline eval. Their "Score" / "Feedback" rows map to our captured_evaluated_reactions. Subscribe to feedback.reaction.* and pipe events to whichever tool fits your stack.
Workflow orchestration. Different concern — use them around the framework, not inside. The Lifecycle Worker is narrow: deadline + policy → action emission.
The framework is adjacent, not competitive. Typical adoption: this framework captures and evaluates; downstream subscribers ship events to your existing tools.
v0.3.0-alpha.0 — schema v2.1 lands. CI green on Node 20 + 22 against real Postgres 16 and Redis 7 via Docker service containers. 325 tests across 7 packages plus 12 runnable example apps. Not yet on npm (scope registration pending); install from the repo until then.