

raffisk | 1 month ago

I introduce the Determinism-Faithfulness Assurance Harness (DFAH) in a new paper, "Replayable Financial Agents", released along with the open-source code.

A few findings:
- Determinism and faithfulness are positively correlated (r = 0.45) for the tasks in my experiments.
- Schema-first Tier 1 (7–20B) stays near the 95% compliance threshold under stress.
- Frontier models performed well on some tasks (e.g., strong action determinism in agentic triage), but the matrix helps define when human-in-the-loop (HITL) review is still needed.

Note: I did not control the inference engines or infrastructure for these experiments; I used local models and frontier APIs.

Paper: https://arxiv.org/abs/2601.15322
