(no title)
0xdeafcafe | 8 months ago
We've been working on a way to test this more systematically by simulating full conversations with agents and surfacing the exact point where things go off the rails. Kind of like unit tests, but for context, behavior, and other ai jank.
Full disclosure, I work at the company building this, but the core library is open source, free to use, etc. https://github.com/langwatch/scenario
No comments yet.