dangelosaurus | 1 month ago
I work on Promptfoo (an open-source eval framework). Appreciate the mention here. This post captures a lot of the hard lessons around agent evals. In particular, task ambiguity and brittle graders are things we run into constantly.