nirga | 1 year ago

We trained our own models for some of them, and combined some well-known NLP metrics (like GRUEN [1]) to make this work.

You're right that it's hard to figure out how to "trust" these metrics. But you shouldn't treat them as an objective number for your app's performance. They're more a way to detect deltas: regressions or other changes in performance. If you start getting more alerts or more negative results, you're regressing; if you get fewer, you're improving. In my view this works for tools like RAGAS as well as for our own metrics.
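To make the "deltas, not absolute numbers" idea concrete, here's a rough sketch of comparing two evaluation runs by their share of negative results. Everything in it is hypothetical: the function names, the 0.5 score threshold, and the 0.05 tolerance are illustrative assumptions, not Traceloop's or RAGAS's actual API or defaults.

```python
def negative_rate(scores, threshold):
    """Fraction of scores below the threshold, i.e. 'negative' results."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum(s < threshold for s in scores) / len(scores)


def compare_runs(baseline, candidate, threshold=0.5, tolerance=0.05):
    """Classify the change in negative-result rate between two runs.

    threshold and tolerance are arbitrary illustrative values (assumptions);
    tune them for your own metric and traffic.
    """
    delta = negative_rate(candidate, threshold) - negative_rate(baseline, threshold)
    if delta > tolerance:
        return "regression"      # more negative results than before
    if delta < -tolerance:
        return "improvement"     # fewer negative results than before
    return "no significant change"


# Hypothetical per-response metric scores from two versions of an app.
before = [0.9, 0.8, 0.4, 0.7]
after = [0.9, 0.3, 0.4, 0.2]
print(compare_runs(before, after))
```

The absolute rate in either run isn't the point here; only the direction and size of the change between runs is used.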

[1] https://www.traceloop.com/blog/gruens-outstanding-performanc...
