(no title)

Kostchei | 3 months ago

We have 20+ services in prod that use LLMs, so I have 50k (or more) of data per service per day to evaluate. The question is: do people actually evaluate properly?

And how do you do an apples-to-apples evaluation of such squishy services?
