item 45862864 (no title)
Kostchei | 3 months ago

We have 20+ services in prod that use LLMs, so I have 50k (or more) per service per day of data to evaluate. The question is: do people actually evaluate properly? And how do you do an apples-to-apples evaluation of such squishy services?
No comments yet.