statusfailed | 1 year ago

I'd love to know what your use case is that makes those things important to you, and what kinds of benchmarks and cleaning tasks you need to run.

Also, what kind of evaluations for quality of reasoning do you use?