statusfailed | 1 year ago

I'd love to know what use case makes those things important to you, and what kinds of benchmarks and cleaning tasks you need to run.

Also, what kinds of evaluations do you use for quality of reasoning?
No comments yet.