

VHRanger | 3 months ago

All of this will also depend on the model's settings (reasoning effort, temperature, top_k, etc.).

Which is why your benchmarks should generally be a bit broader (>10 questions for a personal setup); otherwise you overfit to noise.
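
To put a rough number on the noise, here is a minimal sketch that treats each question as an independent pass/fail trial and uses the normal approximation to the binomial. The function names and the 0.7 pass rate are hypothetical, picked only for illustration, and the approximation is crude at very small n.

```python
import math


def pass_rate_standard_error(p: float, n: int) -> float:
    """Standard error of an observed pass rate over n independent questions.

    Assumes each question is an independent pass/fail trial with the same
    underlying pass probability p (a simplification for real benchmarks).
    """
    return math.sqrt(p * (1.0 - p) / n)


def approx_95_margin(p: float, n: int) -> float:
    """Approximate 95% margin of error (normal approximation, rough for small n)."""
    return 1.96 * pass_rate_standard_error(p, n)


if __name__ == "__main__":
    # Hypothetical true pass rate of 0.7, at a few benchmark sizes.
    for n in (10, 50, 200):
        print(f"n={n:>3}  margin=+/-{approx_95_margin(0.7, n):.3f}")
```

The margin shrinks with the square root of the number of questions, so on a small set a few points of difference between two settings can easily be sampling noise rather than a real effect.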
