top | item 43379233

ohso4 | 11 months ago

Lmarena.ai is a very accurate eval (with style control). Other benchmarks, like AIME, can be trained on or optimized for and therefore should not be trusted. Most AI companies do something fishy to boost their benchmark scores.
