ohso4 | 11 months ago
lmarena.ai is a very accurate eval (with style control). Other benchmarks, like AIME and the like, can be trained on or optimized for, and therefore should not be trusted. Most AI companies do something fishy to boost their benchmark scores.