ekojs | 10 months ago
I think it's most illustrative to see the sample battles (H2H) that LMArena released [1]. The output of Meta's model is too verbose and too 'yappy' IMO. And looking at the verdicts, it's no wonder that people are discounting LMArena rankings.

[1]: https://huggingface.co/spaces/lmarena-ai/Llama-4-Maverick-03...
smeeth | 10 months ago
In fairness, 4o was like this until very recently. I suspect it comes from training on CoT data from larger models.

ed | 10 months ago
Yep, it's clear that many wins are due to Llama 4's lowered refusal rate, which is an effective form of Elo hacking.