ekojs | 10 months ago
I think it's most illustrative to see the sample battles (H2H) that LMArena released [1]. The output of Meta's model is too verbose and too 'yappy' IMO. And looking at the verdicts, it's no wonder that people are discounting LMArena rankings.

[1]: https://huggingface.co/spaces/lmarena-ai/Llama-4-Maverick-03...
smeeth | 10 months ago
In fairness, 4o was like this until very recently. I suspect it comes from training on CoT data from larger models.

ed | 10 months ago
Yep, it's clear that many wins are due to Llama 4's lowered refusal rate, which is an effective form of Elo hacking.