mkolodny | 10 months ago

“Got caught” is a misleading way to present what happened.

According to the article, Meta publicly stated, right below the benchmark comparison, that the version of Llama on LMArena was the experimental chat version:

> According to Meta’s own materials, it deployed an “experimental chat version” of Maverick to LMArena that was specifically “optimized for conversationality”

The AI benchmark in question, LMArena, compares Llama 4 experimental to closed models like ChatGPT 4o latest, and Llama performs better (https://lmarena.ai/?leaderboard).
