degrews | 6 months ago

It's because those markets are based on the LMArena leaderboard (https://lmarena.ai/), where Claude has historically done poorly.

That eval has also become a lot less relevant (it's considered not very indicative of real-world performance), so it's unlikely Anthropic will prioritize optimizing for it in future models.

kmacdough|6 months ago

Anthropic has always been one of the best at not optimizing for stupid metrics. Rather, they spend significant energy researching weaknesses and building metrics around them. Google is also pretty on point IMO, but they can also afford to dedicate resources to these nonsense metrics, as they are still good marketing.

Meanwhile, Meta and xAI are behind the curve and largely marketing-focused.

ttroyr|6 months ago

True. I'm surprised the markets are not based on, e.g., OpenRouter usage or similar.