They used to compare against competing models from Anthropic, Google DeepMind, DeepSeek, etc. It seems that now they only compare against their own models. Does this mean that the GPT series is performing worse than its competitors (given the "code red" at OpenAI)?
Tiberium|2 months ago
https://i.imgur.com/e0iB8KC.png
enlyth|2 months ago
sergdigon|2 months ago
whimsicalism|2 months ago
tabletcorry|2 months ago
But they publish all the same numbers, so you can make the full comparison yourself, if you want to.
Workaccount2|2 months ago
Apple only compares to itself. It doesn't even acknowledge the existence of others.
poormathskills|2 months ago
boole1854|2 months ago
I see evaluations compared with Claude, Gemini, and Llama there in the GPT-4o post.