A really great way to get an idea of the relative cost and performance of these models at their various thinking budgets is to look at the ARC-AGI-2 leaderboard. Opus 4.5 stacks up very well here when you compare to Gemini 3’s score and cost. Gemini 3 Deep Think is still the current leaders but at more than 30x the cost.The cost curve of achieving these scores is coming down rapidly. In Dec 2024 when OpenAI announced beating human performance on ARC-AGI-1, they spent more than $3k per task. You can get the same performance for pennies to dollars, approximately an 80x reduction in 11 months.
https://arcprize.org/leaderboard
https://arcprize.org/blog/oai-o3-pub-breakthrough
energy123|3 months ago