startages | 9 days ago
There are 4 models, all receiving the exact same prompts a few times a day, required to respond with a specific action.
In the first experiment I used gemini-3-pro-preview; it spent ~$18 on the same task where Opus 4.5 spent ~$4, GPT-5.1 spent ~$4.50, and Grok spent ~$7. Pro was burning through money so fast that I switched to gemini-3-flash-preview, and it's still outspending every other model on identical prompts. The new experiment is showing the same pattern.
Most of the cost appears to be reasoning tokens.
The takeaway: Gemini spends significantly more on reasoning tokens to produce lower-quality answers, while Opus thinks less and delivers better results. The lower per-token price doesn't matter much when the model needs 4x the tokens to get there.
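To make the price-vs-volume point concrete, here's a toy calculation. The per-million-token prices are placeholders, not the real rate cards for either model:

```python
# A cheaper per-token rate loses if the model emits several times
# more (reasoning) tokens for the same task.
# Hypothetical prices, $ per million output tokens:
PRICE_PER_M = {"gemini": 8.0, "opus": 25.0}

def task_cost(model: str, output_tokens: int) -> float:
    """Cost of one task given total billed output tokens."""
    return output_tokens / 1_000_000 * PRICE_PER_M[model]

# If Gemini needs ~4x the tokens, a ~3x-cheaper rate still
# leaves it more expensive in this toy example:
print(task_cost("gemini", 400_000))  # 3.2
print(task_cost("opus", 100_000))    # 2.5
```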
camel_Snake | 9 days ago
Opus: 521k input tokens; 12k out
Grok: 443k input tokens; 57k out
Gemini: 677k input tokens; 7k out
OAI: 543k input tokens; 17k out
Gemini appears to use by far the fewest reasoning tokens, assuming they're included in the output counts.
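Plugging those counts into a toy cost model suggests the gap could come from input tokens rather than output: Gemini has the most input and the least output. The per-million prices below are placeholders, not the providers' actual rates:

```python
# (input_price, output_price) in $ per million tokens -- placeholders only.
PRICES = {"opus": (5.0, 25.0), "grok": (3.0, 15.0),
          "gemini": (2.0, 12.0), "oai": (1.25, 10.0)}
# (input_tokens, output_tokens) from the counts above.
USAGE = {"opus": (521_000, 12_000), "grok": (443_000, 57_000),
         "gemini": (677_000, 7_000), "oai": (543_000, 17_000)}

for model, (inp, out) in USAGE.items():
    in_p, out_p = PRICES[model]
    in_cost = inp / 1e6 * in_p
    out_cost = out / 1e6 * out_p
    share = in_cost / (in_cost + out_cost)
    print(f"{model}: ${in_cost + out_cost:.2f}, input share {share:.0%}")
```

With Gemini's 677k in / 7k out split, input would dominate the bill at almost any plausible price ratio, which cuts against the reasoning-token explanation unless reasoning is billed separately from these counts.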
tourist2d | 9 days ago
[deleted]