I'm not sure that's a particularly good question for concluding something positive about the "thought for 0.7 seconds" - it's such a simple answer, ChatGPT 4o (with no thinking time) immediately answered correctly. The only surprising thing in your test is that o3 wasted 13 seconds thinking about it.
When I pay attention to o3 CoT, I notice it spends a few passes thinking about my system prompt. Hard to imagine this question is hard enough to spend 13 seconds on.
Asking it about a marginally more complex tech topic and getting an excellent answer in ~4 seconds, reasoning for 1.1 seconds...
I am _very_ curious to see what GPT-5 turns out to be, because unless they're running on custom silicon / accelerators, even if it's very smart, it seems hard to justify not using these open models on Groq/Cerebras for a _huge_ fraction of use-cases.
Non-rhetorically, why would someone pay for o3 api now that I can get this open model from openai served for cheaper? Interesting dynamic... will they drop o3 pricing next week (which is 10-20x the cost[1])?
Not even that: even if o3 being marginally better is important for your task (let's say), why would anyone use o4-mini? It's almost 10x the price for the same performance (maybe even worse): https://openrouter.ai/openai/o4-mini
Wow, that's significantly cheaper than o4-mini ($1.10/M input tokens, $4.40/M output tokens), which seems to be on par with gpt-oss-120b. Almost 10x the price.
LLMs are getting cheaper much faster than I anticipated. I'm curious whether it's still the hype cycle and Groq/Fireworks/Cerebras are taking a loss here, or whether things are actually getting cheaper. At this rate we'll be able to run Qwen3-32B-level models on phones/embedded soon.
I really want to try coding with this at 2600 tokens/s (from Cerebras). Imagine generating thousands of lines of code as fast as you can prompt. If it doesn't work who cares, generate another thousand and try again! And at $.69/M tokens it would only cost $6.50 an hour.
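The arithmetic holds up; a quick sanity check using the 2600 tokens/s and $.69/M figures quoted above:

```python
# Sanity-check the hourly cost of sustained generation at Cerebras-like speeds.
tokens_per_second = 2600
price_per_million_tokens = 0.69  # USD per 1M tokens, as quoted above

tokens_per_hour = tokens_per_second * 3600  # 9,360,000 tokens
cost_per_hour = tokens_per_hour / 1e6 * price_per_million_tokens

print(f"{tokens_per_hour:,} tokens/hour -> ${cost_per_hour:.2f}/hour")  # ~ $6.46/hour
```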
I tried this (gpt-oss-120b with Cerebras) with Roo Code. It repeatedly failed to use the tools correctly, and then I got 429 too many requests. So much for the "as fast as I can think" idea!
I'll have to try again later but it was a bit underwhelming.
The latency also seemed pretty high, not sure why. With latency that high, the throughput ends up not making much difference.
Btw Groq has the 20b model at 4000 TPS but I haven't tried that one.
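The latency point above can be made concrete with a rough model (illustrative numbers, not measurements): per-response time is time-to-first-token plus generation time, so once throughput is high enough, latency becomes the floor.

```python
# End-to-end time for one response: first-token latency plus generation time.
def response_time(n_tokens: int, ttft_s: float, tokens_per_s: float) -> float:
    """ttft_s: time to first token (network + queue + prefill), in seconds."""
    return ttft_s + n_tokens / tokens_per_s

# Hypothetical numbers: a 500-token reply with 2 s of first-token latency.
slow = response_time(500, ttft_s=2.0, tokens_per_s=100)   # 7.0 s total
fast = response_time(500, ttft_s=2.0, tokens_per_s=2600)  # ~2.19 s total
# Going from 100 to 2600 tok/s saves under 5 s here; latency now dominates.
```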
podnami|6 months ago
On ChatGPT.com o3 thought for 13 seconds, on OpenRouter GPT OSS 120B thought for 0.7 seconds - and they both had the correct answer.
swores|6 months ago
nisegami|6 months ago
golergka|6 months ago
Imustaskforhelp|6 months ago
I am not kidding but such progress from a technological point of view is just fascinating!
xpe|6 months ago
What is being measured here? For end-to-end time, one simple model is:
t_total = t_network + t_queue + t_batch_wait + t_inference + t_service_overhead
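That decomposition can be sketched with made-up component values, showing why a reported "thought for 0.7 seconds" may reflect only t_inference while the user-perceived time includes everything else:

```python
# Hypothetical breakdown of one request's end-to-end time, in seconds.
components = {
    "t_network": 0.15,
    "t_queue": 0.30,
    "t_batch_wait": 0.10,
    "t_inference": 0.70,  # the part a UI might report as "thinking"
    "t_service_overhead": 0.05,
}
t_total = sum(components.values())
print(f"reported thinking: {components['t_inference']:.2f}s, "
      f"end-to-end: {t_total:.2f}s")  # 0.70s vs 1.30s
```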
tekacs|6 months ago
https://x.com/tekacs/status/1952788922666205615
tekacs|6 months ago
tekacs|6 months ago
https://news.ycombinator.com/item?id=44738004
... today, this is a real-time video of the OSS thinking models by OpenAI on Groq and I'd have to slow it down to be able to read it. Wild.
sigmar|6 months ago
[1] currently $3/M in, $8/M out https://platform.openai.com/docs/pricing
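For a rough sense of that multiple, assuming gpt-oss-120b is served at around $0.15/M input and $0.60/M output (illustrative figures; actual pricing varies by provider, check OpenRouter):

```python
# Compare o3 list pricing against assumed gpt-oss-120b pricing (USD per 1M tokens).
o3 = {"in": 3.00, "out": 8.00}   # from the OpenAI pricing page linked above
oss = {"in": 0.15, "out": 0.60}  # assumed gpt-oss-120b rates, provider-dependent

ratios = {k: o3[k] / oss[k] for k in o3}
print(ratios)  # input 20x, output ~13x: consistent with the "10-20x" range
```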
gnulinux|6 months ago
gnulinux|6 months ago
tempaccount420|6 months ago
mikepurvis|6 months ago
spott|6 months ago
bangaladore|6 months ago
modeless|6 months ago
andai|6 months ago