top | item 44801120


podnami | 6 months ago

Wow this was actually blazing fast. I prompted "how can the 45th and 47th presidents of america share the same parents?"

On ChatGPT.com, o3 thought for 13 seconds; on OpenRouter, GPT OSS 120B thought for 0.7 seconds - and they both had the correct answer.


swores | 6 months ago

I'm not sure that's a particularly good question for concluding something positive about the "thought for 0.7 seconds" - it's such a simple answer, ChatGPT 4o (with no thinking time) immediately answered correctly. The only surprising thing in your test is that o3 wasted 13 seconds thinking about it.

Workaccount2 | 6 months ago

A current major outstanding problem with thinking models is how to get them to think an appropriate amount.

nisegami | 6 months ago

Interesting choice of prompt. None of the local models I have in Ollama (consumer mid-range GPU) were able to get it right.

golergka | 6 months ago

When I pay attention to o3 CoT, I notice it spends a few passes thinking about my system prompt. Hard to imagine this question is hard enough to spend 13 seconds on.

Imustaskforhelp | 6 months ago

Not gonna lie but I got sorta goosebumps

I am not kidding but such progress from a technological point of view is just fascinating!

xpe | 6 months ago

How many people are discussing this after one person did 1 prompt with 1 data point for each model and wrote a comment?

What is being measured here? The end-to-end time for one model is:

t_total = t_network + t_queue + t_batch_wait + t_inference + t_service_overhead
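The decomposition above can be sketched in a few lines. All component timings below are made up purely for illustration; the point is that a headline "0.7 seconds" includes network, queueing, and batching overhead, not just inference:

```python
# Sketch of the latency decomposition above. Every value here is a
# hypothetical placeholder, not a measurement of any real service.

def total_latency(t_network: float, t_queue: float, t_batch_wait: float,
                  t_inference: float, t_service_overhead: float) -> float:
    """End-to-end time as the sum of its components (seconds)."""
    return t_network + t_queue + t_batch_wait + t_inference + t_service_overhead

# Hypothetical breakdown of a 0.7 s response: inference is only one term.
components = {
    "t_network": 0.10,
    "t_queue": 0.05,
    "t_batch_wait": 0.05,
    "t_inference": 0.45,
    "t_service_overhead": 0.05,
}
t_total = total_latency(**components)
print(f"t_total = {t_total:.2f} s")  # prints: t_total = 0.70 s
```

This is why a single timing comparison between two hosted models says little about the models themselves: the non-inference terms vary by provider and by load, and one prompt gives one sample of each.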