top | item 47036743


jader201 | 13 days ago

That’s not the problem with this post.

The problem is that most LLMs answer it correctly (see the many other comments in this thread reporting this). OP cherry-picked the few that answered it incorrectly, without mentioning any that got it right, implying that 100% of them got it wrong.


thinkling | 13 days ago

You can see up-thread that the same model will produce different answers for different people or even from run to run.

That seems problematic for a very basic question.

Yes, models can be harnessed with structures that run queries 100x and take the "best" answer, and we can claim that if the best answer gets it right, models therefore "can solve" the problem. But for practical end-user AI use, high error rates are a problem and greatly undermine confidence.
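The "run it 100x and take the best answer" harness described above is essentially best-of-N sampling with majority voting. A minimal sketch, assuming a hypothetical `query` callable standing in for a real model API (the `flaky_model` stub below is illustrative only):

```python
import random
from collections import Counter

def best_of_n(query, prompt, n=100):
    """Run the same query n times and return the most common
    answer plus the fraction of runs that agreed with it."""
    answers = [query(prompt) for _ in range(n)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n

# Hypothetical stand-in for a real model call: answers
# "correct" about 70% of the time, "wrong" otherwise.
def flaky_model(prompt, _rng=random.Random(0)):
    return "correct" if _rng.random() < 0.7 else "wrong"

answer, agreement = best_of_n(flaky_model, "some basic question", n=100)
```

This illustrates the tension in the comment: majority voting can turn a 70%-reliable model into a near-certain answer, but an end user asking once still sees the raw 30% error rate.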

rluna828 | 12 days ago

The magic of LLMs is that one LLM can learn everything and then we can clone it. However, if we don't know ahead of time which one will be the best, then we should probably keep a lot of versions with real (mathematically calculated) diversity. Ironically, the DEI peeps were right all along.
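One common way to put a number on the "mathematically calculated diversity" of an ensemble is mean pairwise disagreement: the fraction of inputs on which each pair of models gives different answers, averaged over all pairs. A minimal sketch (the model predictions below are made-up illustrative data, not from any real system):

```python
from itertools import combinations

def disagreement(preds_a, preds_b):
    """Fraction of inputs on which two models disagree."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def ensemble_diversity(all_preds):
    """Mean pairwise disagreement across every pair of models."""
    pairs = list(combinations(all_preds, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)

# Hypothetical answers from three model versions on five prompts:
models = [
    ["A", "B", "A", "A", "B"],
    ["A", "B", "B", "A", "B"],
    ["B", "B", "A", "A", "A"],
]
div = ensemble_diversity(models)  # 0.0 = clones, higher = more diverse
```

Identical clones score 0.0; keeping versions that score higher is what gives an ensemble its edge.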

serial_dev | 13 days ago

My understanding is that it mainly fails when you try it in speech mode, because that usually uses the fastest model. Yesterday I tried all the major providers, and they were all correct when I typed my question.

raincole | 13 days ago

Naysayers will tell you that OpenAI, Google, and Anthropic 'monkeypatched' their models (somehow!) after reading this thread, and that's why they answer it correctly now.

You can even see those claims in this very thread. Some commenters even believe the providers add internal prompts for this specific question (as if people aren't trying to fish out ChatGPT's internal prompts 24/7, and as if there aren't open-weight models that answer this correctly).

You can never win.