If you drive clockwise along the beach on an island
7 points| Cookingboy | 2 days ago
I asked multiple LLMs this question.
ChatGPT: Wrong but reasoned itself back to being correct.
Gemini: Correct.
Grok: In Expert mode, it got the right answer after 35s.
Claude Sonnet 4.6: Confidently incorrect.
Screenshots: https://imgur.com/a/7pmcoWr
al_borland|2 days ago
This is one of those questions that could have multiple answers, or require follow-up questions, depending on how pedantic the asker wants to be.
Trick question, the island was in a lake, you’re nowhere near the ocean.
Trick question, it’s a small island and the ocean is all around you, not just on the left. How big must an island be before this isn’t true? Is it a line-of-sight question?
catcowcostume|1 day ago
muzani|23 hours ago
Correct answer with Sonnet 4.6, but this might as well be a coin flip. I've found Sonnet 4.6 to be substantially dumber than 4.5. I'd rate Sonnet 4.5 a 10/10 at creative writing and 4.6 a 3/10.
My ChatGPT just expired and I was about to get Claude instead, but I'm starting to rethink this.
CodeBit26|1 day ago
throwaway5465|1 day ago
But no one thinks like that.
After testing this, what strikes me is how stubborn the LLMs are about being wrong. Is that the more important takeaway: that LLMs seem reluctant to back down even when clearly wrong?