awestroke | 13 days ago

I tried this with Claude, ChatGPT, and Gemini models. Haiku and Sonnet failed almost every time, as did the ChatGPT models. Gemini succeeded with reasoning enabled, but without reasoning it fell back on Google Maps tool calls (lol). Still only a 50% success rate.

The only model that consistently answers it correctly is Opus 4.6.
