top | item 46928238

gre | 24 days ago

Tried all ten with Claude, then had Codex take a look at the work. Codex thinks number 7 has the lowest chance of being correct, rating it 1 out of 10. None of them rated higher than a 7/10 chance of being right, with the work done by Claude Opus 4.6 and evaluated by Codex 5.3 highest.

Not going to spend too many more tokens on this.

pickleRick243|23 days ago

I don't think either of these is the best choice for this. I believe ChatGPT 5.2 Pro and Gemini 3 Pro Deep Thinking are the strongest LLMs at "pure thought", i.e. things like mathematical reasoning.

The_Gray|22 days ago

Any chance you're willing to share the links/outputs?