gre | 24 days ago

Tried all ten with Claude, then had Codex take a look at the work. Codex thinks number 7 has the lowest chance of being correct, rating it 1 out of 10. None of them got higher than a 7/10 chance of being right, with the work done by Claude Opus 4.6 and evaluated by Codex 5.3 (highest).

Not going to spend too many more tokens on this.

pickleRick243|23 days ago

I don't think either of these is the best choice for this. I believe ChatGPT 5.2 Pro and Gemini 3 Pro Deep Thinking are the strongest LLMs at "pure thought", i.e., things like mathematical reasoning.

The_Gray|22 days ago

Any chance you're willing to share the links/outputs?