Nice writeup! This is the second post I've seen in the genre of "I've had a secret, personal benchmark for LLMs where the 'solution' requires questioning the premises, and o4-mini-high beats it." The first post I saw was about a chessboard and the prompt "mate in one:" https://x.com/KelseyTuoc/status/1912945346126417940
(Edited to remove direct spoiler for the MU-puzzle, in case people want to try it.)