

foundry27 | 10 months ago

I just tried the same puzzle in o3 using the same image input, but tweaked the prompt to say “don’t use the search tool”. Very similar results!

It spent the first few minutes analyzing the image and cross-checking various slices of it to make sure it understood the problem. Then it spent the next 6-7 minutes trying to work through the problem analytically from various angles. It decided this was likely a mate-in-two (part of the training data?), but went down the path that the key to solving the problem would be to convert the position to something more easily solvable first. At that point it started trying to pip install all sorts of chess-related packages, and when it couldn’t get that to work it started writing a simple chess solver in Python by hand (which didn’t work either). At one point it thought the script had found a mate-in-six, which turned out to be due to a script bug, but I found it impressive that it didn’t just trust the script’s output. Instead, it analyzed the proposed solution and determined the nature of the bug that caused it. Then it gave up and tried analyzing for five more minutes, at which point the thinking got cut off and displayed an internal error.

15 minutes total, didn’t solve the problem, but fascinating! There were several points where if the model were more “intelligent”, I absolutely could see it reasoning it out following the same steps.


bko | 10 months ago

Claude gets the right answer, but it misplaces the pieces in its initial analysis, which means the answer is incorrect for the position it describes.

What's going on? Did it just get lucky? Did it memorize the answer but misplace the pieces in its recall? Did it actually compute anything?

https://claude.ai/share/d640bc4c-8dd8-4eaa-b10b-cb3f83a6b94b

This is the board as it sees it (incorrect):

https://lichess.org/editor/kb6/pp6/2P5/8/8/3K4/8/R7_w_-_-_0_...
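For anyone who wants to check that position without opening the link: the part of the URL before the first underscore appears to be a standard FEN piece-placement field, so it can be expanded by hand. A minimal sketch, assuming that reading is right; `parse_placement` and `PLACEMENT` are names made up for this example, and the truncated remainder of the URL is not used.

```python
def parse_placement(placement: str) -> list[list[str]]:
    """Expand a FEN piece-placement field into an 8x8 grid.

    Row 0 is rank 8 (Black's back rank); '.' marks an empty square.
    """
    board = []
    for rank in placement.split("/"):
        row = []
        for ch in rank:
            if ch.isdigit():
                # A digit means that many consecutive empty squares.
                row.extend("." * int(ch))
            else:
                # Uppercase is a White piece, lowercase a Black piece.
                row.append(ch)
        if len(row) != 8:
            raise ValueError(f"rank {rank!r} does not describe 8 squares")
        board.append(row)
    if len(board) != 8:
        raise ValueError("expected 8 ranks")
    return board


# Placement field copied from the lichess editor link above.
PLACEMENT = "kb6/pp6/2P5/8/8/3K4/8/R7"

if __name__ == "__main__":
    for rank_number, row in zip(range(8, 0, -1), parse_placement(PLACEMENT)):
        print(rank_number, " ".join(row))
```

Read this way, the board has a Black king on a8, bishop on b8 and pawns on a7 and b7, against a White pawn on c6, king on d3 and rook on a1.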

IanCal | 10 months ago

I told it that it was a mate-in-2 puzzle, and it solved it for me.

https://chatgpt.com/share/680f4a02-4cc4-8002-8301-59214fca78...

It worked through some stuff then decided to try and list all possible moves as there can't be that many. Tried importing stuff that didn't work, then wrote code to create the permutations.
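The "list every move" approach described above is essentially a brute-force forced-mate search. A rough sketch of that idea follows; it is not the code the model wrote. `find_mate` and its three callbacks (`legal_moves`, `apply_move`, `is_checkmate`) are hypothetical names, and a real run would need a chess move generator behind them, which is the hard part and is left out here.

```python
from typing import Callable, Iterable, Optional, TypeVar

P = TypeVar("P")  # a position (hypothetical type, supplied by the caller)
M = TypeVar("M")  # a move (hypothetical type, supplied by the caller)


def find_mate(
    pos: P,
    n: int,
    legal_moves: Callable[[P], Iterable[M]],
    apply_move: Callable[[P, M], P],
    is_checkmate: Callable[[P], bool],
) -> Optional[M]:
    """Return a first move that forces mate within n attacker moves, else None.

    `pos` is a position with the attacker to move. `is_checkmate(p)` must be
    true when the side to move in `p` has been mated.
    """
    if n == 0:
        return None
    for move in legal_moves(pos):
        after = apply_move(pos, move)
        if is_checkmate(after):
            return move
        replies = list(legal_moves(after))
        if not replies:
            # No legal reply and not checkmate is stalemate, not a win.
            continue
        # The move only works if every defender reply still loses in n - 1.
        if all(
            find_mate(apply_move(after, reply), n - 1,
                      legal_moves, apply_move, is_checkmate) is not None
            for reply in replies
        ):
            return move
    return None
```

For a mate-in-two the call would be `find_mate(position, 2, ...)`. The search tree is small at that depth, which fits the "there can't be that many" reasoning, but the result is only as good as the move generator, as the bogus mate-in-six upthread shows.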