Brystephor | 8 months ago
> you’d have to either memorize the entire answer before speaking or come up with a simple pattern you could do while reciting that takes significantly less brainpower
This part I don't understand. Why would coming up with an algorithm (e.g. a simple pattern) and reciting it be impossible? The paper doesn't mention the models coming up with the algorithm at all AFAIK. If the model were able to come up with the pattern required to solve the puzzles and then also execute (i.e. recite) it, that would show understanding. However, the models didn't. So if the model can answer the same question for small inputs but not for big inputs, doesn't that imply the model is not finding a pattern for solving the answer but is more likely pulling from memory? Like, if the model could tell you Fibonacci numbers when n=5 but not when n=10, that'd imply the numbers are memorized and the pattern for generating them is not understood.
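To put the distinction in code: a model that actually has the pattern is doing something like the loop below (just an illustrative sketch, not anything from the paper), and that loop works the same whether n is 5 or 10; a memorized table of values does not.

```python
def fib(n):
    # Produce the n-th Fibonacci number by applying the same simple
    # rule over and over, rather than recalling a stored answer.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(5))   # 5
print(fib(10))  # 55
```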
qarl | 8 months ago
And that's because they specifically hamstrung their tests so that the LLMs were not "allowed" to generate algorithms.
If you simply type "Give me the solution for Towers of Hanoi for 12 disks" into ChatGPT, it will happily give you the answer. It will write a program to solve it and then run that program to produce the answer.
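The program it writes is roughly the standard recursive solver, something like this (a sketch of the textbook solution, not the exact code from that chat):

```python
def hanoi(n, source, target, spare, moves):
    # Move n-1 disks out of the way, move the largest disk,
    # then move the n-1 disks back on top of it.
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)
    moves.append((source, target))
    hanoi(n - 1, spare, target, source, moves)

moves = []
hanoi(12, "A", "C", "B", moves)
print(len(moves))  # 4095 moves for 12 disks (2**12 - 1)
```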
But according to the skeptical community, that is "cheating" because it's using tools. Never mind that it is the most effective way to solve the problem.
https://chatgpt.com/share/6845f0f2-ea14-800d-9f30-115a3b644e...
jsnell | 8 months ago
When this research has been reproduced, the "failures" on the Tower of Hanoi are the model printing out a bunch of steps, saying there is no point in doing it thousands of times more, and then outputting the algorithm for printing the rest, in words or in code.
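Which is hard to fault: the move count for n disks is 2**n - 1, so the transcript length blows up long before the pattern changes (a quick illustration of my own, not from the reproduction):

```python
# Moves needed for an n-disk Tower of Hanoi grow as 2**n - 1,
# so enumerating every step stops being reasonable very quickly.
for n in (3, 7, 12, 20):
    print(n, 2**n - 1)
# 3 7
# 7 127
# 12 4095
# 20 1048575
```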
godelski | 8 months ago