(no title)
rdedev | 4 months ago
This is same as the critiques of the LLM paper by apple where they showed that LLMs fail to solve the tower of hanoi problem after a set number of towers. The test was to see how well these models can reason out a long task. People online were like they could solve that problem if they had access to a coding enviornment. Again the test was to check reasoning capability not if it knew how to code and algorithm to solve the problem.
If model performance degrade a lot after a number of reasoning steps it's good to know where the limits are. Wheather the model had access to tools or not is orthogonal to this problem
No comments yet.