top | item 47136665

(no title)

Fundamentally the failure here is one of reasoning/planning - either of not reasoning about the implicit requirements (in this case extremely obvious - in order to wash my car at the car wash, my car needs to be at the car wash) to directly arrive at the right answer, and/or of not analyzing the consequences of any considered answer before offering it as the answer.

While this is a toy problem, chosen to trick LLMs given their pattern matching nature, it is still indicative of their real world failure modes. Try asking an LLM for advice in tackling a tough problem (e.g. bespoke software design), and you'll often get answers whose consequences have not been thought through.

In a way the failures on this problem, even notwithstanding the nature of LLMs, are a bit surprising given that this type of problem statement kinda screams out (at least to a human) that it is a logic test, but most of the LLMs still can't help themselves and just trigger off the "50m drive vs walk" aspect. It reminds a bit of the "farmer crossing the river by boat in fewest trips" type problem that used to be popular for testing LLMs, where a common failure was to generate a response that matched the pattern of ones it had seen during training (first cross with A and B, then return with X, etc), but the semantics were lacking because of failure to analyze the consequences of what it was suggesting (and/or of planning better in the first place).

discuss

No comments yet.