top | item 45588553

drpixie | 4 months ago

>> a solution that seems correct under their heuristic reasoning, but they arrived at that result in a non-logical way

Not quite ... LLMs are not HAL (unfortunately). They produce something that is associated with the same input, something that should look like an acceptable answer. A correct answer will be acceptable, and so will any answer that has been associated with similar input. And so will anything that fools some of the people, some of the time ;)

The unpredictability is a huge problem. Take the geoguessing example: it has come up with a collection of "facts" about Paramaribo. These may or may not be correct, but some of them are not even visible in the image. Very likely the "answer" is derived from completely different factors, and the "explanation" is spurious (perhaps an explanation of how other people made a similar guess!)

The questioner has no way of telling whether the "explanation" was actually the logic used. (It wasn't!) And when genuine experts follow the trail of token activations, the answer and the explanation turn out to be quite independent.

Yizahi | 4 months ago

> Very likely the "answer" is derived from completely different factors, and the "explanation" is spurious (perhaps an explanation of how other people made a similar guess!)

This is a very important and often overlooked idea. And it is 100% correct, even acknowledged by Anthropic themselves. When a user asks an LLM to explain how it arrived at a particular answer, it produces steps that are completely unrelated to the actual mechanism inside the LLM. The "explanation" is just yet another generated output, based on the training data.
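A toy sketch of why this is structurally true (all names here are hypothetical, with a stub standing in for a real model): the "explain yourself" request is just another text completion, conditioned only on the visible transcript. The internal activations that produced the original answer are gone by the time the explanation is generated.

```python
# Toy illustration: a stub "model" whose only input is prompt text.
# It stands in for an LLM API call (hypothetical, canned outputs).
def generate(prompt: str) -> str:
    if "why" in prompt.lower():
        # A plausible-sounding rationale, of the kind the training data
        # associates with such questions. It has no access to whatever
        # computation produced the earlier answer.
        return "I noticed the roadside bollards and the left-hand traffic."
    return "Paramaribo, Suriname"

# First call: the answer. Any internal state exists only during this call
# and is discarded when it returns.
answer = generate("Where was this photo taken?")

# Second call: the "explanation". Its only input is text -- the question,
# the answer, and the follow-up -- not the activations behind `answer`.
explanation = generate(
    f"Q: Where was this photo taken?\nA: {answer}\nWhy did you say that?"
)

print(answer)       # the guess
print(explanation)  # a post-hoc narrative, not a trace of the computation
```

The point of the sketch is just the function signature: `generate` takes a string and returns a string, so the explanation can only ever be conditioned on text, never on the process that produced the answer.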

jmogly | 4 months ago

Effortless lying, scary in humans, scarier in machines?