(no title)
Wonderfall | 11 months ago
I should also clarify (here, and probably in my article when I have the time): LLMs "do" build internal models, in the sense that, at the same time:
- They organize knowledge by domain in a unified network
- They're capable of generalization (already mentioned and acknowledged at the very beginning of the article)
However, while these models share parallels with human cognition, they lack substance and can't (yet) replicate the deep, integrated cognitive model of humans. That is where current interpretability research is at, and probably SOTA LLMs too. My own opinion and speculation is that autoregressive models will never reach a satisfying approximation of human-level cognition, since human thinking seems to involve more than autoregressive components, which aligns with current psychology. But that doesn't mean architectures won't evolve.
Don't take my calling them pattern-matching machines to mean they will be unable to properly "think". In fact, the line between pattern matching and thinking is quite blurry.