nexttk | 9 months ago
The fact that they predict the next token is just the "interface": an LLM exposes something like "predictNextToken(String prefix)". That says nothing about how it is implemented. One implementation could be a human brain; another could be a simple lookup table that looks at the last word and selects the next from that; or anything in between. The point is that "next-token prediction" says nothing about implementation, so it does not limit capabilities, even though it is often invoked as if it did.

Just because the model is only required to emit the next token (or rather, a probability distribution over it) does not mean it cannot think far ahead; indeed it has to, if it is to make a good prediction of even that one token. As interpretability research (and common sense) shows, LLMs have a fairly good idea of what they are going to say many, many tokens ahead, precisely so that they can make a good prediction for the immediate next token. That's why you can get nice, coherent, well-structured, long responses from LLMs, and why you have probably never seen one get stuck in a dead end where it can't generate a meaningful continuation.
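The interface-vs-implementation point can be sketched in a few lines of Python. Everything below is an illustrative assumption, not anything from the thread: a bigram lookup table satisfies the same predict-next-token signature that a transformer (or a human) could, which is exactly why the signature alone tells you nothing about capability.

```python
from collections import Counter, defaultdict


class BigramTable:
    """One possible implementation of the predict-next-token interface:
    a lookup table keyed only on the last word of the prefix.
    (Hypothetical class; a transformer could expose the same method.)"""

    def __init__(self, corpus: str):
        # Count word -> next-word occurrences in the training text.
        self.table = defaultdict(Counter)
        words = corpus.split()
        for prev, nxt in zip(words, words[1:]):
            self.table[prev][nxt] += 1

    def predict_next_token(self, prefix: str) -> dict:
        """Return a probability distribution over the next token,
        conditioned only on the final word of the prefix."""
        last = prefix.split()[-1]
        counts = self.table[last]
        total = sum(counts.values())
        if total == 0:
            return {}  # last word never seen in the corpus
        return {tok: n / total for tok, n in counts.items()}


model = BigramTable("the cat sat on the mat the cat ran")
dist = model.predict_next_token("look at the")
# "cat" follows "the" twice and "mat" once in the toy corpus,
# so dist assigns 2/3 to "cat" and 1/3 to "mat"
```

The caller sees only a distribution over next tokens; whether it came from a one-word lookup or from a model that planned an entire paragraph is invisible at this boundary.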
If you want to reason about LLM capabilities, never think in terms of "stochastic parrot" or "it's just a next-token predictor": those framings contain exactly zero useful information and will only confuse you.
lsy | 9 months ago
But the thrust of the critique of next-token prediction or stochastic output is that there isn't "intelligence" there, because the output is based purely on syntactic relations between words, not on conceptualizing via a world model built through experience and then using language as an abstraction to describe that world. To the computer there is nothing outside tokens and their interrelations; for people, language is just a tool for describing a world that we expect "intelligences" to cope with. Which is what this article is examining.
famouswaffles | 9 months ago
LLMs model concepts internally, and this has been demonstrated empirically many times over the years, including recently by Anthropic (again). Of course, that won't stop people from repeating the claim ad nauseam.
yahoozoo | 9 months ago
Planning and long-range coherence emerge from training on text written by humans who think ahead, not from intrinsic model capabilities. This distinction matters when evaluating whether an LLM is actually reasoning or simply simulating the surface structure of reasoning.
famouswaffles | 8 months ago
That's not true.
https://www.anthropic.com/research/tracing-thoughts-language...