top | item 46905274

breuleux | 25 days ago

The point is that "predicting the next token" is such a general mechanism as to be meaningless. We say that LLMs are "just" predicting the next token, as if this somehow explained all there was to them. It doesn't, not any more than "the brain is made out of atoms" explains the brain, or "it's a list of lists" explains a Lisp program. It's a platitude.

esafak | 24 days ago

It's not meaningless; it's a prediction task, and prediction is widely held to be closely related to, if not synonymous with, intelligence.

breuleux | 24 days ago

In the case of LLMs, "prediction" is overselling it somewhat. They are token sequence generators. Calling these sequences "predictions" vaguely corresponds to our own intent in training these machines, because we use the value of the next token as a signal to either reinforce the current behavior or steer away from it. But there's nothing intrinsic in the inference math that says they are predictors, and we typically run inference at a high enough temperature that we don't actually generate the maximum-likelihood tokens anyway.
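To make the temperature point concrete, here is a minimal sketch (not from the thread; the vocabulary and logits are made up) of how sampling from next-token logits differs from picking the maximum-likelihood token. At temperature 0 decoding is a pure argmax; at higher temperatures the distribution flattens and lower-probability tokens get sampled:

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, flattened or sharpened by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature, rng=random):
    """Pick the next token: greedy argmax at temperature 0, stochastic otherwise."""
    if temperature == 0:
        return vocab[logits.index(max(logits))]   # maximum-likelihood token
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical model output over a toy three-token vocabulary.
vocab = ["cat", "dog", "the"]
logits = [2.0, 1.0, 0.5]

print(sample_next_token(vocab, logits, temperature=0))    # always "cat"
print(sample_next_token(vocab, logits, temperature=1.0))  # stochastic
```

Nothing in this math labels the output a "prediction"; it is just a weighted draw from a distribution the training process shaped.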

The whole terminology around these things is hopelessly confused.