item 47156399

throw310822 | 4 days ago

Yes, as I said, the prompt (the entire history of the conversation, including vendor prompting that the user can't see) entirely determines the internal state, given the LLM's weights. But the fact that prediction starts from scratch at each new token doesn't mean that the new internal state is very different from the previous one. A state that represents the general meaning of the conversation, and where the sentence is going, will not be influenced much by one new token appended to the end. So the internal state "persists" and transitions smoothly, even though it is destroyed and recreated from scratch at each prediction.
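This can be sketched with a toy stand-in for the real thing. The `embed` and `state` functions below are hypothetical, not anything from an actual transformer: hash-derived pseudo-embeddings and a mean-pooled "state" merely share the relevant property of being a pure function of the context, so appending one token perturbs the result only slightly.

```python
import hashlib
import math

def embed(token: str) -> list[float]:
    # Deterministic pseudo-embedding: hash each token into a small vector.
    digest = hashlib.sha256(token.encode()).digest()
    return [b / 255.0 for b in digest[:8]]

def state(context: list[str]) -> list[float]:
    # Stand-in "internal state": the mean of the token embeddings.
    # Real transformer activations are far richer, but are likewise
    # recomputed from scratch as a pure function of the context.
    vecs = [embed(t) for t in context]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

ctx = "the cat sat on the mat and then it".split()
s1 = state(ctx)
s2 = state(ctx + ["purred"])  # same context with one extra token appended
print(cosine(s1, s2))  # very close to 1.0: the "state" barely moves
```

The point of the sketch is only the last line: rebuilding the state from scratch over a context that differs by one token yields nearly the same state.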

giantrobot | 4 days ago

The state "persists" as the context. There's no more than the current context. If you dumped the context to disk, zeroed out all VRAM, reloaded the LLM, and fed that context back in, you'd have the same state as if you'd never reloaded anything.

Nothing is persisted in the LLM itself (weights, layers, etc.) nor in the hardware (modulo token caching or other scaling mechanisms). In fact, this happens all the time with the big inference providers: two sessions of a chat will rarely (if ever) execute on the same hardware.

throw310822 | 4 days ago

Yes, you're repeating the same point once again; we know it. What I am saying is that since the state encodes a horizon that goes beyond the mere generation of the next token (for the "past", it encodes the meaning of the conversation so far; for the "future", it already has an idea of what it wants to say), this state changes only slightly at each new inference pass, despite being recreated each time from the context. So during a sequence of (completely independent) token predictions there is an internal state that stays mostly the same, evolving only gradually in a feedback loop with the tokens generated at each inference cycle.

Maybe it's not clear what I mean by "state". I mean a pattern of activations in the deep layers of the network that encodes some high-level semantics. Not something that is persisted; something that doesn't need to be persisted, precisely because it is fully determined by the context, and the context stays roughly the same.
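The feedback loop described above can be sketched end to end with a toy stand-in for that activation pattern. Everything here is hypothetical: the "state" is just the character-frequency profile of the context, and `next_token` is a toy decoder that derives output deterministically from the state. The state is rebuilt from scratch on every pass, yet consecutive states stay nearly identical.

```python
import hashlib
import math
from collections import Counter

def state(context: str) -> dict[str, float]:
    # Stand-in "activation pattern": the character-frequency profile of
    # the whole context, recomputed from scratch on every pass.
    counts = Counter(context)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def cosine(a: dict, b: dict) -> float:
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def next_token(s: dict) -> str:
    # Toy "decoder": derive the next token deterministically from the state.
    digest = hashlib.sha256(repr(sorted(s.items())).encode()).hexdigest()
    return " " + digest[:3]

context = "the quick brown fox jumps over the lazy dog"
prev = state(context)
for _ in range(5):
    context += next_token(prev)   # feedback: the output re-enters the context
    cur = state(context)          # state destroyed and rebuilt from scratch
    assert cosine(prev, cur) > 0.8  # ...yet it drifts only gradually
    prev = cur
```

Each iteration is an independent "inference pass" over the full context, but because the context grows by only a few characters per pass, the recreated state evolves smoothly rather than jumping around.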