(no title)
giantrobot | 4 days ago
Nothing is persisted in the LLM itself (weights, layers, etc.) nor in the hardware (modulo token caching or other scaling mechanisms). In fact this happens all the time with the big inference providers: two sessions of a chat will rarely (if ever) execute on the same hardware.
throw310822 | 4 days ago
Maybe it's not clear what I mean by "state". I mean a pattern of activations in the deep layers of the network that encodes some high-level semantic. Not something that is persisted. Something that doesn't need to be persisted, precisely because it is fully determined by the context, and the context stays roughly the same.
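The point that activations are fully determined by the context can be sketched with a toy deterministic forward pass (a hypothetical minimal network, not a real LLM): feeding the same context twice, even in a fresh process or on different hardware, reproduces the exact same "deep layer" activation pattern.

```python
import numpy as np

def forward(context_tokens, seed=0):
    # Toy deterministic "network": fixed weights derived from a fixed seed.
    rng = np.random.default_rng(seed)
    W_embed = rng.standard_normal((100, 16))   # vocab of 100 toy tokens
    W_deep = rng.standard_normal((16, 16))
    h = W_embed[context_tokens].sum(axis=0)    # crude "embedding" of the context
    return np.tanh(h @ W_deep)                 # "deep layer" activation pattern

ctx = [3, 14, 15, 92, 6]
a1 = forward(ctx)   # "session 1" on one machine
a2 = forward(ctx)   # "session 2", fresh run elsewhere
print(np.allclose(a1, a2))
```

With identical weights and identical context, the activations match exactly, which is why the "state" never needs to be persisted: replaying the context regenerates it.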