top | item 47156136


giantrobot | 4 days ago

With LLMs, the internal state is their training plus system prompt plus context. Most chatbot UIs hide the context management, but if you take an existing conversation, replace a term in the context with another grammatically (and semantically) similar term, and send that, the LLM will adjust its output to that new "history".

It will have no conception or memory of the alternate line of discussion with the previous term. It only "knows" what is contained in the current combination of training + system prompt + context.

If you change the LLM's persona from "Sam" to "Alex", then in the LLM's conception of the world it has always been "Alex". It will have no memory of ever being "Sam".
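The persona swap described above can be sketched with a plain message list. The roles, names, and history here are hypothetical, and no real inference API is called; the point is that the model's entire "past" is just the list you send.

```python
# Minimal sketch: the "memory" an LLM sees is just the message list we send.
# Editing that list rewrites the model's entire past; nothing else persists.

def rename_persona(history, old_name, new_name):
    """Return a new history with every mention of old_name replaced."""
    return [
        {**msg, "content": msg["content"].replace(old_name, new_name)}
        for msg in history
    ]

history = [
    {"role": "system", "content": "You are Sam, a helpful assistant."},
    {"role": "user", "content": "Hi Sam, what's your name?"},
    {"role": "assistant", "content": "I'm Sam. How can I help?"},
]

edited = rename_persona(history, "Sam", "Alex")
# The model that receives `edited` has "always been" Alex: there is no
# trace of "Sam" anywhere in the context it can see.
```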



throw310822 | 4 days ago

Yes, as I said, the prompt (the entire history of the conversation, including vendor prompting that the user can't see) entirely determines the internal state, according to the LLM's weights. But the fact that the prediction starts from scratch at each new token doesn't mean the new internal state is very different from the previous one. A state that represents the general meaning of the conversation, and where the sentence is going, won't be influenced much by one new token appended to the end. So the internal state "persists" and transitions smoothly, even though it is destroyed and recreated from scratch at each prediction.

giantrobot | 4 days ago

The state "persists" as the context; there is nothing beyond the current context. If you dumped the context to disk, zeroed out all VRAM, reloaded the LLM, and then fed that context back in, you'd have the same state as if you'd never reloaded anything.
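The dump-and-reload round trip can be sketched directly: the "state" is just the serialized conversation, and (assuming deterministic inference) restoring it reconstructs everything. The history below is an invented example.

```python
import json

# Sketch of the round-trip claim: the context alone reconstructs the state.
# Dumping and reloading the conversation loses nothing, because nothing
# outside the conversation existed in the first place.

history = [
    {"role": "user", "content": "Summarize our discussion so far."},
    {"role": "assistant", "content": "We covered how LLM context works."},
]

# "Dump the context to disk" ...
blob = json.dumps(history)
# ... "zero out all VRAM, reload the LLM" (nothing of the session survives) ...
# ... then feed the context back in:
restored = json.loads(blob)

assert restored == history  # identical conversation, hence identical state
```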

Nothing is persisted in the LLM itself (weights, layers, etc.) nor in the hardware (modulo token caching or other scaling mechanisms). In fact, this happens all the time with the big inference providers: two sessions of a chat will rarely (if ever) execute on the same hardware.