(no title)
giantrobot | 4 days ago
Nothing is persisted in the LLM itself (weights, layers, etc.) nor in the hardware (modulo token caching or other scaling mechanisms). In fact this happens all the time with the big inference providers: two sessions of a chat will rarely (if ever) execute on the same hardware.
throw310822 | 4 days ago
Maybe it's not clear what I mean by "state". I mean a pattern of activations in the deep layers of the network that encodes some high-level semantic. Not something that is persisted. Something that doesn't need to be persisted, precisely because it is fully determined by the context, and the context stays roughly the same.
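The point that activations are fully determined by the context can be sketched with a toy deterministic forward pass (a hypothetical minimal network, not a real LLM): feeding the same context twice, even in a fresh process or on different hardware, reproduces the exact same "deep layer" activation pattern.

```python
import numpy as np

def forward(context_tokens, seed=0):
    # Toy deterministic "network": fixed weights derived from a fixed seed.
    rng = np.random.default_rng(seed)
    W_embed = rng.standard_normal((100, 16))   # vocab of 100 toy tokens
    W_deep = rng.standard_normal((16, 16))
    h = W_embed[context_tokens].sum(axis=0)    # crude "embedding" of the context
    return np.tanh(h @ W_deep)                 # "deep layer" activation pattern

ctx = [3, 14, 15, 92, 6]
a1 = forward(ctx)   # "session 1" on one machine
a2 = forward(ctx)   # "session 2", fresh run elsewhere
print(np.allclose(a1, a2))
```

With identical weights and identical context, the activations match exactly, which is why the "state" never needs to be persisted: replaying the context regenerates it.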