Yes, you're repeating the same concept once again; we know it. What I am saying is this: the state encodes a horizon that goes beyond the mere generation of the next token (for the "past", it encodes the meaning of the conversation so far; for the "future", it already has an idea of what it wants to say), and this state changes only slightly at each new inference pass, despite being recreated from the context every time. So during a sequence of (completely independent) token predictions there is an internal state that stays mostly the same, evolving only gradually in a feedback loop with the tokens generated at each inference cycle.

Maybe it's not clear what I mean by "state". I mean a pattern of activations in the deep layers of the network that encodes some high-level semantics. Not something that is persisted. Something that doesn't need to be persisted precisely because it is fully determined by the context, and the context stays roughly the same from one pass to the next.
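To make the idea concrete, here is a minimal numeric sketch, not a real transformer: the "state" is stood in for by the mean of the context token embeddings (an assumption purely for illustration), recomputed from scratch from the full context at every pass. Because each pass only appends one token to a long context, successive states are nearly identical, even though nothing is carried over between passes:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 200, 64
emb = rng.normal(size=(VOCAB, DIM))   # fixed token embeddings (toy)
W = rng.normal(size=(VOCAB, DIM))     # toy "unembedding" projection

def state_of(context):
    # The state is fully determined by the context: here, simply the
    # mean of the context token embeddings, a stand-in for deep-layer
    # activations recomputed from scratch on every inference pass.
    return emb[context].mean(axis=0)

def next_token(state):
    # Deterministic toy decoder: pick the token whose projection
    # best matches the current state.
    return int(np.argmax(W @ state))

context = list(rng.integers(0, VOCAB, size=50))  # the "conversation so far"
sims = []
prev = state_of(context)
for _ in range(20):                    # 20 independent inference passes
    context.append(next_token(prev))   # feedback loop: the emitted token re-enters the context
    cur = state_of(context)            # state rebuilt from the full context each time
    sims.append(float(cur @ prev / (np.linalg.norm(cur) * np.linalg.norm(prev))))
    prev = cur

print(min(sims))  # cosine similarity between successive states stays close to 1
```

Nothing here persists between iterations except the token sequence itself, yet the recomputed state drifts only gradually, which is the point about a semantic horizon surviving across independent predictions.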