AIorNot | 16 days ago
A large language model like GPT runs in what you’d call a forward pass. You give it tokens, it pushes them through a giant neural network once, and it predicts the next token. No weights change, just matrix multiplications and nonlinearities. So at inference time, it does not “learn” in the training sense.
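To make the point concrete, here is a minimal sketch of inference as a single forward pass, using a hypothetical toy model (the vocabulary size, dimensions, and pooling are made up for illustration): weights are fixed arrays, and prediction is just matmuls plus a nonlinearity, with no gradient or update step anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: vocabulary of 10 tokens, 8-dim embeddings.
vocab, d = 10, 8
W_embed = rng.normal(size=(vocab, d))   # frozen weights
W_out   = rng.normal(size=(d, vocab))   # frozen weights

def forward(token_ids):
    """One forward pass: embed, nonlinearity, project to logits.
    No gradients, no weight updates -- the weights never change."""
    h = np.tanh(W_embed[token_ids].mean(axis=0))  # crude pooling + nonlinearity
    logits = h @ W_out
    return int(np.argmax(logits))                  # greedy next-token prediction

next_token = forward([1, 4, 2])
# Same input, same output -- nothing was "learned" by running inference:
assert forward([1, 4, 2]) == next_token
```

A real transformer replaces the crude pooling with attention layers, but the shape of the claim is the same: inference is a pure function of frozen weights and the input tokens.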
we need some kind of new architecture to get to next-gen wow stuff, e.g. differentiable memory systems: instead of modifying weights, the model writes to a structured memory that is itself part of the computation graph. More dynamic or modular architectures, not bigger scaling and spending all our money on data centers
anybody in the ML community have an answer for this? (besides better RL, RLHF, and world models)
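For readers unfamiliar with the "differentiable memory" idea mentioned above, here is a minimal sketch in the style of soft-addressed memories like the Neural Turing Machine (the slot count, dimensions, and addressing scheme here are illustrative assumptions, not any particular paper's design). The key property: reads and writes are softmax-weighted blends, so they are smooth operations that gradients could flow through in a real model, while the network weights themselves stay fixed.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def write(memory, key, value):
    w = softmax(memory @ key)            # soft addressing over slots
    return memory + np.outer(w, value)   # blended, differentiable write

def read(memory, key):
    w = softmax(memory @ key)
    return w @ memory                    # soft read: weighted mix of slots

# Hypothetical external memory: 4 slots of 8-dim vectors, starting empty.
memory = np.zeros((4, 8))

rng = np.random.default_rng(0)
key, value = rng.normal(size=8), rng.normal(size=8)

memory = write(memory, key, value)   # memory changes; no weights change
recalled = read(memory, key)         # retrieves a blend containing `value`
```

Because every step is differentiable, the write/read behavior can be trained end to end along with the rest of the network, which is exactly what makes it "part of the computation graph" rather than a bolted-on database.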
menaerus | 13 days ago
It learns because it remembers the context. The larger the context, the better the capabilities of the model are. I mean, just give it a try and see for yourself: start building a feature, then the next feature, then the next one, etc. Do it in the same "workspace" or "session", and after a few days, one or two weeks of writing code with the agent, you will notice that it somehow magically remembers the stuff and builds upon that context. It becomes slower too.
"Re-learning" is something different and it may not be even needed.