AIorNot | 16 days ago
A large language model like GPT runs in what you’d call a forward pass. You give it tokens, it pushes them through a giant neural network once, and it predicts the next token. No weights change, just matrix multiplications and nonlinearities. So at inference time, it does not “learn” in the training sense.
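To make the point concrete, here is a minimal sketch of inference as a single forward pass, using a hypothetical toy model (the vocabulary size, dimensions, and pooling are made up for illustration): weights are fixed arrays, and prediction is just matmuls plus a nonlinearity, with no gradient or update step anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: vocabulary of 10 tokens, 8-dim embeddings.
vocab, d = 10, 8
W_embed = rng.normal(size=(vocab, d))   # frozen weights
W_out   = rng.normal(size=(d, vocab))   # frozen weights

def forward(token_ids):
    """One forward pass: embed, nonlinearity, project to logits.
    No gradients, no weight updates -- the weights never change."""
    h = np.tanh(W_embed[token_ids].mean(axis=0))  # crude pooling + nonlinearity
    logits = h @ W_out
    return int(np.argmax(logits))                  # greedy next-token prediction

next_token = forward([1, 4, 2])
# Same input, same output -- nothing was "learned" by running inference:
assert forward([1, 4, 2]) == next_token
```

A real transformer replaces the crude pooling with attention layers, but the shape of the claim is the same: inference is a pure function of frozen weights and the input tokens.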
we need some kind of new architecture to get to next-gen wow stuff, e.g. differentiable memory systems: instead of modifying weights, the model writes to a structured memory that is itself part of the computation graph. More dynamic or modular architectures, not bigger scaling and spending all our money on data centers
anybody in the ML community have an answer for this? (besides better RL, RLHF, and world models)
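For readers unfamiliar with the "differentiable memory" idea mentioned above, here is a minimal sketch in the style of soft-addressed memories like the Neural Turing Machine (the slot count, dimensions, and addressing scheme here are illustrative assumptions, not any particular paper's design). The key property: reads and writes are softmax-weighted blends, so they are smooth operations that gradients could flow through in a real model, while the network weights themselves stay fixed.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def write(memory, key, value):
    w = softmax(memory @ key)            # soft addressing over slots
    return memory + np.outer(w, value)   # blended, differentiable write

def read(memory, key):
    w = softmax(memory @ key)
    return w @ memory                    # soft read: weighted mix of slots

# Hypothetical external memory: 4 slots of 8-dim vectors, starting empty.
memory = np.zeros((4, 8))

rng = np.random.default_rng(0)
key, value = rng.normal(size=8), rng.normal(size=8)

memory = write(memory, key, value)   # memory changes; no weights change
recalled = read(memory, key)         # retrieves a blend containing `value`
```

Because every step is differentiable, the write/read behavior can be trained end to end along with the rest of the network, which is exactly what makes it "part of the computation graph" rather than a bolted-on database.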
menaerus | 13 days ago
It learns because it remembers the context. The larger the context, the better the capabilities of the model are. I mean, just give it a try and see for yourself: start building a feature, then the next feature, then the next one, etc. Do it in the same "workspace" or "session", and after a few days, one or two weeks of writing code with the agent, you will notice that it somehow magically remembers the stuff and builds upon that context. It becomes slower too.
"Re-learning" is something different and it may not be even needed.