top | item 42707241

(no title)

Great research here. Contextual real-time weight modification is definitely one of the breakthroughs required for AGI. Why create a LoRA when you can generate one on the fly suited to the task at hand?

discuss

verdverm|1 year ago

It does not seem like they are doing inference time weight changes, to the tune of running backprop. It sounds more like they are applying a pre-trained vector to the model, and select that vector based on the input, in a two step process

wildermuthn|1 year ago

That’s my general understanding as well, but it isn’t a large conceptual leap to go from real-time selection of pretrained “z-vectors” to real-time generation of the same. The larger conceptual breakthrough, with demonstration of its effectiveness, is the big success here.

mtts|1 year ago

Sort of. According to the text they can use multiple z-vectors (sets of weights that select for parts of the system to be used to answer a specific question) simultaneously, using a "simple optimization algorithm" to determine the relative weight for each of these vectors.

bugglebeetle|1 year ago

See also the work being done by GoodFire AI:

https://www.goodfire.ai/

They now have an API that allows for dynamic exploration and manipulation of the latent space for LLama 8-70B models (think Golden Gate Claude). They also open sourced the sparse auto-encoders that (in part) allow for this:

https://huggingface.co/Goodfire/Llama-3.3-70B-Instruct-SAE-l...

logicchains|1 year ago

>Contextual real-time weight modification is definitely one of the breakthroughs required for AGI.

It's already been invented: https://arxiv.org/abs/2202.05780 . That design is just very inefficient to scale up / use as a transformer backbone.

mnky9800n|1 year ago

Why not, as each new task comes up, and then weights are revalued, save those weights and keep them for reference as priors for similar future tasks? As the model is exposed to new data the average of the set of priors of things the model thinks is similar might move closer to the posterior making the model quicker and more able to arrive at good outcomes. I suppose storage might be an issue.

magospietato|1 year ago

I'm wondering if you could fine tune the model on an aggregate of a temporal slice of revalued weights? Something analogous to REM sleep's involvement in embedding the days events into long term memory.

QuadmasterXLII|1 year ago

thats just mixture of experts