Great research here. Contextual real-time weight modification is definitely one of the breakthroughs required for AGI. Why create a LoRA when you can generate one on the fly suited to the task at hand?
It doesn't seem like they are changing weights at inference time in the sense of running backprop. It sounds more like they apply a pretrained vector to the model and select that vector based on the input, in a two-step process.
That’s my general understanding as well, but it isn’t a large conceptual leap to go from real-time selection of pretrained “z-vectors” to real-time generation of the same. The larger conceptual breakthrough, with demonstration of its effectiveness, is the big success here.
Sort of. According to the text they can use multiple z-vectors (sets of weights that select for parts of the system to be used to answer a specific question) simultaneously, using a "simple optimization algorithm" to determine the relative weight for each of these vectors.
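To make the "several z-vectors at once" idea concrete, here is a rough sketch of what the blending step could look like. It assumes each z-vector is a plain list of per-component scale factors and that the "simple optimization algorithm" can be stood in for by a coarse grid search over mixing coefficients. The function names and the grid search are illustrative assumptions, not the authors' method.

```python
from itertools import product


def mix_z_vectors(z_vectors, alphas):
    """Return the weighted sum of equal-length z-vectors (hypothetical helper)."""
    if len(z_vectors) != len(alphas):
        raise ValueError("need exactly one coefficient per z-vector")
    dim = len(z_vectors[0])
    if any(len(z) != dim for z in z_vectors):
        raise ValueError("all z-vectors must have the same length")
    mixed = [0.0] * dim
    for z, alpha in zip(z_vectors, alphas):
        for i, value in enumerate(z):
            mixed[i] += alpha * value
    return mixed


def search_alphas(z_vectors, loss_fn, steps=4):
    """Coarse grid search over non-negative coefficients that sum to 1.

    Assumption: loss_fn scores a mixed z-vector (lower is better), for
    example by evaluating the adapted model on a few examples of the task.
    """
    best_alphas, best_loss = None, float("inf")
    for combo in product(range(steps + 1), repeat=len(z_vectors)):
        if sum(combo) != steps:
            continue  # keep only points on the simplex grid
        alphas = [c / steps for c in combo]
        loss = loss_fn(mix_z_vectors(z_vectors, alphas))
        if loss < best_loss:
            best_alphas, best_loss = alphas, loss
    return best_alphas, best_loss
```

In practice the loss would come from running the adapted model on the prompt, and a smarter search would replace the grid, but the shape of the problem is the same: a handful of coefficients, not a full backprop pass.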
Why not, as each new task comes up and the weights are revalued, save those weights and keep them as priors for similar future tasks? As the model is exposed to new data, the average of the priors for tasks the model considers similar might move closer to the posterior, making the model quicker and better at arriving at good outcomes. I suppose storage might be an issue.
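A minimal sketch of that "save the weights as priors" idea, assuming each task can be summarized by some embedding vector and each adaptation by a flat list of weights. The `PriorStore` class and the nearest-neighbor averaging are hypothetical, chosen only to illustrate the suggestion.

```python
import math


class PriorStore:
    """Keeps past adaptation weights keyed by a task embedding (illustrative)."""

    def __init__(self):
        self._entries = []  # list of (task_embedding, weights) pairs

    def save(self, task_embedding, weights):
        """Remember the weights that were used for this task."""
        self._entries.append((list(task_embedding), list(weights)))

    def prior_for(self, task_embedding, k=3):
        """Average the weights of the k most similar past tasks.

        Returns None when nothing has been saved yet.
        """
        if not self._entries:
            return None
        ranked = sorted(
            self._entries,
            key=lambda entry: math.dist(entry[0], task_embedding),
        )
        nearest = [weights for _, weights in ranked[:k]]
        dim = len(nearest[0])
        return [sum(w[i] for w in nearest) / len(nearest) for i in range(dim)]
```

Every saved entry costs one full weight vector, which is where the storage concern comes in; keeping only compact vectors rather than full weight matrices would make this far cheaper.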
I'm wondering if you could fine-tune the model on an aggregate of a temporal slice of revalued weights? Something analogous to REM sleep's involvement in embedding the day's events into long-term memory.
verdverm|1 year ago
wildermuthn|1 year ago
mtts|1 year ago
bugglebeetle|1 year ago
https://www.goodfire.ai/
They now have an API that allows for dynamic exploration and manipulation of the latent space for Llama 8-70B models (think Golden Gate Claude). They also open-sourced the sparse autoencoders that (in part) allow for this:
https://huggingface.co/Goodfire/Llama-3.3-70B-Instruct-SAE-l...
logicchains|1 year ago
It's already been invented: https://arxiv.org/abs/2202.05780 . That design is just very inefficient to scale up / use as a transformer backbone.
mnky9800n|1 year ago
magospietato|1 year ago
QuadmasterXLII|1 year ago