I was wondering: why are we investing so much effort in predicting the next token when we could instead change our approach to predicting the next version of the LLM itself, i.e. an LLM that generates a new LLM, weights included, and keeps doing so until it maxes out all benchmarks?