I think the next big innovation in LLMs (something on the scale of the attention mechanism) will be a way of distributing work across many much smaller, specialized, and capable units, rather than relying on one giant network.
We already see hints of this with mixture-of-experts (MoE) models, but something entirely new wouldn't surprise me.