(no title)
lispitillo | 7 months ago
The paper seems to only study problems like sudoku solving, and not question answering or other applications of LLMs. Furthermore they omit a section for future applications or fusion with current LLMs.
I think anyone working in this field can envision their applications, but the details to have a MoE with an HRM model could be their next paper.
I only skimmed the paper and I am not an expert, sure other will/can explain why they don't discuss such a new structure. Anyway, my post is just blissful ignorance over the complexity involved and the impossible task to predict change.
Edit: A more general idea is that Mixture of Expert is related to cluster of concepts and now we would have to consider a cluster of concepts related by the time they take to be grasped, so in a sense the model would have in latent space an estimation of the depth, number of layers, and time required for each concept, just like we adapt our reading style for a dense math book different to a newspaper short story.
yorwba|7 months ago
In contrast, language modeling requires storing a large number of arbitrary phrases and their relation to each other, so I don't think you could ever get away with a similarly small model. Fortunately, a comparatively small number of steps typically seems to be enough to get decent results.
But if you tried to use an LLM-sized model in an HRM-style loop, it would be dog slow, so I don't expect anyone to try it anytime soon. Certainly not within a month.
Maybe you could have a hybrid where an LLM has a smaller HRM bolted on to solve the occasional constraint-satisfaction task.
marcosdumay|7 months ago
A person has some ~10k word vocabulary, with words fitting specific places in a really small set of rules. All combined, we probably have something on the order of a few million rules in a language.
What, yes, is larger than the thing in this paper can handle. But is nowhere near as large as a problem that should require something the size of a modern LLM to handle. So it's well worth it to try to enlarge models with other architectures, try hybrid models (note that this one is necessarily hybrid already), and explore every other possibility out there.
energy123|7 months ago
buster|7 months ago
Oras|7 months ago
Back to ML models?