I think the thing that makes 8b sized models interesting is the ability to train them on unique, custom domain knowledge, and this is the opposite of that. If you could deploy any 8b sized model on it and have it run this fast, that would be super interesting, but being stuck with llama3 8b isn't that interesting.
ACCount37|10 days ago
Model intelligence is, in many ways, a function of model size. A small model tuned for a given domain is still crippled by being small.
Some things don't benefit from general intelligence much. Sometimes a dumb narrow specialist really is all you need for your tasks. But building that small specialized model isn't easy or cheap.
Engineering isn't free, models tend to grow obsolete as the price/capability frontier advances, and AI specialists are less of a commodity than AI inference is. I'm inclined to bet against approaches like this on principle.
matu3ba|10 days ago
This does not sound like it will simplify the training and data side, unless their models, or subsequent ones, can somehow be used efficiently for that. However, this development may lead to (open source) hardware and distributed system compilation, EDA tooling, bus system design, etc. getting more of the attention and funding they deserve. In turn, new hardware may lead to more competition in training and data, instead of the current NVIDIA monopoly on model training. So I think you're correct for ~5 years.
mips_avatar|10 days ago