top | item 47086797

(no title)

I think the thing that makes 8b sized models interesting is the ability to train unique custom domain knowledge intelligence and this is the opposite of that. Like if you could deploy any 8b sized model on it and be this fast that would be super interesting, but being stuck with llama3 8b isn't that interesting.

discuss

ACCount37|10 days ago

The "small model with unique custom domain knowledge" approach has a very low capability ceiling.

Model intelligence is, in many ways, a function of model size. A small model tuned for a given domain is still crippled by being small.

Some things don't benefit from general intelligence much. Sometimes a dumb narrow specialist really is all you need for your tasks. But building that small specialized model isn't easy or cheap.

Engineering isn't free, models tend to grow obsolete as the price/capability frontier advances, and AI specialists are less of a commodity than AI inference is. I'm inclined to bet against approaches like this on a principle.

matu3ba|10 days ago

> Engineering isn't free, models tend to grow obsolete as the price/capability frontier advances, and AI specialists are less of a commodity than AI inference is. I'm inclined to bet against approaches like this on a principle.

This does not sound like it will simplify the training and data side, unless their or subsequent models can somehow be efficiently utilized for that. However, this development may lead to (open source) hardware and distributed system compilation, EDA tooling, bus system design, etc getting more deserved attention and funding. In turn, new hardware may lead to more training and data competition instead of the current NVIDIA model training monopoly market. So I think you're correct for ~5 years.

mips_avatar|10 days ago

A fine tuned 1.7B model probably is still too crippled to do anything useful. But around 8b the capabilities really start to change. I’m also extremely unemployed right now so I can provide the engineering.