(no title)
mikehollinger | 1 year ago
I would add “in their current form” and agree. There are three things that can change here: 1. Moore’s law: the worldwide economy is built around the steady progression of cheaper compute. Give it 36 months and your problem becomes a $25,000 problem. 2. Quantization and smaller models: there will likely be specializations of the various models (is this the beginning of the “Monolith vs Microservices” debate?). 3. End-to-end training isn’t for everyone: finetunes and alignment matter more than an end-to-end training run, IF we can coerce the behaviors we want out of the models by finetuning them. That, along with quantized models, (imho) unlocked vision models, which are now in the “plateau of productivity” of the Gartner hype cycle compared to a few years ago.
So as an example today, I can grab a backbone and pretrained weights for an object detector, and with relatively little data (from a few lines to a few tens of lines of code, and 50 to 500 images) and relatively little wall-clock time and energy (say 5 to 15 minutes) on a PC, I can create a customized object detector that detects -my- specific objects pretty well. I might need to revise it a few times, but it’ll work pretty well.
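As a rough sketch of what that workflow can look like (not the commenter’s actual code): grab a pretrained torchvision detector, swap the prediction head for your own classes, and train briefly on a small labeled set. `MyDataset` here is a hypothetical dataset yielding images and targets in torchvision’s detection format.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 3  # background + 2 custom object classes, for example

# Grab a backbone with pretrained weights...
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")

# ...and swap the box-prediction head for one sized to our own classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device).train()

optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

# Hypothetical small dataset of 50-500 labeled images.
loader = torch.utils.data.DataLoader(
    MyDataset(), batch_size=2, shuffle=True,
    collate_fn=lambda batch: tuple(zip(*batch)))

for epoch in range(5):  # a few minutes of wall-clock time on a PC
    for images, targets in loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        losses = model(images, targets)  # dict of losses in training mode
        loss = sum(losses.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```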
Why wouldn’t we see the same sort of progression with transformer architectures? It hinges on someone creating the model weights for the “greater good,” or on us figuring out how to do distributed training for open source in a “seti@home” style (long live the blockchain, anyone?).
jsheard | 1 year ago