(no title)
mikehollinger | 1 year ago
I would add “in their current form” and agree. There are three things that can change here: 1. Moore’s law: the worldwide economy is built around the steady progression of cheaper compute. Give it 36 months and your problem becomes a $25,000 problem. 2. Quantization and smaller models: there will likely be specializations of the various models (is this the beginning of the “Monolith vs Microservices” debate?). 3. End-to-end training isn’t for everyone: finetunes and alignment matter more than an end-to-end training run, IF we can coerce the behaviors we want out of the models by finetuning them. That, along with quantized models, (imho) unlocked vision models, which are now in the “plateau of productivity” of the Gartner hype cycle compared to a few years ago.
So as an example today, I can grab a backbone and pretrained weights for an object detector, and with relatively little data (from a few lines to a few tens of lines of code, and 50 to 500 images) and relatively little wall-clock time and energy (say 5 to 15 minutes) on a PC, I can create a customized object detector that detects -my- specific objects pretty well. I might need to revise it a few times, but it’ll work pretty well.
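As a rough sketch of what that workflow can look like (not the commenter’s actual code): grab a pretrained torchvision detector, swap the prediction head for your own classes, and train briefly on a small labeled set. `MyDataset` here is a hypothetical dataset yielding images and targets in torchvision’s detection format.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 3  # background + 2 custom object classes, for example

# Grab a backbone with pretrained weights...
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")

# ...and swap the box-prediction head for one sized to our own classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device).train()

optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

# Hypothetical small dataset of 50-500 labeled images.
loader = torch.utils.data.DataLoader(
    MyDataset(), batch_size=2, shuffle=True,
    collate_fn=lambda batch: tuple(zip(*batch)))

for epoch in range(5):  # a few minutes of wall-clock time on a PC
    for images, targets in loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        losses = model(images, targets)  # dict of losses in training mode
        loss = sum(losses.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```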
Why wouldn’t we see the same sort of progression with transformer architectures? It hinges on someone creating the model weights for the “greater good,” or on us figuring out how to do distributed training for open source in a “seti@home” style (long live the blockchain, anyone?).
jsheard | 1 year ago