(no title)
getoffit | 13 days ago
This was realized in 2023 already: https://newsletter.semianalysis.com/p/google-we-have-no-moat...
"Less is best" is not a new realization. The concept exists across contexts. Music described as "overplayed". Prose described as verbose.
We just went through an era of computing that chanted "break down your monoliths". The npm ecosystem is built from lots of small packages composed together, and the Unix philosophy of small composable utilities is another example.
So models will improve as they are compressed, skeletonized down to opcodes and geometric models to render, including geometry for text, as the bytecode patterns for such will provide the simplest model for recreating the most outputs. Compress out the useless semantics from the state of the machine's operations and leave the user to apply labels at the presentation layer.
nguyentran03 | 13 days ago
The "no moat" memo you linked was about open source catching up to closed models through fine-tuning, not about small models outperforming large ones.
I'm also not sure what "skeletonized down to opcodes" or "geometry for text as bytecode patterns" means in the context of neural networks. Model compression is a real field (quantization, distillation, pruning), but none of it works the way you're describing here.
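For concreteness, distillation, the technique most relevant to the small-vs-large debate, just means training a small student network to reproduce a large teacher's softened outputs. A rough PyTorch sketch, with layer sizes, temperature, and loss weighting all made up for illustration:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Made-up sizes: a large "teacher" and a much smaller "student".
    teacher = nn.Sequential(nn.Linear(784, 1024), nn.ReLU(), nn.Linear(1024, 10)).eval()
    student = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))

    optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
    T, alpha = 4.0, 0.5  # temperature and loss mix, arbitrary here

    def distill_step(x, y):
        with torch.no_grad():
            t_logits = teacher(x)        # teacher provides soft targets
        s_logits = student(x)
        # KL between softened distributions, scaled by T^2
        soft = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                        F.softmax(t_logits / T, dim=-1),
                        reduction="batchmean") * (T * T)
        hard = F.cross_entropy(s_logits, y)  # ordinary supervised loss
        loss = alpha * soft + (1 - alpha) * hard
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # One fake batch just to show the call.
    distill_step(torch.randn(32, 784), torch.randint(0, 10, (32,)))

    # Quantization is a different kind of compression: same weights,
    # int8 linear layers at inference time.
    student_int8 = torch.quantization.quantize_dynamic(student, {nn.Linear}, dtype=torch.qint8)

None of that is "skeletonizing to opcodes"; it's ordinary gradient descent against a richer target distribution, or lowering numerical precision of weights you already have.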
BoredomIsFun | 13 days ago
Your whole comment feels, pardon me, like LARPing. No, small models do not outperform large ones unless fine-tuned. Saying that as someone who uses small models 95% of the time versus cloud ones.