top | item 44828706

fdsjgfklsfd | 6 months ago

I think they're just reaching the limits of this architecture, and when a new type is invented it will be a much bigger step.

hodgehog11 | 6 months ago

Working in the theory, I can say this is incredibly unlikely. At scale, once appropriately trained, all architectures begin to converge in performance.

It's not architectures that matter anymore, it's unlocking new objectives and modalities that open another axis to scale on.

viraptor | 6 months ago

Do we really have the data on this? I mean, it does happen on a smaller scale, but where's the 300B version of RWKV? Where's hybrid symbolic/LLM? Where are other experiments? I only see larger companies doing relatively small tweaks to the standard transformers, where the context size still explodes the memory use - they're not even addressing that part.
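[Editor's note: the memory point above refers to the fact that standard self-attention materializes an n×n score matrix per head, so memory for those matrices grows quadratically with context length n. A minimal back-of-the-envelope sketch, assuming fp16 scores, 32 heads, and no memory-efficient attention (the head count and dtype are illustrative assumptions, not from the thread):]

```python
def attention_score_bytes(n_ctx: int, n_heads: int, bytes_per_elem: int = 2) -> int:
    """Bytes for the (n_ctx x n_ctx) attention score matrices of one layer,
    assuming naive attention that materializes the full matrix per head."""
    return n_heads * n_ctx * n_ctx * bytes_per_elem

for n in (1_024, 8_192, 65_536):
    gib = attention_score_bytes(n, n_heads=32) / 2**30
    print(f"context {n:>6}: {gib:,.1f} GiB per layer")
# Doubling the context quadruples this memory; 64x more context costs 4096x.
```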

fdsjgfklsfd | 6 months ago

Do you mean "all variants of the same stacked transformer architecture converge in performance"? Or do you know of tests against some other architecture? The diffusion-based LLMs?

highfrequency | 6 months ago

Could you elaborate with a few more paragraphs? What do you mean by “working in the theory?”