top | item 46065284

andreybaskov | 3 months ago

Say we discover a new architecture breakthrough like Yann LeCun's proposed JEPA. Won't scaling laws apply to it anyway?

Suppose training is so efficient that you can train state-of-the-art AGI on a few GPUs. If it's better than current LLMs, there will be more demand for inference, which will require more GPUs, and we are back at the same "add more GPUs".

I find it hard to believe that we, as humanity, will hit a wall of "we don't need more compute", no matter what the algorithms are.

godelski|3 months ago

  > Won't scaling laws apply to it anyway?
Yes, of course. Scaling laws will always apply, but that's not really the point.[0]

The fight was never "Scale is all you need" (SIAYN) vs "scale is irrelevant"; it was "SIAYN" vs "Scaling is not enough". I'm not aware of any halfway serious researcher who did not think scaling was going to result in massive improvements. Being a researcher from the SINE camp myself...

Here's the thing:

The SIAYN camp argued that the transformer architecture was essentially good enough. They didn't think scale was literally all you needed, but that the rest would be minor tweaks, and that increasing model size and data size would get us there. That those were the major hurdles. In this sense they argued that we should move our efforts away from research and into engineering: that AGI was now essentially a money problem rather than a research problem. They pointed to Sutton's Bitter Lesson narrowly, concentrating on his point about compute.

The SINE (or SINAYN) camp wasn't sold. We read the Bitter Lesson differently: yes, compute is a key element of modern success, but just as important was the rise of flexible algorithms. In the past we couldn't use such algorithms for lack of computational power, but the real power was the algorithms. We're definitely a more diverse camp too, with varying arguments. Many of us look at animals and see that we can do so much more with so much less[2]. Clearly, even if SIAYN were sufficient, it does not appear to be efficient. Regardless, we all agree that there are still subtle nuances in intelligence that need working out.

The characteristics of the scaling "laws" matter, but they aren't enough. In the end what matters is generalization, and for that we don't really have measures. Unfortunately, with the SIAYN camp also came benchmark maximization. It was a good strategy in the beginning, as it helped give us direction. But we are now at the hard problem the SINE camp predicted. How do you make a model a good music generator when you have no definition of "good music"? Even in a very narrow sense we don't have a halfway decent mathematical definition of any aesthetics. We argued "we should be trying to figure this out so we don't hit a wall" and they argued "it'll emerge with scale".

So now the cards have been dealt. Who has the winning hand? More importantly, which camp will we fund? And will we fund the SIAYN people that converted to SINE or will we fund those who have been SINE when times were tough?

[0] They've been power laws and I expect them to continue to be power laws[1]. But the parameters of those laws do still matter, right?

[1] https://www.youtube.com/watch?v=HBluLfX2F_k

[2] A mouse has on the order of 100M neurons (and ~10^12 synapses). Not to mention how little power they operate on! Mice can still outperform LLMs on certain tasks, despite the LLMs having around four orders of magnitude more parameters and far more data.
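The point in footnote [0], that the parameters of a power law still matter even when the functional form doesn't change, can be sketched numerically. Everything below is a hypothetical illustration: the constants are loosely Chinchilla-shaped but not fitted to any real model.

```python
# Hypothetical power-law scaling curves L(N) = a*N**(-b) + c. The constants
# below are illustrative (loosely Chinchilla-shaped), not fitted to any model.

def loss(n_params, a, b, c):
    """Loss predicted by a power-law scaling fit for n_params parameters."""
    return a * n_params ** (-b) + c

def params_needed(target, a, b, c):
    """Analytically invert L(N) = a*N^(-b) + c: parameters to hit a loss target."""
    return (a / (target - c)) ** (1.0 / b)

# Two made-up architectures: same irreducible loss c, different exponent b.
arch1 = dict(a=400.0, b=0.34, c=1.69)  # assumed baseline curve
arch2 = dict(a=400.0, b=0.40, c=1.69)  # assumed steeper curve (a "breakthrough")

target = 1.9
n1 = params_needed(target, **arch1)
n2 = params_needed(target, **arch2)
print(f"baseline: ~{n1:.1e} params; steeper curve: ~{n2:.1e} params")
```

Both curves are power laws, yet a modest change in the exponent shifts the required parameter count by orders of magnitude, which is the sense in which the law's parameters still matter.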

andreybaskov|3 months ago

I agree scaling alone is not enough, and the transformer itself is proof of that - it was an iteration on the attention mechanism plus a few other changes.

But no matter what the next big thing is, I'm sure it would immediately fill all available compute to maximize its potential. It's not as if intelligence has a ceiling beyond which you don't need more of it.

tim333|3 months ago

Was "scale is all you need" actually a real thing said by a real person? Even the most pro scale people like Altman seem to be saying research and algorithms are a thing too. I guess as you say a more important thing is where the money goes. I think Altman's been overdoing it a bit on scaling spend.