top | item 42490004

t_serpico | 1 year ago

One fundamental challenge to me is that as each training run becomes more and more expensive, the time it takes to learn what works/doesn't work widens. Half a billion dollars for training a model is already nuts, but if it takes 100 iterations to perfect it, you've cumulatively spent 50 billion dollars... Smaller models may actually be where rapid innovation continues, simply because of tighter feedback loops. o3 may be an example of this.
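
The compounding is easy to sketch. This uses the comment's own hypothetical figures, not any real budget:

```python
# Back-of-envelope: cumulative cost of iterating on a frontier model.
# Both numbers are the comment's hypotheticals, not real figures.
cost_per_run = 0.5e9   # $0.5B per training run
iterations = 100       # runs needed to "perfect" the model

total = cost_per_run * iterations
print(f"${total / 1e9:.0f}B cumulative")  # → $50B cumulative
```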

ciconia|1 year ago

When you think about it, it's astounding how much energy this technology consumes versus a human brain, which runs at ~20 W [1].

[1] https://hypertextbook.com/facts/2001/JacquelineLing.shtml
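
For scale, a rough comparison: the 20 W figure is from the link above, and the training-run energy is a published ballpark estimate for a GPT-3-scale run, used here purely as an illustrative assumption:

```python
# Rough energy comparison (illustrative assumptions):
# - human brain: ~20 W continuous (per the linked fact)
# - one GPT-3-scale training run: ~1,287 MWh (published ballpark estimate)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

brain_kwh = 20 * 20 * SECONDS_PER_YEAR / 3.6e6  # 20 W for 20 years, in kWh
train_kwh = 1_287_000                            # 1,287 MWh in kWh

print(f"brain over 20 years: {brain_kwh:,.0f} kWh")
print(f"one training run is ~{train_kwh / brain_kwh:.0f}x that")
```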

anon373839|1 year ago

It’s almost as if human intelligence doesn’t involve performing repeated matrix multiplications over a mathematically transformed copy of the internet. ;-)

concerndc1tizen|1 year ago

20 W for 20 years to answer questions slowly and error-prone, at the level of a 30B model. An additional 10 years with highly trained supervision and the brain might start contributing original work.

dominicrose|1 year ago

A human brain is also more intelligent (hopefully) and is inside a body. In a way GPT resembles Google more than it resembles us.

soulofmischief|1 year ago

You've discovered the importance of well-formed priors. The human brain is the result of millions of years of very expensive evolution.

soheil|1 year ago

A human brain has been in continuous training for hundreds of thousands of years consuming slightly more than 20 watts.

dkobia|1 year ago

AGI is the Sisyphean task of our age. We’ll push this boulder up the mountain because we have to, even if it kills us.

missedthecue|1 year ago

Do we know LLMs are the path to AGI? If they're not, we'll just end up with some neat but eye wateringly expensive LLMs.

wruza|1 year ago

Says who? And more importantly, is this the boulder? All I (and many others here) see is people engaging others to sponsor pushing some boulder, screaming promises that aren't even consistent with the intermediate results that come out. This particular boulder may be on the wrong mountain, and likely is.

It all feels like doubling down on astrology because good telescopes aren't there yet. I'm pretty sure that when GPT-5 comes out, it will show some amazing benchmarks but shit itself in the third paragraph as usual in a real task. 'Cause that was constant throughout GPT's evolution, in my experience.

> even if it kills us

Full-on sci-fi; in reality it will get stuck on a shell error message and either run out of money to exist or corrupt the system into no connectivity.

h0l0cube|1 year ago

There's no doubt been progress on the way to AGI, but ultimately it's still a search problem, and one that will rely on human ingenuity at least until we solve it. LLMs are such a vast improvement in showing intelligent-like behavior that we've become tantalized by it. So now we're possibly focusing our search in the wrong place for the next innovation on the path to AGI. Otherwise, it's just a lack of compute, and then we just have to wait for the capacity to catch up.

namaria|1 year ago

A task that is completed and kills us is pretty much the opposite of a Sisyphean task.

soheil|1 year ago

Really, the killing part was not necessary to make your point, and thus neither was injecting your Sisyphean prose.

Any technology may kill us, but we'll keep innovating as we ought to. What's your next point?

idiotsecant|1 year ago

And when we get it there, it kills us.

madeofpalk|1 year ago

What has AGI got to do with this?

khana|1 year ago

[deleted]

ulfw|1 year ago

Why? Nobody asked us if we want this. Nobody has a plan for what to do with humanity when there is AGI.

bloodyplonker22|1 year ago

I am working at an AI company that is not OpenAI. We have found ways to modularize training so we can test on narrower sets before training is "completely done". That said, I am sure there are plenty of ways others are innovating to solve the long training time problem.

gerdesj|1 year ago

Perhaps the real issue is that learning takes time and there may not be a shortcut. I'll grant you that the analogous argument was complete wank when comparing, say, the horse and cart to a modern car.

However, we are not comparing cars to horses but computers to a human.

I do want "AI" to work. I am not a Luddite. The current efforts I've tried are not very good. On the surface they offer a lot, but the lustre comes off very quickly.

(1) How often do you find yourself arguing with someone about a "fact"? Your fact may be fiction for someone else.

(2) LLMs cannot reason

A next-token guesser does not think. I wish you all the best. Rome was not burned down within a day!

I can sit down with you and discuss ideas about what constitutes truth and cobblers (rubbish/false). I have indicated via parentheses (brackets in en_GB) another way to describe something, and you will probably get that, but I doubt that your programme will.

icpmacdo|1 year ago

This is literally just the scaling laws: "Scaling laws predict the loss of a target machine learning model by extrapolating from easier-to-train models with fewer parameters or smaller training sets. This provides an efficient way for practitioners and researchers alike to compare pretraining decisions involving optimizers, datasets, and model architectures."

https://arxiv.org/html/2410.11840v1#:~:text=Scaling%20laws%2....
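
As a toy sketch of the idea: fit a power law to losses measured on small models, then extrapolate to a much larger one. The numbers below are synthetic, not from any real runs:

```python
# Toy scaling-law extrapolation: fit L(N) = a * N**-b to small-model
# losses, then predict the loss of a much larger model.
# All numbers are synthetic illustrations.
import numpy as np

params = np.array([1e7, 3e7, 1e8, 3e8, 1e9])  # small-model sizes
losses = 8.0 * params ** -0.076               # pretend measurements

# A linear fit in log-log space recovers (log a, -b).
slope, intercept = np.polyfit(np.log(params), np.log(losses), 1)
a, b = np.exp(intercept), -slope

big_n = 1e11                                  # target model size
predicted = a * big_n ** -b
print(f"predicted loss at {big_n:.0e} params: {predicted:.3f}")
```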

merizian|1 year ago

Because of muP [0] and scaling laws, you can test ideas empirically on smaller models with some confidence that they will transfer to the larger model.

[0] https://arxiv.org/abs/2203.03466
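
A minimal sketch of the transfer idea, assuming the simplified 1/width rule muP prescribes for hidden-layer Adam learning rates (the full parametrization also rescales initializations and output multipliers; the function name here is illustrative):

```python
# Sketch of muP-style hyperparameter transfer (simplified).
# Idea: tune the LR on a narrow "base" model, then scale the
# hidden-layer Adam LR inversely with width so the tuned value
# transfers to wider models.
def mup_hidden_lr(base_lr: float, base_width: int, width: int) -> float:
    """Hidden-layer Adam LR under muP: shrink proportionally to width."""
    return base_lr * base_width / width

base_lr = 1e-3  # tuned on the small proxy model (illustrative value)
print(mup_hidden_lr(base_lr, base_width=256, width=8192))
```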

fny|1 year ago

o3 is not a smaller model. It's an iterative GPT of sorts with the magic dust of reinforcement learning.

falcor84|1 year ago

I'm pretty sure that the parent implied that o3 is smaller in comparison to GPT-5.

dyauspitr|1 year ago

Until you get to a point where the LLM is smart enough to look at real-world data streams and prune its own training set out of them. At that point it will improve itself into AGI.

soheil|1 year ago

It's like saying bacteria reproduce way faster than humans, so that's where we should be looking for the next breakthroughs.

ramesh31|1 year ago

But if the scaling law holds true, more dollars should at some point translate into AGI, which is priceless. We haven't reached the limits of that hypothesis yet.

unshavedyak|1 year ago

> which is priceless

This also isn't true. It'll clearly have a price to run. Even if it's very intelligent, if the price to run it is too high, it'll just be a 24/7 intelligent person that few can afford to talk to. No?

threeseed|1 year ago

a) There is evidence, e.g. private data deals, that we are starting to hit the limits of what data is available.

b) There is no evidence that LLMs are the roadmap to AGI.

c) Continued investment hinges on there being a large enough cohort of startups that can leverage LLMs to generate outsized returns. There is no evidence yet that this is the case.