tymonPartyLate|1 year ago

Isn’t this like a brute-force approach? Given it costs $3,000 per task, that’s roughly 600 GPU hours (H100 on Azure). In that amount of time the model can generate millions of chains of thought and then spend hours reviewing them, or even testing them out one by one. Kind of like trying things until something sticks, and that happens to solve 80% of ARC. I feel like reasoning works differently in my brain. ;)
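The comment's estimate is easy to sanity-check. A quick sketch of the arithmetic, with the ~$5/hour H100 rate as an assumption (on-demand cloud prices vary widely):

```python
# Back-of-the-envelope check of the comment's estimate.
# The hourly H100 rate below is an assumed figure, not a quoted Azure price.
cost_per_task_usd = 3000
h100_rate_usd_per_hour = 5  # assumption

gpu_hours = cost_per_task_usd / h100_rate_usd_per_hour
print(gpu_hours)  # 600.0
```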

tikkun|1 year ago

They're only allowed 2-3 guesses per problem. So even though, yes, it generates many candidates, it can't validate them: it doesn't have tool use or a verifier, so it submits the best 2-3 guesses. https://www.lesswrong.com/posts/Rdwui3wHxCeKb7feK/getting-50...
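The setup being described is best-of-N sampling without an external verifier: generate many candidates, rank them by the model's own confidence, and submit only the top few. A minimal sketch, where `generate_candidate` and `self_score` are illustrative stand-ins (no ground truth is consulted anywhere):

```python
import random

def generate_candidate(task, rng):
    # Stand-in for sampling one chain of thought / answer from the model.
    return rng.random()

def self_score(candidate):
    # Stand-in for the model's internal confidence; NOT a verifier --
    # nothing here checks the candidate against the true answer.
    return candidate

def best_guesses(task, n_samples=1000, n_guesses=2, seed=0):
    rng = random.Random(seed)
    candidates = [generate_candidate(task, rng) for _ in range(n_samples)]
    # Rank by self-assigned score and submit only the top n_guesses.
    return sorted(candidates, key=self_score, reverse=True)[:n_guesses]

print(best_guesses("arc-task"))
```

The key point of the comment survives in the sketch: however large `n_samples` is, only `n_guesses` answers are ever submitted, and the selection is only as good as the self-score.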

TrapLord_Rhodo|1 year ago

Chain of thought can entirely self-validate. The OP is saying the LLM is acting like a photon: evaluating all possible solutions and choosing the most "right" path. Not quoting the OP here, but my initial thought is that it does seem quite wasteful.

The LLM only gets two guesses at the final solution. The whole chain of thought is about breaking out the context and the levels of abstraction. How many guesses it self-generates and internally validates is just a function of compute power and time.

My counterpoint to the OP would be that this is exactly how our brain works. In any given scenario, we are also evaluating all possible solutions. Our entire stack is constantly listening and either staying silent or contributing to an action potential (either excitatory or inhibitory); the brain is always "evaluating all potential possibilities" at any given moment. We have a society of mind, each part contributing its opinion, but the ones without much support essentially get shouted down.

nmca|1 year ago

It is allowed exactly two guesses, per the ARC rules.

macrolime|1 year ago

The trick with AlphaGo was brute force combined with reinforcement learning to extract strategies from the brute force, and that's what we'll see here. Maybe it costs a million dollars in compute to get a high score today, but use reinforcement learning à la AlphaZero to learn from the process and it won't cost a million dollars next time. Let it grind through lots of hard benchmarks, math problems, and coding tasks, and it'll keep getting better and better.
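The loop being described is expert iteration: spend compute on search once, then train the cheap policy on the search's outputs so the next round needs less search. A toy sketch of that control flow, where the "policy" is just a number we nudge toward targets and `search` stands in for the expensive brute-force step (nothing here resembles a real model):

```python
import random

def search(policy_mean, target, n_samples, rng):
    # Expensive step: sample many guesses around the current policy
    # and keep the one closest to the target (the "brute force").
    guesses = [rng.gauss(policy_mean, 1.0) for _ in range(n_samples)]
    return min(guesses, key=lambda g: abs(g - target))

def expert_iteration(targets, rounds=5, n_samples=500, seed=0):
    rng = random.Random(seed)
    policy_mean = 0.0  # stand-in for the trainable policy
    for _ in range(rounds):
        solutions = [search(policy_mean, t, n_samples, rng) for t in targets]
        # Cheap step: "distill" the search results back into the policy,
        # so the next round starts closer and needs less search.
        policy_mean = sum(solutions) / len(solutions)
    return policy_mean

print(expert_iteration([10.0, 12.0]))
```

Each round the policy starts nearer the targets, so the same sample budget finds better solutions; that amortization is the whole point of the AlphaZero analogy.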

nextworddev|1 year ago

The best interpretation of this result is probably that it showed that tackling some arbitrary benchmark is something you can throw money at; it's a problem money can solve.

It's obviously not AGI, in the sense that you still need some problem framing and initialization to kickstart the reasoning-path simulations.

torginus|1 year ago

This might be quite an important point: if they created an algorithm that can mimic human reasoning but scales terribly with problem complexity (in big-O terms), it's still a very significant result, but it's not a 'human brains are over' moment quite yet.
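A toy illustration of the scaling concern: ARC grids use 10 colors, so a fully naive search over all possible n x n output grids grows as 10^(n^2), and even a search strategy that "works" on small grids gets priced out fast. (Real solvers obviously prune far more aggressively; this is only the worst-case shape of the curve.)

```python
# Worst-case size of a naive search over n x n ARC output grids
# with 10 possible colors per cell.
def naive_grid_search_space(n, colors=10):
    return colors ** (n * n)

for n in (2, 3, 4):
    print(n, naive_grid_search_space(n))
# 2 -> 10^4, 3 -> 10^9, 4 -> 10^16
```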

strangescript|1 year ago

"We have created artificial super intelligence, it has solved physics!"

"Well, yeah, but it's kind of expensive" -- this guy

tymonPartyLate|1 year ago

Haha. Hopefully you’re right and solving the ARC puzzle translates to solving all of physics. I just remain skeptical about the OpenAI hype. They have a track record of exaggerating the significance of their releases and their impact on humanity.

jeremyjh|1 year ago

Please do show me a novel result in physics from any LLM. You think "this guy" is stupid because he doesn't extrapolate from this $2MM test that nearly reproduces the work of a STEM graduate to a super intelligence that has already solved physics. Maybe you've got it backwards.

freehorse|1 year ago

The problem is not that it is expensive, but that, most likely, it is not superintelligence. Superintelligence is not exploring the problem space semi-blindly, if that is actually what the thousands of dollars per task are spent on. There is a reason the actual ARC-AGI prize requires efficiency: the point is not "passing the test" but solving the framing problem of intelligence.