top | item 41657717

Sunhold | 1 year ago

Look at the sample chain-of-thought for o1-preview under this blog post, for decoding "oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz". At this point, I think the "fancy autocomplete" comparisons are getting a little untenable.

https://openai.com/index/learning-to-reason-with-llms/

ToucanLoucan | 1 year ago

I’m not seeing anything convincing here. OpenAI says its models are better at reasoning, and asserts it tested this by comparing how o1 and “experts” solve certain problems, but it shows neither the experts’ responses nor o1’s, nor does it even deign to share what the problems were. And, crucially, it doesn’t specify whether writings on these subjects were part of the training data.

Call me a cynic here, but I just don’t find it very compelling to read about OpenAI being excited about how smart OpenAI’s smart AI is, in a test designed by OpenAI and run by OpenAI.

NoGravitas | 1 year ago

"Any sufficiently advanced technology is indistinguishable from a rigged demo." A corollary of Clarke's Law found in fannish circles, origin unknown.

HarHarVeryFunny | 1 year ago

It depends on how well you understand how the fancy autocomplete is working under the hood.

You could compare GPT-o1's chain of thought to something like IBM's Deep Blue chess computer, which used tree search (alpha-beta in Deep Blue's case; more modern game engines such as AlphaGo use MCTS)... at the end of the day it's just using built-in knowledge (pre-training) to predict what move would most likely be made by a winning player. It's not unreasonable to characterize this as "fancy autocomplete".

In the case of an LLM, given that the model was trained with the singular goal of autocomplete (i.e. mimicking the training data), it seems highly appropriate to call that autocomplete, even though that obviously includes mimicking training data that came from a far more general intelligence than the LLM itself.
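
To make "autocomplete" concrete: the training objective is next-token prediction. A toy sketch of that idea, using a bigram counter in plain Python (an illustrative stand-in for the real transformer objective, not OpenAI's actual training code):

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count next-token frequencies: the simplest possible 'autocomplete' model."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def autocomplete(counts, token):
    """Predict the next token most often seen after `token` in training."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]

corpus = ["the cat sat on the mat", "the cat ate the fish"]
model = train_bigram(corpus)
print(autocomplete(model, "the"))  # "cat" - the most frequent continuation of "the"
```

An LLM replaces the counting with a learned distribution over contexts of thousands of tokens, but the objective being mimicked is the same: predict what comes next in the training data.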

All GPT-o1 is adding beyond the base LLM's fancy autocomplete is an MCTS-like exploration of possible continuations. GPT-o1's ability to solve complex math problems is not much different from Deep Blue's ability to beat Garry Kasparov. Call it intelligent if you want, but better to do so with an understanding of what's really under the hood, and therefore what it can't do as well as what it can.
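
The "exploration of possible continuations" can be sketched as a beam search over candidate sequences under some scoring function. The `expand` and `score` functions below are hypothetical placeholders (o1's actual search procedure is not public); the point is only the shape of the search:

```python
import heapq

def best_continuation(start, expand, score, beam_width=3, depth=4):
    """Beam search: at each step, extend every kept sequence with its
    candidate next tokens, then keep only the top-scoring partial
    sequences (a simple stand-in for fancier tree search)."""
    frontier = [start]
    for _ in range(depth):
        candidates = [seq + [tok] for seq in frontier for tok in expand(seq)]
        if not candidates:
            break
        frontier = heapq.nlargest(beam_width, candidates, key=score)
    return max(frontier, key=score)

# Toy usage: tokens are the digits 0-2, and the score is the running sum,
# so the search should find the all-2s sequence.
expand = lambda seq: [0, 1, 2] if len(seq) < 4 else []
print(best_continuation([], expand, sum))  # [2, 2, 2, 2]
```

In the LLM setting, `expand` would propose likely next tokens and `score` would rate partial chains of thought; the base model still only predicts continuations, and the search layer picks among them.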

int_19h | 1 year ago

Saying "it's just autocomplete" is not really saying anything meaningful, since it doesn't specify the complexity of the completion. When the completion is a correct answer to a question that requires logical reasoning, for example, "just autocomplete" needs to be able to do exactly that reasoning if it is to complete anything outside its training set.

HaZeust | 1 year ago

At that point, how are you not just a fancy autocomplete?

lionkor | 1 year ago

Fun little counterpoint: How can you _prove_ that this exact question was not in the training set?

bmitc | 1 year ago

How exactly does a blog post from OpenAI about a preview release address my comment or make fancy autocomplete comparisons untenable?

Sunhold | 1 year ago

It shows that the LLM is capable of reasoning.