
fergal_reid | 1 year ago

Similar arguments to LeCun.

People are going to keep saying this about autoregressive models, how small errors accumulate and can't be corrected, while we literally watch reasoning models say things like "oh that's not right, let me try a different approach".

To me, this is like people saying "well NAND gates clearly can't sort things so I don't see how a computer could".

Large transformers can clearly learn very complex behavior, and the limits of that are not obvious from their low level building blocks or training paradigms.


dartos|1 year ago

> while we literally watch reasoning models say things like "oh that's not right, let me try a different approach".

I'm not saying I disagree with your premise that errors can be corrected by using more and more tokens, but this argument is weird to me.

The model isn’t intentionally generating text. The kinds of “oh let me try a different approach” lines I see are often followed by a repeat of the very approach it just took. I wouldn’t say most of the time, but often enough that I notice.

Just because a model generates text doesn’t mean that the text actually represents anything at all, let alone a reflection of an internal process.

TeMPOraL|1 year ago

> Just because a model generates text doesn’t mean that the text actually represents anything at all, let alone a reflection of an internal process.

What does it represent then? What are all these billion weights for? It's not a bag full of NULLs that just pulls next words from a look-up table. Obviously there is some kind of internal process.

Also I don't get why people ignore the temporal aspect. Humans too generate thoughts in sequence, and can't arbitrarily mutate what came before. Time and memory are what force sequential order: we too just keep piling on more thoughts to correct previous thoughts while they are still in working memory (context).

naasking|1 year ago

> The model isn’t intentionally generating text.

What's the mechanistic model of "intention" that you're using to claim that there is no intention in the model's operation?

> Just because a model generates text doesn’t mean that the text actually represents anything at all, let alone a reflection of an internal process.

Generating text is the trace of an internal process in an LLM.

PartiallyTyped|1 year ago

I'd argue that humans are, by definition, autoregressive "models", and we can change our minds mid-thought as we process logical arguments. The issue of small errors accumulating makes sense if there is no mechanism for evaluation and recovery, but clearly both evaluation and recovery happen.

Of course, this usually requires the human to have some sense of humility and admit their mistakes.

I wonder: what if we trained more models on data that self-heals or recovers mid-sentence?

yorwba|1 year ago

As the number of self-corrections increases, so does the likelihood that the model will say "oh that's not right, let me try a different approach" after it has already found the correct solution. Then you can get into a second-guessing loop that never arrives at the correct answer.

If the self-check is more reliable than the solution-generating process, that's still an improvement, but as long as the model makes small errors when correcting itself, those errors will still accumulate. On the other hand, if you can have a reliable external system do the checking, you can actually guarantee correctness.
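That last point can be sketched as a simple rejection loop. This is a hypothetical toy, not anyone's actual system: `propose` stands in for a noisy solution generator, and `external_check` for a reliable external verifier that recomputes the answer deterministically. Because a wrong candidate can never pass the check, errors cannot accumulate across attempts.

```python
import random

def propose(rng):
    # Hypothetical noisy "model": answers 17 * 23 correctly only 60% of
    # the time, otherwise returns a random wrong-ish number.
    return 391 if rng.random() < 0.6 else rng.randrange(300, 500)

def external_check(answer):
    # Reliable external verifier: recomputes the ground truth.
    return answer == 17 * 23

def solve(rng, max_attempts=50):
    # Resample until the verifier accepts; an incorrect answer is
    # never returned, so per-attempt errors don't compound.
    for _ in range(max_attempts):
        candidate = propose(rng)
        if external_check(candidate):
            return candidate
    return None  # verifier never accepted within the budget
```

The trade-off is exactly the one described above: reliability comes from the checker, and the cost of an unreliable generator shows up as extra attempts, not as accumulated error.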

solveit|1 year ago

Error correction is possible even if the error correction is itself noisy. The error does not need to accumulate; it can be made as small as you like at the cost of some efficiency. This is not a new problem: the relevant theorems are incredibly robust and have been known for decades.
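A minimal illustration of that classic result, in the spirit of von Neumann's work on reliable computation from unreliable components, using the simplest possible redundancy scheme (a repetition code; the function names here are made up for the sketch): even when every individual read of a bit is noisy, majority-voting over more independent noisy reads drives the overall error rate down, at the cost of doing more work.

```python
import random

def noisy_read(rng, value, flip_prob):
    # Each individual read is noisy: the bit flips with probability flip_prob.
    return value ^ (1 if rng.random() < flip_prob else 0)

def majority_vote(rng, value, flip_prob, copies):
    # Redundancy: take `copies` independent noisy reads and vote.
    ones = sum(noisy_read(rng, value, flip_prob) for _ in range(copies))
    return 1 if ones > copies // 2 else 0

def error_rate(seed, flip_prob, copies, trials=2000):
    # Estimate how often the voted result is still wrong.
    rng = random.Random(seed)
    wrong = sum(majority_vote(rng, 0, flip_prob, copies) != 0
                for _ in range(trials))
    return wrong / trials

for copies in (1, 5, 21):
    print(copies, error_rate(seed=42, flip_prob=0.2, copies=copies))
```

With a 20% flip probability per read, the single-read error stays near 0.2, while 5-way and 21-way voting push it progressively toward zero: the residual error shrinks exponentially in the number of copies, which is the "as small as you like, at some cost in efficiency" trade-off.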

energy123|1 year ago

Yann LeCun's prediction was empirically refuted. He claimed that the longer LLMs run, the less accurate they get; OpenAI showed the opposite to be true.

mrfox321|1 year ago

They didn't show that; they just pushed out the length at which accuracy breaks down.

Wonderfall|1 year ago

LeCun is certainly a source of inspiration, and I think he has a fair critique that still holds despite what people assume when they see reasoning models in action. But unlike him, I don't think autoregressive models are a doomed path or whatever. I just like to question things (and don't claim to have definitive answers).

I-JEPA and V-JEPA have recently shown promising results as well.

Tostino|1 year ago

I think recurrent training approaches like those discussed in the COCONUT paper and similar work show promising potential. As these techniques mature, models could eventually leverage a recurrent architecture to perform tasks requiring precise sequential reasoning, such as parity (odd/even bit counting), which current architectures struggle with.