top | item 43289074

spr-alex | 1 year ago

maybe i understand this a little differently; the argument i'm most familiar with is this one from LeCun, where error accumulation in the prediction is the concern with autoregression: https://drive.google.com/file/d/1BU5bV3X5w65DwSMapKcsr0ZvrMR...
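The argument in those slides, roughly, is a compounding-probability one. A toy sketch of it (the numbers below are my own illustrative assumptions, not LeCun's):

```python
# Sketch of the error-accumulation argument: if each generated token
# independently "goes wrong" with probability e, the chance the whole
# sequence stays on track decays exponentially with its length n.

def p_stays_correct(e: float, n: int) -> float:
    """Probability an n-token generation contains no error, assuming
    an independent per-token error probability e."""
    return (1 - e) ** n

# Even a small per-token error rate compounds quickly:
print(p_stays_correct(0.02, 10))    # ~0.82
print(p_stays_correct(0.02, 100))   # ~0.13
```

The disagreement in the replies below is essentially about whether the independence assumption holds, i.e. whether later tokens can correct earlier ones.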

antirez | 1 year ago

The error accumulation thing is basically without ground, as autoregressive models correct what they are saying in the process of emitting tokens (trivial to test yourself: force a wrong continuation into the prompt and the LLM will not follow it at all). LeCun made an incredible number of wrong claims about LLMs, many of which he no longer defends, like the stochastic parrot claim. Now the idea that next-token prediction captures just a statistical relationship is considered laughable, but even when it was formulated there were obvious empirical hints to the contrary.

spr-alex | 1 year ago

i think the opposite: error accumulation is basically the daily experience of using LLMs.

As for the premise that models can't self-correct, that's not the argument i've ever seen; transformers have global attention across the context window. It's that their prediction quality gets increasingly poor as generation goes on. Is anyone having a different experience than that?

Everyone doing some form of "prompt engineering", whether with optimized ML tuning, a human in the loop, or some kind of agentic fine-tuning step, runs into perplexity errors that get worse with longer contexts, in my experience.

There's some "sweet spot" for how long a prompt to use in many use cases, for example. It's clear to me that less is more a lot of the time.

Now, whether diffusion will fare significantly better on error is another question. Intuition would guide me to think that more flexibility with token rewriting should enable much greater error correction. Ultimately, as different approaches come online we'll get PPL comparables and the data will speak for itself.
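For anyone following along, PPL (perplexity) is just the exponentiated average negative log-likelihood per token, which is what makes it comparable across architectures. A minimal sketch (the log-probs below are made-up toy values, not from any real model):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token.
    Lower is better; a uniform guess over V options gives perplexity V."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Toy example: a model that assigns every token probability 1/4
# behaves as if choosing uniformly among 4 options, so PPL = 4.
uniform_4 = [math.log(0.25)] * 5
print(perplexity(uniform_4))  # 4.0
```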

HeatrayEnjoyer | 1 year ago

>force a given continuation in the prompt and the LLMs will not follow at all

They don't? That's not the case at all, unless I am misunderstanding.