top | item 40100919

earslap | 1 year ago

For existing models, are beam-search-like methods hopeless due to combinatorial explosion? Are there no smart ways to improve them? Evaluating multiple futures will be slow, but if it means the model can give vastly better output, it might be a worthwhile trade-off in some cases. I feel like our standard way of sampling the output of LLMs is a bit too simplistic, and my hunch is that it should be possible to get a lot more out of them, even if it means losing speed.
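For reference, the beam-search idea being asked about can be sketched in a few lines. This is a toy illustration, not any particular library's decoder: `step_fn` is a hypothetical stand-in for a real LM's next-token log-probabilities, and the names are made up for the example.

```python
import math

def beam_search(start, step_fn, beam_width=3, steps=4):
    """Keep the `beam_width` highest-scoring partial sequences at each step.

    `step_fn(seq)` returns a list of (token, log_prob) continuations; here it
    stands in for a real language model's next-token distribution.
    """
    beams = [(start, 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for token, logp in step_fn(seq):
                candidates.append((seq + [token], score + logp))
        # Prune back down to the best `beam_width` partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

# Toy distribution: from any prefix, three continuations with fixed probabilities.
def toy_step(seq):
    return [("a", math.log(0.5)), ("b", math.log(0.3)), ("c", math.log(0.2))]

best = beam_search([], toy_step, beam_width=2, steps=3)
```

The combinatorial explosion the comment worries about shows up in the `candidates` list: it grows as beam width times branching factor per step, which is exactly why the beam is pruned every iteration.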

HarHarVeryFunny | 1 year ago

People are considering that sort of beam-search approach - this is what they call "tree of thoughts" - generate a branching tree of alternate continuations, then pick the best one based on some criteria.

This doesn't seem an ideal approach though, since it amounts to generating a bunch of shallow responses and picking the best, rather than the preferred approach of thinking more deeply before generating. It's not the same as a computer chess program considering N moves ahead, where you are guaranteed that one of those move sequences really is the best one (as long as you don't accidentally prune it out). In contrast, if you generate all possible "shallow" N-token responses (bunch of monkeys gibbering), there is no guarantee any of them will be the high-quality response you are hoping for.
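The "generate shallow candidates, then pick the best" pattern described above reduces to best-of-N reranking. A minimal sketch, with hypothetical `generate` and `score` callables standing in for a real sampler and reranker:

```python
import random

def best_of_n(prompt, generate, score, n=5, seed=0):
    """Sample `n` complete candidate responses independently, then return
    the one the scoring function ranks highest. No candidate is refined
    further, which is the 'shallow' limitation discussed above: if none of
    the n samples is good, reranking cannot fix that.
    """
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=score)

# Toy stand-ins: `generate` picks a canned response; the score is just length.
def toy_generate(prompt, rng):
    return rng.choice(["short", "a bit longer", "the longest answer here"])

result = best_of_n("question?", toy_generate, len, n=5)
```

The chess analogy in the comment is the key contrast: a game-tree search enumerates a space guaranteed to contain the best move, while `best_of_n` only ever sees whatever the sampler happened to draw.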

Really planning ahead - reasoning deeply before speaking - would seem harder to implement though, since it'd involve applying a variable number of reasoning steps (maybe looping), then determining when to stop. This also seems different from the proposed insertion of "reasoning tokens", since those are shallow reasoning steps (a normal single pass through the transformer's layers), when what is really needed seems to be more depth of reasoning ("more layers"), perhaps coupled with some working memory/tokens. Both schemes (more tokens vs. more depth) are also related to the wish to use a variable amount of compute for different tasks/inputs - less compute for simple tasks, more for hard ones.

earslap | 1 year ago

Ah yes, I totally agree. I was viewing the method as a stopgap solution (especially because it does not require retraining or any other special tricks) until researchers figure out "planning" in a broader sense. It is very inefficient otherwise, but in the meantime, is simple sampling with a couple of parameters to tune from the output softmax really the best we can do? Is there no low-hanging fruit there?
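The "couple of parameters to tune from the output softmax" are typically temperature, top-k, and nucleus (top-p) sampling, all of which just reshape the distribution before drawing a token. A minimal self-contained sketch (not any library's actual implementation):

```python
import math
import random

def sample(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Sample a token index from raw logits with the standard knobs:
    temperature rescales the distribution, top-k keeps only the k most
    likely tokens, top-p keeps the smallest set whose total mass
    reaches p.
    """
    rng = rng or random.Random()
    # Temperature: divide logits before the softmax (low T -> sharper).
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    probs = [math.exp(l - m) for l in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Rank token indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    if top_p < 1.0:
        kept, mass = [], 0.0
        for i in order:
            kept.append(i)
            mass += probs[i]
            if mass >= top_p:
                break
        order = kept
    # Renormalize over the surviving tokens and draw one.
    mass = sum(probs[i] for i in order)
    r = rng.random() * mass
    for i in order:
        r -= probs[i]
        if r <= 0:
            return i
    return order[-1]
```

With `top_k=1` this degenerates to greedy decoding, which is one way of seeing how little of the model's probability mass the standard knobs actually explore compared with search-based schemes.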