For existing models, are beam-search-like methods hopeless due to combinatorial explosion? Are there no smart ways to improve them? Evaluating multiple futures will be slow, but if it means the model can give vastly better output, it might be a worthwhile trade-off in some cases. I feel like our standard way of sampling LLM output is a bit too simplistic, and my hunch is that it should be possible to get a lot more out of these models, even at the cost of speed.
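To make the idea concrete, here's a minimal toy sketch of beam search. The "model" is just a hand-written table of next-token log-probabilities (everything here is made up for illustration; a real LLM would replace the lookup with a forward pass), but it shows how keeping several partial sequences alive can beat greedy decoding:

```python
import math

# Toy next-token log-probabilities; stand-in for an LLM forward pass.
LOGPROBS = {
    "<s>": {"the": math.log(0.6), "a": math.log(0.4)},
    "the": {"cat": math.log(0.5), "dog": math.log(0.5)},
    "a":   {"cat": math.log(0.9), "dog": math.log(0.1)},
    "cat": {"sat": math.log(1.0)},
    "dog": {"ran": math.log(1.0)},
    "sat": {"</s>": math.log(1.0)},
    "ran": {"</s>": math.log(1.0)},
}

def beam_search(start="<s>", beam_width=2, max_len=5):
    # Each beam is (token sequence, cumulative log-probability).
    beams = [([start], 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            last = tokens[-1]
            if last == "</s>":  # finished beams carry over unchanged
                candidates.append((tokens, score))
                continue
            for tok, lp in LOGPROBS[last].items():
                candidates.append((tokens + [tok], score + lp))
        # Keep only the top-k highest-scoring partial sequences.
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
        if all(t[-1] == "</s>" for t, _ in beams):
            break
    return beams

best, score = beam_search()[0]
print(best)  # ['<s>', 'a', 'cat', 'sat', '</s>']
```

Note that greedy decoding would commit to "the" (p=0.6) and end up with a sequence of probability 0.6*0.5 = 0.30, while the beam recovers "a cat sat" at 0.4*0.9 = 0.36 — exactly the kind of gain the comment is asking about, and also exactly where the combinatorial cost comes from as the width grows.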
HarHarVeryFunny|1 year ago
This doesn't seem an ideal approach though, since it amounts to generating a bunch of shallow responses and picking the best, rather than thinking more deeply before generating, which would be preferable. It's not the same as a computer chess program considering N moves ahead, where you are guaranteed that one of those move sequences really is the best one (as long as you don't accidentally prune it out). In contrast, if you generate all possible "shallow" N-token responses (a bunch of monkeys gibbering), there is no guarantee that any of them will be the high-quality response you are hoping for.
Really planning ahead - reasoning deeply before speaking - would seem harder to implement though, since it'd involve applying a variable number of reasoning steps (maybe looping), then determining when to stop. This also seems different from the proposed insertion of "reasoning tokens", since those are shallow reasoning steps (a normal single pass through the transformer's layers), when it seems what is really needed is more depth of reasoning ("more layers"), perhaps coupled with some working memory/tokens. Both schemes (more tokens vs more depth) also relate to the wish to use a variable amount of compute for different tasks/inputs - less compute for simple tasks, more for hard ones.
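The "loop until you decide to stop" shape can be sketched with a toy halting rule. Everything here is a stand-in - `refine` plays the role of one more reasoning pass over some hidden state, and `confidence` is an invented halting signal - but it shows variable compute falling out naturally: inputs farther from a settled answer take more steps.

```python
def refine(state):
    # Stand-in for one "reasoning step"; here it just moves the state
    # halfway toward a fixed point at 10.0.
    return state + 0.5 * (10.0 - state)

def confidence(state):
    # Toy halting signal: approaches 1.0 as the state settles.
    return 1.0 - abs(10.0 - state) / 10.0

def reason(state, threshold=0.99, max_steps=50):
    # Apply a variable number of refinement steps, stopping once the
    # halting signal crosses the threshold (or a hard step cap).
    steps = 0
    while confidence(state) < threshold and steps < max_steps:
        state = refine(state)
        steps += 1
    return state, steps

# An "easy" input (already near the answer) halts early;
# a "hard" input (far from it) consumes more compute.
easy_state, easy_steps = reason(9.0)
hard_state, hard_steps = reason(0.0)
```

This is the same basic idea as adaptive computation time: the halt decision itself would be learned in a real system, rather than the hand-wired distance check used here.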
earslap|1 year ago