top | item 40102598

earslap | 1 year ago

Ah yes, I totally agree. I was inspecting the method as a stopgap solution (especially because it does not require retraining or any other special tricks) until researchers figure out "planning" in a broader sense. It is very inefficient otherwise. But in the meantime, is simple sampling from the output softmax, with a couple of parameters to tune, really the best we can do? Is there no low-hanging fruit there?
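For context, the "couple of parameters" in question are usually temperature, top-k truncation, and nucleus (top-p) filtering, all applied to the output distribution before drawing a token. A minimal sketch of those three knobs in numpy (the function name and defaults are mine, not from any particular library):

```python
import numpy as np

def sample_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Sample one token id from raw logits, with the standard knobs:
    temperature scaling, top-k truncation, and nucleus (top-p) filtering."""
    rng = rng or np.random.default_rng()
    # Temperature: flatten (>1) or sharpen (<1) the distribution.
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
    # Numerically stable softmax.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if top_k is not None:
        # Keep only the k most probable tokens, renormalize.
        kth = np.sort(probs)[-top_k]
        probs = np.where(probs >= kth, probs, 0.0)
        probs /= probs.sum()
    if top_p is not None:
        # Keep the smallest set of tokens whose cumulative mass reaches top_p.
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cum, top_p) + 1]
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask / mask.sum()
    return int(rng.choice(len(probs), p=probs))
```

Note that all three knobs only reshape the per-step distribution; none of them look ahead, which is the whole limitation being discussed.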


HarHarVeryFunny | 1 year ago

I suppose the closest alternative to planning ahead (considering alternatives before committing to any action - in this case, generating tokens) is getting it right the first time, which is only really possible with highly constrained prompts where the model has seen enough similar examples to predict the correct/preferred response. So, to that extent, better prediction - a bigger model, more/better training, etc. - reduces the need for planning a bit. Architectural changes that boost predictive power, such as adding working memory, would also help.

But, yeah, hard to see too many alternatives.

1) Get it right first time (not always possible)

2) Don't plan, but at least consider a bunch of poor alternatives - tree of thoughts

3) Actually implement planning
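A toy sketch of option 2: breadth-limited lookahead over candidate continuations, which is roughly the shape of tree-of-thoughts-style methods. The `expand` and `score` callables are stand-ins for "propose next thoughts" and "evaluate a partial thought" (in practice both would call the model); all names here are illustrative, not from any paper:

```python
import heapq
from typing import Callable, List

def tree_of_thoughts(
    root: str,
    expand: Callable[[str], List[str]],  # propose candidate continuations
    score: Callable[[str], float],       # heuristic value of a partial "thought"
    beam_width: int = 3,
    depth: int = 4,
) -> str:
    """At each depth, expand every surviving candidate, score the children,
    and keep only the beam_width best. This considers a bunch of alternatives
    at each step without doing true planning."""
    frontier = [root]
    for _ in range(depth):
        children = [c for node in frontier for c in expand(node)]
        if not children:
            break
        frontier = heapq.nlargest(beam_width, children, key=score)
    return max(frontier, key=score)

# Toy usage: grow a string one character at a time; the score rewards
# alternating characters, so the search converges on "abab..." patterns.
expand = lambda s: [s + ch for ch in "ab"]
score = lambda s: sum(1 for x, y in zip(s, s[1:]) if x != y)
best = tree_of_thoughts("", expand, score, beam_width=2, depth=6)
```

The cost is the obvious one: the branching search multiplies inference calls per emitted step, which is why this is considered a stopgap rather than real planning.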