Ah yes, I totally agree. I was looking at the method as a stopgap (especially because it requires no retraining or other special tricks) until researchers figure out "planning" in a broader sense. It is very inefficient, but in the meantime, is simple sampling from the output softmax, with a couple of parameters to tune, really the best we can do? Is there no low-hanging fruit there?
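For anyone following along, this is roughly what I mean by "simple sampling with a couple of parameters". It is only a sketch, not any particular library's implementation: the function name `sample_token` and the choice of temperature plus nucleus (top-p) filtering as "the" two knobs are my own assumptions for illustration.

```python
import math
import random

def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Pick one token index from raw logits (hypothetical helper)."""
    # Knob 1: temperature rescales the logits before the softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Knob 2: keep the smallest set of most-likely tokens whose
    # cumulative probability reaches top_p (the "nucleus").
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Sample from what is left; random.choices renormalises the weights.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

Each call looks only at the current step's distribution, so nothing here looks ahead, which is the whole complaint.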
HarHarVeryFunny|1 year ago
But, yeah, hard to see too many alternatives.
1) Get it right first time (not always possible)
2) Don't plan, but at least consider a bunch of poor alternatives - tree of thoughts
3) Actually implement planning
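
To make 2) concrete, here is a toy sketch of the "consider a bunch of alternatives" idea, heavily simplified from tree of thoughts. The names `tree_search`, `propose` and `score` are hypothetical; in a real setup both callbacks would presumably be LLM calls (one generating candidate continuations, one rating them), which is where the inefficiency comes from.

```python
def tree_search(root, propose, score, width=3, depth=2):
    """Breadth-limited search: expand every kept state, keep the `width` best."""
    frontier = [root]
    for _ in range(depth):
        # Expand each surviving state into its candidate continuations.
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            break
        # Prune: keep only the highest-scoring candidates for the next round.
        frontier = sorted(candidates, key=score, reverse=True)[:width]
    return max(frontier, key=score)

# Toy stand-ins: a "thought" is a string, and more "b"s is better.
best = tree_search(
    "",
    propose=lambda s: [s + "a", s + "b"],
    score=lambda s: s.count("b"),
    width=2,
    depth=3,
)
print(best)  # → bbb
```

Note that this still isn't planning in the sense of 3): it just throws away bad branches after the fact.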