To spend more compute at inference time, at least two simple approaches are readily available:

1) have the model output a full solution, step by step, then induce it to revise the solution, and repeat this as many times as your token budget allows. You can do this via prompting alone (see Reflexion, for example), or you can fine-tune the model to do it. The paper explores fine-tuning the base model to turn it into a self-revision model.

2) sample step-by-step solutions from the model (one "thought" sentence per line), at non-zero temperature so that you can sample multiple candidate next steps. Then use a verifier model to choose between the next-step candidates, and prefer to continue the rollout of the more promising branches of "thoughts". There are many methods for exploring such a tree when you can score intermediate nodes (beam search is an almost 50-year-old algorithm!).
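To make the two approaches above concrete, here is a rough Python sketch of each. It is not the paper's implementation: `generate`, `propose`, `score` and `is_done` are hypothetical callables standing in for the base model, the step sampler, the verifier and a stopping check, and the revision prompt wording is made up for illustration.

```python
from typing import Callable, List

# Hypothetical interfaces (assumptions, not from the paper or any library):
#   generate(prompt)  -> a full step-by-step solution as text
#   propose(steps, k) -> k candidate next steps, sampled at temperature > 0
#   score(steps)      -> verifier score for a partial solution, higher is better
#   is_done(steps)    -> True once the partial solution is complete


def revise_loop(generate: Callable[[str], str], question: str, rounds: int) -> List[str]:
    """Approach 1: draft a full solution, then repeatedly ask for a revision."""
    attempts = [generate(question)]
    for _ in range(rounds):
        # Illustrative prompt only; a fine-tuned revision model may need no such text.
        prompt = (
            f"{question}\n\nPrevious attempt:\n{attempts[-1]}\n\n"
            "Revise the solution, fixing any mistakes."
        )
        attempts.append(generate(prompt))
    return attempts  # choose the final answer from these (last one, voting, verifier, ...)


def beam_search(propose, score, is_done, beam_width: int, branch: int, max_steps: int) -> list:
    """Approach 2: grow solutions one step at a time, keeping the best-scored partials."""
    beams = [[]]  # start from a single empty partial solution
    for _ in range(max_steps):
        candidates = []
        for steps in beams:
            if is_done(steps):
                candidates.append(steps)  # carry finished solutions forward unchanged
                continue
            for next_step in propose(steps, branch):
                candidates.append(steps + [next_step])
        if not candidates:
            break  # nothing could be expanded; keep the current beams
        # The verifier decides which branches are worth continuing.
        candidates.sort(key=score, reverse=True)
        beams = candidates[:beam_width]
        if all(is_done(b) for b in beams):
            break
    return max(beams, key=score)
```

`rounds` in the first function and `beam_width * branch * max_steps` in the second are the knobs that trade extra inference compute for (hopefully) better answers. Beam search is only one way to walk the tree; any search that can use intermediate scores would slot into the same place.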
