(no title)
activatedgeek | 1 year ago
In any case, there is at least one work showing that CoT prompting may not be necessary: biasing the decoding path via logit probabilities is also promising. [1]
One could argue this still doesn't contradict the benefits of CoT, but I suspect there is nothing fundamental about CoT itself, except that we happened to pre-train on sequences containing certain prompts that were easy to conceive from a human perspective.
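For concreteness, here's a minimal sketch of what "biasing the decoding path" can look like: branch greedy decoding from each of the top-k candidates for the first generated token and keep the most confident completion. The model name, the branched_decode helper, and the confidence heuristic below are illustrative assumptions on my part, not code from [1]:

    # Toy sketch of decoding-path branching: instead of committing to the
    # single greedy first token, expand each of the top-k first-token
    # candidates and rank the resulting completions by a crude confidence score.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # placeholder small model; any causal LM should work
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    def branched_decode(prompt, k=5, max_new_tokens=48):
        inputs = tok(prompt, return_tensors="pt")
        with torch.no_grad():
            next_logits = model(**inputs).logits[0, -1]    # logits for the first new token
        top_first = torch.topk(next_logits, k).indices     # k alternative first tokens

        branches = []
        for first_tok in top_first:
            ids = torch.cat([inputs.input_ids, first_tok.view(1, 1)], dim=-1)
            out = model.generate(
                ids,
                max_new_tokens=max_new_tokens,
                do_sample=False,                 # greedy after the branch point
                output_scores=True,
                return_dict_in_generate=True,
                pad_token_id=tok.eos_token_id,
            )
            # crude confidence heuristic: mean top-probability over generated tokens
            probs = [torch.softmax(s[0], dim=-1).max().item() for s in out.scores]
            text = tok.decode(out.sequences[0], skip_special_tokens=True)
            branches.append((sum(probs) / len(probs), text))

        branches.sort(reverse=True)
        return branches[0]   # (confidence, text) of the most confident branch

    print(branched_decode("Q: I have 3 apples and eat one. How many apples are left?\nA:"))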
FeepingCreature | 1 year ago
"Let's think step by step" is a hack. It was always a hack. Its main payoff was showing people where the model had a weakness and how to (hackily, heavily dependent on training data and phrasing) route around it. Now with QS, the models will be able to bypass that weakness on their own.
> I hesitate to use the description "think"; it's just biasing correlations for subsequent generations.
This is of course a fully general description of any iterative computation. :)
activatedgeek | 1 year ago