Yes, 100% this. And even more so for reasoning models, which are trained with a different kind of RL workflow built around reasoning tokens. I expect research labs to come out with more ways to use RL with LLMs in the future, especially for coding.

I feel it is quite important to dispel the idea that LLMs just predict the next word, given how widespread it is, even though it does gesture at the truth of how LLMs work in a way that's convenient for laypeople.

https://www.harysdalvi.com/blog/llms-dont-predict-next-word/
