mnkv | 8 months ago

Reasonable post with a decent analogy explaining on-policy learning. The only major thing I take issue with is

> Reinforcement learning is a technical subject—there are whole textbooks written about it.

and then linking to the still-WIP RLHF book instead of the book on RL: Sutton & Barto.

Haha, that's crazy. I'm so used to reading RL papers that when the blog linked to a textbook about RL, I just filled in Sutton & Barto without clicking the link or thinking any further about it.

My other criticism is that the historical importance of RLHF to ChatGPT is sidelined: at the beginning, the author pinpoints something like the rise of agents as the start of RL's influence on language modelling. In fact, the first LLM to attain widespread success was ChatGPT, and the secret sauce was RLHF, so there's no need to start the story as late as 2023-2024.