Reinforcement Learning from Human Feedback

133 points | onurkanbkrc | 24 days ago | rlhfbook.com

https://arxiv.org/abs/2504.12501

5 comments

verdverm|24 days ago

Last time I saw Nathan say something about the book, he was actively working on the next version and looking for feedback. Check his socials.

leggerss|24 days ago

You could say he's also learning from human feedback