

moosedev | 3 months ago

2024 lecture videos are on YouTube: https://youtube.com/playlist?list=PLoROMvodv4rN4wG6Nk6sNpTEb...


rllearner|3 months ago

One of my favorite parts of the 2024 series on YouTube was when Prof B explained her excitement just before introducing UCB algorithms (Lecture 11): "So now we're going to see one of my favorite ideas in the course, which is optimism under uncertainty... I think it's a lovely principle because it shows why it's provably optimal to be optimistic about things. Which is kind of beautiful."

Those moments are the best part of classroom education: a super knowledgeable person spends a few weeks helping you get to the point where you can finally understand something cool, and you can sense their excitement to tell you about it. I still remember learning Gauss-Bonnet, Stokes' theorem, and the Central Limit Theorem. I think optimism under uncertainty belongs in that group.
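For anyone who hasn't seen it, here is a minimal sketch of "optimism under uncertainty" using UCB1, one standard UCB variant (the lecture may present a different one). Each arm is scored by its empirical mean plus a confidence bonus that shrinks the more the arm is pulled, and the agent always plays the arm with the highest optimistic score. The names `ucb1`, `pull`, `c`, and `demo` are illustrative, and the Bernoulli bandit in the demo is a made-up example, not anything from the course.

```python
import math
import random


def ucb1(pull, n_arms, horizon, c=2.0):
    """Run UCB1 for `horizon` steps; `pull(arm)` returns a reward in [0, 1]."""
    counts = [0] * n_arms   # times each arm was played
    sums = [0.0] * n_arms   # total reward observed per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once so each empirical mean is defined
        else:
            # Optimism: empirical mean plus a bonus that is large for rarely tried arms.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a] + math.sqrt(c * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts, sums


def demo(probs=(0.2, 0.5, 0.8), horizon=5000, seed=0):
    """Hypothetical Bernoulli bandit: arm i pays 1 with probability probs[i]."""
    rng = random.Random(seed)
    counts, _ = ucb1(lambda a: 1.0 if rng.random() < probs[a] else 0.0, len(probs), horizon)
    return counts


print(demo())  # pull counts per arm; the 0.8 arm should get the large majority
```

With `c=2` the bonus is the textbook UCB1 term sqrt(2 ln t / n_a). An arm that looks bad but has been tried only a few times still gets a large bonus, so it keeps being explored until the evidence against it is strong; that is the sense in which being optimistic pays off.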

storus|3 months ago

Those don't cover DPO/GRPO, which arguably made some parts of RL obsolete.

nafizh|3 months ago

Check out Stanford's CS336; it covers DPO/GRPO and the other pieces needed to train LLMs.
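Part of why DPO is said to make some RL machinery unnecessary is that it replaces the "fit a reward model, then run RL against it" loop with a single classification-style loss on preference pairs. Below is a minimal sketch of that loss for one pair, assuming the sequence log-probabilities (summed token log-probs) under the trained policy and a frozen reference model are already computed. The function names and the `beta=0.1` default are illustrative choices, not taken from any course material.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair given sequence log-probs.

    The implicit reward of a response is beta * (log pi(y|x) - log pi_ref(y|x));
    the loss is -log sigmoid(reward_chosen - reward_rejected).
    """
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


# Policy identical to the reference: no preference yet, loss = log 2.
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
# Policy has shifted probability toward the chosen response: loss drops.
print(dpo_loss(0.0, -5.0, -1.0, -1.0))
```

No sampling from the policy and no learned reward model appear anywhere; the gradient of this loss is computed directly on a fixed preference dataset.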

upbeat_general|3 months ago

I can assure you that lacking knowledge of DPO (and especially GRPO, which is essentially a stripped-down PPO) is not a dealbreaker.
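To illustrate the "stripped-down PPO" point: GRPO keeps PPO's clipped surrogate objective but drops the learned value function (critic). Instead, it samples a group of responses to the same prompt and uses each response's reward normalized by the group's mean and standard deviation as its advantage. The sketch below assumes one scalar reward and one sequence-level log-prob per response, uses the population standard deviation, and omits the KL penalty to a reference model that GRPO also includes; all names are illustrative.

```python
import math


def group_advantages(rewards, eps=1e-8):
    """Group-relative advantages: (r - mean) / std within one prompt's samples."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO's clipped objective for one sample (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
    return min(ratio * advantage, clipped * advantage)


def grpo_objective(logps_new, logps_old, rewards, clip_eps=0.2):
    """Average clipped surrogate over a group; no critic, no KL term in this sketch."""
    advs = group_advantages(rewards)
    terms = [clipped_surrogate(n, o, a, clip_eps)
             for n, o, a in zip(logps_new, logps_old, advs)]
    return sum(terms) / len(terms)


# Four sampled answers to one prompt, two judged correct (reward 1), two not.
print(group_advantages([1.0, 0.0, 0.0, 1.0]))  # roughly [1, -1, -1, 1]
```

The group mean plays the role of the baseline a critic would otherwise have to learn, which is where most of the simplification relative to PPO comes from.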