charleshn | 5 months ago

You can have a look at the DeepSeek-R1 paper, in particular section "2.2 DeepSeek-R1-Zero: Reinforcement Learning on the Base Model".

But generally the idea is that you need some notion of reward, verifiers, etc.

Works really well for maths, algorithms, and many other things actually.
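
To make "some notion of reward" concrete, here's a minimal sketch of a rule-based verifier for maths answers. It assumes the model is asked to put its final answer in \boxed{...}; the function names (`extract_boxed`, `reward`) and the exact-string comparison are just my illustration, not code from the paper.

```python
import re


def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    # Assumption: answers are simple (no nested braces inside the box).
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def reward(completion, reference):
    """Hypothetical rule-based reward: 1.0 if the boxed answer equals the reference, else 0.0."""
    answer = extract_boxed(completion)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


# Three sampled completions for the same prompt ("What is 2 + 2?").
samples = [
    "2 + 2 = 4, so the answer is \\boxed{4}",
    "I think it is \\boxed{5}",
    "The answer is 4",  # correct value, but no boxed answer to check
]
rewards = [reward(s, "4") for s in samples]
print(rewards)
```

The point is that the check is cheap and automatic, so you can score huge numbers of samples without a human in the loop. A real setup would need more forgiving answer matching (e.g. treating "4.0" and "4" as equal) than this string comparison.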

See also this very short essay/introduction: https://www.jasonwei.net/blog/asymmetry-of-verification-and-...

That's why we have IMO gold-level models now, and I'm pretty confident we'll have superhuman models for mathematics, algorithms, etc. before long.

Now, domains which are very hard to verify (think e.g. theoretical physics) are another story.

skeezyboy | 5 months ago

> But generally the idea is that you need some notion of reward, verifiers, etc.

I don't think you're getting the point he's making.