throwaway4aday | 1 year ago

That's essentially what DeepSeek-R1-Zero is showing:

> Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT.
