If the base models already have the “reasoning” capability, as the authors claim, then it’s not surprising that they reached SOTA with a relatively negligible amount of compute for RL fine-tuning.
I love this sort of “anti-hype” research. We need more of it.