blazespin | 3 months ago

Verifying math requires something like Lean, which is a huge bottleneck, as the paper explains.

Plus, there isn't much training data in Lean.

Most gains come from training on material that's already out there, not from the RLVR part, which just amps it up a bit.
