natrys | 3 months ago
But I suppose the bigger goal remains improving their language model, and this was an experiment born from that. These lines of work are symbiotic; the original DeepSeekMath paper introduced GRPO, which eventually formed the backbone of how their R1 model was trained: https://arxiv.org/abs/2402.03300