item 42772954

justinl33 | 1 year ago
> This is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT.

This is a noteworthy achievement.

throwaway314155 | 1 year ago
Excuse my ignorance. What does SFT refer to here?

josephcsible | 1 year ago
Supervised fine-tuning