rdli | 1 year ago
- They used QwQ to generate training data (with some cleanup using GPT-4o-mini)
- The training data was then used to FT Qwen2.5-32B-Instruct (a non-reasoning model)
- The result was that Sky-T1 performs slightly worse than QwQ but much better than Qwen2.5 on reasoning tasks
There are a few dismissive comments here, but I actually think this is pretty interesting, as it shows how you can FT a foundation model to do better at reasoning (rough sketch of the pipeline below).
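For anyone curious, here's a minimal Python sketch of the generate-then-clean stages. This is not the actual Sky-T1 recipe, just an illustration of the idea: it assumes QwQ is served behind an OpenAI-compatible endpoint (e.g. via vLLM), and the model names, prompts, and file paths are placeholders.

    import json
    from openai import OpenAI

    # Hypothetical endpoints: QwQ served locally, GPT-4o-mini via the OpenAI API.
    qwq = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    cleaner = OpenAI()

    def generate_trace(problem: str) -> str:
        # Step 1: sample a long reasoning trace from QwQ.
        resp = qwq.chat.completions.create(
            model="Qwen/QwQ-32B-Preview",
            messages=[{"role": "user", "content": problem}],
        )
        return resp.choices[0].message.content

    def clean_trace(trace: str) -> str:
        # Step 2: have GPT-4o-mini rewrite the trace into a consistent format.
        resp = cleaner.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Rewrite this reasoning trace "
                 "into a clean, well-structured solution. Keep every step."},
                {"role": "user", "content": trace},
            ],
        )
        return resp.choices[0].message.content

    # Step 3: dump (prompt, cleaned trace) pairs as SFT data, then fine-tune
    # Qwen2.5-32B-Instruct on the resulting file with your trainer of choice.
    problems = ["Prove that sqrt(2) is irrational."]  # your problem set here
    with open("sft_data.jsonl", "w") as f:
        for p in problems:
            row = {"messages": [
                {"role": "user", "content": p},
                {"role": "assistant", "content": clean_trace(generate_trace(p))},
            ]}
            f.write(json.dumps(row) + "\n")

The interesting part is that the teacher (QwQ) and the student (Qwen2.5-32B-Instruct) share a base model, so this is closer to behavior distillation than training a reasoner from scratch.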
azinman2 | 1 year ago