rdli | 1 year ago
- They used QwQ to generate training data (with some cleanup using GPT-4o-mini)
- The training data was then used to FT Qwen2.5-32B-Instruct (a non-reasoning model)
- The result was that Sky-T1 performs slightly worse than QwQ but much better than Qwen2.5 on reasoning tasks
There are a few dismissive comments here, but I actually think this is pretty interesting, as it shows how you can FT a foundation model to do better at reasoning (rough sketch of the pipeline below).
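For anyone curious, here's a minimal Python sketch of the generate-then-clean stages. This is not the actual Sky-T1 recipe, just an illustration of the idea: it assumes QwQ is served behind an OpenAI-compatible endpoint (e.g. via vLLM), and the model names, prompts, and file paths are placeholders.

    import json
    from openai import OpenAI

    # Hypothetical endpoints: QwQ served locally, GPT-4o-mini via the OpenAI API.
    qwq = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    cleaner = OpenAI()

    def generate_trace(problem: str) -> str:
        # Step 1: sample a long reasoning trace from QwQ.
        resp = qwq.chat.completions.create(
            model="Qwen/QwQ-32B-Preview",
            messages=[{"role": "user", "content": problem}],
        )
        return resp.choices[0].message.content

    def clean_trace(trace: str) -> str:
        # Step 2: have GPT-4o-mini rewrite the trace into a consistent format.
        resp = cleaner.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Rewrite this reasoning trace "
                 "into a clean, well-structured solution. Keep every step."},
                {"role": "user", "content": trace},
            ],
        )
        return resp.choices[0].message.content

    # Step 3: dump (prompt, cleaned trace) pairs as SFT data, then fine-tune
    # Qwen2.5-32B-Instruct on the resulting file with your trainer of choice.
    problems = ["Prove that sqrt(2) is irrational."]  # your problem set here
    with open("sft_data.jsonl", "w") as f:
        for p in problems:
            row = {"messages": [
                {"role": "user", "content": p},
                {"role": "assistant", "content": clean_trace(generate_trace(p))},
            ]}
            f.write(json.dumps(row) + "\n")

The interesting part is that the teacher (QwQ) and the student (Qwen2.5-32B-Instruct) share a base model, so this is closer to behavior distillation than training a reasoner from scratch.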
azinman2 | 1 year ago