top | item 43000030

(no title)

ciphix | 1 year ago

After reviewing the paper and GitHub training dataset, I have the following observations:

The 800+ training samples, each containing solutions with detailed reasoning steps, were primarily generated by DeepSeek r1 and advanced models. The reasoning processes within these training solutions are crucial. It's possible that the advanced models have encoded these reasoning processes through the generated samples. Given a sufficiently large model, it can effectively restore such reasoning weights, effectively adding a delta from DeepSeek r1, among others.

Therefore, it's not surprising that, with relatively few fine-tuning data, Qwen 2.5 has achieved such significant improvements.

This is merely a conjecture. Further research is needed to analyze and visualize the changes in network weights before and after fine-tuning.

discuss

No comments yet.