For some interesting context: this paper was a precursor to all the work on synthetic data at Microsoft Research that led to the Phi series of SLMs [1]. It was an important demonstration of what carefully curated and clean data could do for language models.

[1] https://arxiv.org/abs/2412.08905