(no title)
artifishy_intel | 1 year ago
Also, in the main (first) figure, why are r1 and o1 (the best-performing models in Table 1) omitted?
If you collect 59K and then pick the best 1K, is it really fair to say your approach is simple? Sifting through 59K examples doesn't seem simple.
Good stuff though, cool to see how minimal the setup can be and still distill good models (esp. at the manageable 32B size).