(no title)
sally_glance | 1 month ago
Number two, I'm not sure that random samples collected over even a moderately large number of users make a great base of training examples for distillation. I would expect they need more focused samples from very specific areas to achieve good results; see the sketch below.
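To illustrate the kind of "focused sampling" I mean, here's a rough sketch: instead of drawing uniformly over all user traffic, draw a fixed quota per domain so that narrow but important areas (code, math, etc.) are still well represented. The `domain` field and quota are invented for illustration, not anyone's actual pipeline:

    import random
    from collections import defaultdict

    def stratified_sample(examples, per_domain):
        # Group candidate training examples by a (hypothetical) domain label,
        # then take up to per_domain examples from each group, so rare
        # domains aren't drowned out by high-traffic ones.
        by_domain = defaultdict(list)
        for ex in examples:
            by_domain[ex["domain"]].append(ex)
        sample = []
        for pool in by_domain.values():
            sample.extend(random.sample(pool, min(per_domain, len(pool))))
        return sample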
sally_glance | 1 month ago
The economics probably work out, though: collecting, cleaning, and preparing original datasets is very cumbersome.
What we do know for sure is that the SOTA providers are distilling their own models; I remember reading about this at least for Gemini (Flash is distilled) and Meta.
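For anyone unfamiliar: at its core, distillation just trains the small model to match the big model's output distribution. A minimal sketch of the classic soft-target loss (Hinton et al., 2015), assuming PyTorch; this is the textbook version, not anything specific to Gemini's or Meta's actual pipelines:

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, T=2.0):
        # Soften both distributions with temperature T, then minimize the
        # KL divergence from teacher to student. The T**2 factor keeps
        # gradient magnitudes comparable across temperatures.
        log_p_student = F.log_softmax(student_logits / T, dim=-1)
        p_teacher = F.softmax(teacher_logits / T, dim=-1)
        return F.kl_div(log_p_student, p_teacher,
                        reduction="batchmean") * (T ** 2)

In practice this is usually mixed with the ordinary cross-entropy loss on hard labels, but the soft-target term is what makes it "distillation."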