publicdaniel | 8 months ago
My use case is latency-constrained, so I'm exploring fine-tuning/distillation to see if I can get latency sub-second. I imagine these are the kinds of scenarios where it's still worth it to fine-tune and distill.
My plan is to generate a lot of synthetic training data using more capable, slower foundation models and use that to train the smaller model.
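For what it's worth, a minimal sketch of that kind of pipeline, assuming the OpenAI Python SDK as the teacher and chat-format JSONL as the training file; the model names, system prompt, prompt list, and file path are all placeholders:

    # Sketch: use a slow, capable "teacher" model to generate
    # synthetic training examples for a smaller "student" model.
    # Assumes the OpenAI Python SDK; names below are placeholders.
    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    SYSTEM = "You are a concise assistant."
    prompts = [
        "Summarize: ...",                  # real prompts would come
        "Classify the sentiment of: ...",  # from your actual domain
    ]

    with open("distill_train.jsonl", "w") as f:
        for p in prompts:
            # Teacher generates the target completion.
            resp = client.chat.completions.create(
                model="gpt-4o",  # placeholder teacher model
                messages=[
                    {"role": "system", "content": SYSTEM},
                    {"role": "user", "content": p},
                ],
            )
            answer = resp.choices[0].message.content
            # One training example per line, chat fine-tuning format.
            f.write(json.dumps({"messages": [
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": p},
                {"role": "assistant", "content": answer},
            ]}) + "\n")

The resulting JSONL can then feed whatever fine-tuning path the smaller model supports (a hosted fine-tuning job, or a local trainer if the student is open-weights).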
cpard | 8 months ago