top | item 44209267

publicdaniel | 8 months ago

I’m currently working on a document parsing engine for a specific type of document. The inputs are usually PDFs. I’m able to get great structured output from both the latest Gemini Flash models and the latest Llama Scout models. The best latency I get with Gemini is about 5 seconds end to end; with Llama hosted on Groq it’s about 3 seconds.

My use case is latency constrained, so I’m exploring fine-tuning / distilling to see if I can get latency under a second. I imagine these are the kinds of scenarios where it’s still worth it to fine-tune and distill.

My plan is to generate a lot of synthetic training data using the more capable (but slower) foundation models, then use that data to train the smaller model.
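A minimal sketch of that distillation pipeline, assuming a hypothetical `teacher_extract` function standing in for the slow foundation-model call (stubbed out here so the example is self-contained); the records follow the chat-style JSONL format that most fine-tuning APIs accept:

```python
import json

def teacher_extract(document_text: str) -> dict:
    """Hypothetical stand-in for the structured-output call to the
    slower, more capable teacher model. In practice this would be an
    API call returning the parsed document fields."""
    return {"fields": {"word_count": len(document_text.split())}}

def build_training_records(documents: list[str]) -> list[dict]:
    """Pair each raw document with the teacher's structured output,
    producing one chat-format training record per document."""
    records = []
    for doc in documents:
        label = teacher_extract(doc)
        records.append({
            "messages": [
                {"role": "user", "content": doc},
                {"role": "assistant", "content": json.dumps(label)},
            ]
        })
    return records

# Write the synthetic dataset as JSONL for the fine-tuning job.
docs = ["Invoice #123 Total: $40", "Receipt for two items"]
with open("train.jsonl", "w") as f:
    for rec in build_training_records(docs):
        f.write(json.dumps(rec) + "\n")
```

In a real run you would also filter the teacher's outputs (e.g. reject records that fail schema validation) before training, since label noise from the teacher flows straight into the student.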


cpard | 8 months ago

Do you use any framework to generate the data, and how do you evaluate the quality of the generated data?