deet | 11 months ago
At Avy.ai we're running small (2B-7B, quantized) vision models as part of a Mac desktop application for understanding what someone is working on in the moment, to offer them related information and actions.
We found that the raw image-understanding results with a light LoRA fine-tune are not substantially different. But fine-tuning greatly improves how reliably a small model follows instructions: outputting structured data in response to the image, at the level of verbosity and detail we need. Without fine-tuning, the models on the smaller end of that scale would be much more difficult to use, since they don't reliably produce output that matches what the consuming application expects.
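For context, "output that matches what the consuming application expects" means the app-side parser has to validate every model response before acting on it. A minimal sketch of that kind of validation, with a hypothetical schema (field names are illustrative, not Avy.ai's actual format):

```python
import json

# Hypothetical schema for a screen-understanding result; the field
# names here are illustrative, not Avy.ai's actual format.
REQUIRED_FIELDS = {"app": str, "activity": str, "entities": list}

def parse_model_output(raw: str):
    """Parse model output and check it matches the expected structure.

    Small models without fine-tuning often wrap JSON in prose or
    markdown fences, so strip those before parsing.
    """
    text = raw.strip()
    if text.startswith("```"):
        # Drop a markdown code fence the model may have added.
        text = text.split("```")[1]
        if text.startswith("json"):
            text = text[len("json"):]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    # Reject responses missing a required field or with the wrong type.
    for field, ftype in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), ftype):
            return None
    return data

good = '{"app": "Xcode", "activity": "debugging", "entities": ["crash log"]}'
bad = 'Sure! Here is the JSON: {"app": "Xcode"}'
print(parse_model_output(good) is not None)  # True
print(parse_model_output(bad) is None)       # True
```

A fine-tuned model passes this kind of check consistently; an un-tuned small model tends to fail it by adding chatty preambles or dropping fields, which is where the reliability gap shows up.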
msp26 | 11 months ago
deet | 11 months ago
The bigger thing, though, was getting the models to produce the appropriate level of verbosity and detail in their output, which fine-tuning made more consistent.