top | item 40372165

llama_person | 1 year ago

There are advantages to smaller models, namely that you can process a lot more data with a lot less VRAM. I think the intent here from the Google team is a task-specific VLM that you fine-tune on your own data, rather than a general-purpose assistant.

From my own experimentation I have found that it really packs a punch for its weight. Another small model which has been very good is https://github.com/vikhyat/moondream .
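Since the intended workflow is fine-tuning on narrow tasks rather than open-ended chat, the targets follow PaliGemma's task-prefix convention (a short prefix in, a suffix out). A minimal sketch of that convention is below; `build_prompt` is a hypothetical helper, and the exact prefix strings for your task should be checked against the model card:

```python
# Sketch of PaliGemma-style task prefixes: the model is conditioned on a
# short task prefix plus the image, and fine-tuned to emit the suffix.
# `build_prompt` is a hypothetical helper, not part of any library.

def build_prompt(task: str, arg: str = "") -> str:
    prompts = {
        "caption": "caption en",        # short English caption
        "vqa": f"answer en {arg}",      # visual question answering
        "detect": f"detect {arg}",      # bounding boxes for an object
        "segment": f"segment {arg}",    # segmentation mask for an object
    }
    return prompts[task]

print(build_prompt("vqa", "what is on the table?"))
```

Your fine-tuning data then just pairs images with (prefix, suffix) strings for the one task you care about.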


whimsicalism | 1 year ago

Finetuning is easily within reach for llava-mistral or something like that: just rent an A100 or two for ~$20 and you'll have your finetuned model.
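That ~$20 figure checks out as back-of-envelope arithmetic. The hourly rate below is an assumed marketplace price, not a quote from any provider:

```python
# Back-of-envelope cost of a short fine-tune run.
# Both the $/hour rate and the run length are assumptions for illustration.
A100_HOURLY_USD = 1.5   # assumed rental price per GPU-hour
NUM_GPUS = 2
HOURS = 6               # assumed length of a small fine-tune run

cost = A100_HOURLY_USD * NUM_GPUS * HOURS
print(f"~${cost:.0f}")  # ~$18, in line with the ~$20 above
```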

codester1000 | 1 year ago

LLaVA is even less open than PaliGemma: it is trained on CC-BY-NC 4.0 data, so it can't be used commercially. I emailed the team about it. At least with PaliGemma the base pt models are available for commercial use if you fine-tune them yourself.

simonw | 1 year ago

Have you seen any good documentation anywhere on how to do that?