top | item 38377716


grey8 | 2 years ago

I agree, the README is not really understandable if you're not into AI research techno-babble. Just adding one sentence aimed at normal people would have been useful.

To answer your question: it's a model you can give images and videos to, which you can then interact with via an LLM (ask questions, get descriptions, process them further, etc.). It can "see" them, basically.

It's the same capability as GPT-4V (ChatGPT's "upload image" feature), except that ChatGPT only accepts images.
