top | item 42702929

(no title)

Makes sense. My main takeaway from the ColPali paper (and your comments) is that ColPali works best for document RAG, whereas vision model embeddings are best used for image similarity search or sentiment analysis. So to answer my own question: The best model to use depends on the application.

discuss

No comments yet.