emanuer | 7 months ago
What am I missing?
Flash 2.5, Sonnet 3.7, etc. always provided me with very satisfactory image analysis. And, I might be making this up, but to me it feels like some models provide better responses when I give them the text as an image, instead of feeding "just" the text.
ArnavAgrawal03|7 months ago
You need to apply things like quantization, single-vector conversions (using fixed-dimensional encodings), and better indexing to ensure that multimodal RAG works at scale.
That is exactly what we're doing at Morphik :)
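To make the quantization point concrete, here is a minimal sketch of one common variant, binary quantization with Hamming-distance search. It is only an illustration under assumptions, not Morphik's actual implementation: the function names (`binarize`, `hamming`, `search`) are hypothetical, and the 8-dimensional "page embeddings" are made-up numbers rather than real model output.

```python
from typing import List, Sequence


def binarize(vec: Sequence[float]) -> int:
    """Pack the sign of each dimension into one bit of a Python int."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def search(query: Sequence[float], codes: List[int], k: int = 2) -> List[int]:
    """Return indices of the k codes closest to the query in Hamming distance."""
    q = binarize(query)
    return sorted(range(len(codes)), key=lambda i: hamming(q, codes[i]))[:k]


# Toy 8-dimensional "page embeddings" (made-up values for illustration only).
pages = [
    [0.9, -0.2, 0.4, -0.7, 0.1, 0.3, -0.5, 0.8],
    [-0.8, 0.3, -0.4, 0.6, -0.2, -0.1, 0.5, -0.9],
    [0.7, -0.1, 0.5, -0.6, -0.3, 0.2, -0.4, 0.9],
]
codes = [binarize(p) for p in pages]  # one small integer per page

query = [0.8, -0.3, 0.3, -0.5, 0.2, 0.4, -0.6, 0.7]
top = search(query, codes, k=2)
print(top)
```

Each dimension shrinks from a 32-bit float to a single bit, and comparison becomes an XOR plus a bit count. The trade-off is lost precision, so a real system would typically rescore the shortlisted candidates with the full-precision vectors.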
barrenko|7 months ago