top | item 39843981

barefeg | 1 year ago

This technique had a very recent resurgence via https://txt.cohere.com/int8-binary-embeddings/. Hugging Face also covered it here: https://huggingface.co/blog/embedding-quantization. It seems like a very good trade-off compared to shortened embeddings, which require fine-tuning via the Matryoshka technique. On the other hand, Nils Reimers suggests that trivial quantization of the full-precision embeddings is not as good as using “compression-friendly” embeddings like Cohere’s Embed V3. Does anyone know what the difference in precision is between trivial quantization and quantization-optimized embeddings?
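For anyone curious what “trivial quantization” looks like concretely, here is a minimal NumPy sketch of the binary variant: keep only the sign of each dimension, pack the bits, and retrieve by Hamming distance. The random vectors, dimensions, and top-k comparison are illustrative assumptions, not anything from Cohere’s or Hugging Face’s implementations.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins for full-precision embeddings (1024-dim, float32).
docs = rng.standard_normal((1000, 1024)).astype(np.float32)
query = rng.standard_normal(1024).astype(np.float32)

def binarize(x):
    # Trivial binary quantization: keep only the sign of each dimension,
    # packed 8 dims per byte (32x smaller than float32).
    return np.packbits(x > 0, axis=-1)

doc_bits = binarize(docs)      # shape (1000, 128) bytes
query_bits = binarize(query)   # shape (128,) bytes

# Hamming distance via XOR + bit count; smaller = more similar.
popcount = np.unpackbits(doc_bits ^ query_bits, axis=-1).sum(axis=-1)
top10 = np.argsort(popcount)[:10]

# Compare against the exact cosine ranking to gauge retrieval quality loss.
cos = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
exact_top10 = np.argsort(-cos)[:10]
overlap = len(set(top10) & set(exact_top10))
print(f"top-10 overlap with full precision: {overlap}/10")
```

The precision gap the comment asks about is exactly this kind of overlap/recall loss; “compression-friendly” models are reportedly trained so that the sign pattern preserves more of the ranking than it would for an off-the-shelf model.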

No comments yet.