jonathan-adly | 1 year ago
1. Uses half-vecs, so you cut everything down by half with no recall loss.
2. Uses token pooling with hierarchical clustering at a pool factor of 3, so you further cut things down by 2/3rds with <1% loss.
3. Everything is on Postgres and pgvector, so you can do all the Postgres stuff and shrink the search space with document-metadata filtering.
4. We have a 5000+ page corpus in production with <3 seconds latency.
5. We benchmark against the Vidore leaderboard, and we're very near SOTA.
You can read about half-vecs here: https://jkatz05.com/post/postgres/pgvector-scalar-binary-qua...
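As a rough illustration of why halving precision (the idea behind pgvector's halfvec) costs so little recall, here's a minimal sketch: quantize float32 vectors to float16 and compare the cosine-similarity rankings. The vectors are random stand-ins, not real embeddings, and this isn't ColiVara's actual code.

```python
import numpy as np

rng = np.random.default_rng(42)
query = rng.normal(size=128).astype(np.float32)
docs = rng.normal(size=(1000, 128)).astype(np.float32)

def rank(q, d):
    # Rank documents by cosine similarity to the query
    sims = (d @ q) / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))
    return np.argsort(-sims)

full = rank(query, docs)
# Round-trip through float16 to simulate half-precision storage
half = rank(query.astype(np.float16).astype(np.float32),
            docs.astype(np.float16).astype(np.float32))

# How many of the top-10 results survive the quantization?
overlap = len(set(full[:10]) & set(half[:10]))
print(overlap)
```

The float16 rounding error is tiny compared to the similarity gaps between neighboring ranks, so the top results are essentially unchanged while storage is halved.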
Hierarchical token pooling: https://www.answer.ai/posts/colbert-pooling.html
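A minimal sketch of the pooling idea from that post, assuming a pool factor of 3: hierarchically cluster a document's token embeddings and mean-pool each cluster, so the multi-vector set shrinks to roughly a third of its size. This is an illustrative reimplementation on random vectors, not the production code.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def pool_tokens(token_embs: np.ndarray, pool_factor: int = 3) -> np.ndarray:
    n = token_embs.shape[0]
    k = max(1, n // pool_factor)            # target number of pooled vectors
    Z = linkage(token_embs, method="ward")  # hierarchical clustering tree
    labels = fcluster(Z, t=k, criterion="maxclust")
    # Mean-pool the token embeddings assigned to each cluster
    return np.stack([token_embs[labels == c].mean(axis=0)
                     for c in np.unique(labels)])

rng = np.random.default_rng(0)
tokens = rng.normal(size=(30, 128)).astype(np.float32)
pooled = pool_tokens(tokens)
print(pooled.shape)  # 30 token vectors pooled down to 10
```

Because similar tokens end up in the same cluster, the pooled centroids preserve most of the late-interaction scoring signal at a third of the storage and compute.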
And how we implemented them here: https://blog.colivara.com/
__jl__ | 1 year ago
jonathan-adly | 1 year ago
It is a small upgrade, but one nonetheless. The complexity, and the cost of multi-vectors *might* not make this worth it, really depends on how accuracy-critical the task is.
For example, one of our customers runs this over FDA monographs, which are roughly 95%+ text and 5% tables. The misses were extremely painful, even though there weren't that many in text-based pipelines, so the migration made sense for them.
tarasglek | 1 year ago