A gentle introduction to vector databases

[+] splatcollision|4 years ago|reply

I've done vector indexes on CouchDB before, because it supports arrays as keys in its map-reduce implementation. Worked great for fast document similarity search.

Brief writeup: http://splatcollision.com/page/fast-vector-similarity-querie...

[+] joexner|4 years ago|reply

Vector indices are the novel part of vector databases. Let's hear more about them. The rest is just BLOB CRUD.

[+] gk1|4 years ago|reply

Here you go: https://www.pinecone.io/learn/vector-indexes/

[+] dontreact|4 years ago|reply

The way vector indices typically work can make doing CRUD with them a real challenge. There is definitely novelty in being able to do both ANN indexing and fast, high-throughput CRUD.

In addition, the R of CRUD is hard to combine with vector indices. Case in point: I am still waiting for Elasticsearch to support ANN and regular, structured filtering together well.
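
A minimal sketch of why reads with filters are awkward, assuming a toy collection and a brute-force search standing in for a real ANN index. The names `DOCS`, `top_k`, `post_filter`, and `pre_filter` are hypothetical and not taken from any particular database. Filtering after the search can return fewer than k hits, while filtering before it bypasses whatever index was built over the full collection.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy collection (assumption): (id, vector, metadata)
DOCS = [
    ("a", (0.0, 0.0), {"lang": "en"}),
    ("b", (0.1, 0.0), {"lang": "en"}),
    ("c", (0.2, 0.0), {"lang": "de"}),
    ("d", (5.0, 5.0), {"lang": "de"}),
]

def top_k(query, docs, k):
    """Exact k nearest neighbours by brute force (stand-in for an ANN index)."""
    return sorted(docs, key=lambda d: l2(query, d[1]))[:k]

def post_filter(query, k, predicate):
    """Search first, filter afterwards: may return fewer than k hits."""
    return [d[0] for d in top_k(query, DOCS, k) if predicate(d[2])]

def pre_filter(query, k, predicate):
    """Filter first, then search only the survivors (no shared index is used)."""
    survivors = [d for d in DOCS if predicate(d[2])]
    return [d[0] for d in top_k(query, survivors, k)]

def is_german(meta):
    return meta["lang"] == "de"

query = (0.0, 0.0)
post_hits = post_filter(query, 2, is_german)  # nearest two are both "en"
pre_hits = pre_filter(query, 2, is_german)    # searches only the "de" docs
print(post_hits, pre_hits)
```

In this toy case the two nearest documents are both English, so post-filtering for German returns nothing even though two German documents exist. Doing the filter inside the index traversal is the part that is hard to get right.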

[+] mrintellectual|4 years ago|reply

Thanks for your feedback. I'm writing a post on vector indices and will throw it up this week.

[+] tabtab|4 years ago|reply

This reminds me of "Factor Tables": https://github.com/RowColz/AI

[+] mrintellectual|4 years ago|reply

As mentioned in the article, I recommend Milvus (https://milvus.io) - it's open source and cloud-native, with standalone versions available. Alternatively, if you're looking for an open-source solution for generating embeddings, I recommend Towhee (https://github.com/towhee-io/towhee).

[+] dtjohnnyb|4 years ago|reply

One downside of Milvus is that version 1 doesn't do filtering (necessary for most search applications) and version 2 is significantly slower. Google's vector nearest neighbors offering, Weaviate, and Vespa are much better options if you're expecting to extend to more realistic workloads.

[+] phenkdo|4 years ago|reply

Nice writeup. Have you looked at Qdrant [1] for your comparison? I found it better than Milvus.

[1] https://github.com/qdrant/qdrant

[+] occupant|4 years ago|reply

What did you find better about it?

[+] cbsmith|4 years ago|reply

Everything old is new again. ;-)

[+] gk1|4 years ago|reply

This is a great writeup, and awesome to see vector databases come up more and more often.

For anyone interested in going down this rabbit hole, we have an entire learning center about vector databases and vector search (https://www.pinecone.io/learn/) including the obligatory "What is a Vector Database" intro with example notebooks: https://www.pinecone.io/learn/vector-database/

[+] dang|4 years ago|reply

You've posted several comments in this thread alone linking to your product, and it seems that the majority of your posts have been doing this for quite a while now. I'm sure it's excellent work, but can you please stop doing this?

It's fine to link to your own work occasionally, when it's particularly relevant, as part of a diverse mix of posts on unrelated things*. It's not ok to use HN primarily for promotion. See https://news.ycombinator.com/newsguidelines.html: "Please don't use HN primarily for promotion. It's ok to post your own stuff occasionally, but the primary use of the site should be for curiosity."

When people do that we eventually start penalizing their accounts and sites, or in egregious cases, banning them. You're a good HN user, but this is still excessive. You're crossing the line at which the community starts to think of the word 'spam', and we inevitably start getting emails about it.

* I do get that your work is particularly relevant in a thread like this. What's missing is the 'diverse mix of posts on unrelated things'. In such a context, posting repeatedly about your own stuff starts to come across the wrong way.

[+] starkd|4 years ago|reply

Thank you for this. One approach I find missing in your blog is that of distance-based indexing. It's an approach that indexes vectors according to distances from chosen vantage points from within the data set. I've done some preliminary work on creating a system for images: phash.dev
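
For readers unfamiliar with the idea, here is a minimal sketch of distance-based indexing, assuming Euclidean distance and a flat table of distances to a few randomly chosen vantage points (pivots). It is not the phash.dev implementation, and `PivotIndex` and its parameters are hypothetical names. A tree-based variant such as a vantage-point tree would organise the same information differently. The pruning relies only on the triangle inequality, so any true metric would work in place of `dist`.

```python
import math
import random

def dist(a, b):
    """Euclidean distance; any metric obeying the triangle inequality works."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class PivotIndex:
    """Stores each point's distances to a few pivots drawn from the data set."""

    def __init__(self, points, num_pivots=3, seed=0):
        rng = random.Random(seed)
        self.points = list(points)
        self.pivots = rng.sample(self.points, min(num_pivots, len(self.points)))
        # table[i][j] = distance from point i to pivot j, computed once at build time
        self.table = [[dist(p, v) for v in self.pivots] for p in self.points]

    def nearest(self, query):
        """Return (point, distance, number of full distance evaluations)."""
        q_to_pivot = [dist(query, v) for v in self.pivots]
        best, best_d, evaluated = None, math.inf, 0
        for p, row in zip(self.points, self.table):
            # Triangle inequality: d(q, p) >= |d(q, v) - d(p, v)| for every pivot v
            lower = max(abs(qd - pd) for qd, pd in zip(q_to_pivot, row))
            if lower >= best_d:
                continue  # cannot beat the current best, skip the real distance
            evaluated += 1
            d = dist(query, p)
            if d < best_d:
                best, best_d = p, d
        return best, best_d, evaluated
```

Calling `PivotIndex(points).nearest(query)` returns the exact nearest neighbour. The saving comes from replacing some full distance computations with cheap comparisons against the precomputed table, which matters most when the distance function itself is expensive.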

[+] liminal|4 years ago|reply

Pinecone looks great. Any plans to have a non-hosted option?

[+] krishnakatyal|4 years ago|reply

Very well written

33 comments