nerfborpit | 2 years ago

This reads like a marketing piece, not an honest technical blogpost.

I agree that USearch is fast, but it feels pretty dishonest to take credit for someone else's work. Maybe at least honestly profile what's going on with USearch vs pgvector (and with which pgvector settings?), and write something interesting about it?

The last time I tried Lantern, it would segfault whenever I tried to do anything non-trivial, and it was incredibly unsafe in how it handled memory. Hopefully that's fixed by now, but Lantern has a lot of red flags.


ashvardanian | 2 years ago

USearch author here :)

I'm not sure it's fair to compare USearch and pgvector: one is an efficient indexing structure, the other is closer to a full database plugin. Not that they can't be used in similar ways.

If you are looking for pure indexing benchmarks, you might be interested in USearch vs FAISS's HNSW implementation [1]. We run them ourselves (as do a couple of other tech companies), so take them with a grain of salt; they may be biased.

As for Lantern vs pgvector, I'm impressed by the result! A lot of people would benefit from fast vector search that is compatible with Postgres. This is the way to go!

It wasn't a trivial integration by any means, and the Lantern team was very active, suggesting patches to the upstream version to ease integration with other databases. Some of those are tricky and have yet to be merged [2]. So stay tuned for USearch v3; lots of new features are coming :)

[1]: https://www.unum.cloud/blog/2023-11-07-scaling-vector-search...

[2]: https://github.com/unum-cloud/usearch/pull/171/files

diqi | 2 years ago

Hi, sorry you didn't have a good experience with Lantern before. We first posted on HN about 3 months ago, and things should be better now. Please let us know if you run into any issues.

nerfborpit | 2 years ago

Using ivfflat for bulk index creation is much faster than Lantern. There are a lot of trade-offs depending on each specific use case, but that seems like a pretty significant thing to leave out.

```
postgres=# CREATE INDEX ON sift USING ivfflat (v vector_l2_ops) WITH (lists=1000);
CREATE INDEX
Time: 65697.411 ms (01:05.697)
```
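For comparison, the HNSW side of the benchmark would be built with something like the statement below. This is a sketch, not the post's actual setup: it uses pgvector's `hnsw` access method, and the `m` / `ef_construction` values shown are common illustrative defaults, not the settings used to reach 99% recall.

```
-- Hypothetical HNSW counterpart on the same table (pgvector syntax).
-- m controls the graph's degree; ef_construction controls the
-- candidate-queue width during the build (higher = slower, better graph).
CREATE INDEX ON sift USING hnsw (v vector_l2_ops)
    WITH (m = 16, ef_construction = 64);
```

HNSW builds are typically much slower than ivfflat's k-means pass, which is exactly the trade-off being discussed.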

ngalstyan4 | 2 years ago

Co-founder here.

You are right that there are many trade-offs between HNSW and IVFFLAT.

E.g., IVFFLAT requires that a significant amount of data already be in the table before the index is created, and it assumes the data distribution does not change with additional inserts (since it chooses centroids during initial creation and never updates them).
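A typical ivfflat workflow reflects this constraint: bulk-load first, then build the index so the k-means centroids are chosen from the real distribution. The `lists ~ rows / 1000` heuristic follows pgvector's own guidance for tables up to around a million rows; the file path below is illustrative.

```
-- Load the data first so clustering sees the real distribution.
COPY sift (v) FROM '/path/to/vectors.csv' WITH (FORMAT csv);

-- Then build the index; pgvector suggests lists ~ rows / 1000
-- for tables up to roughly 1M rows.
CREATE INDEX ON sift USING ivfflat (v vector_l2_ops) WITH (lists = 1000);
```

Rows inserted after this point are assigned to the original centroids, which is why recall can degrade as the distribution drifts.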

We have also generally had a harder time getting high recall with IVFFLAT on vectors from embedding models such as ada-002.

There are trade-offs, some of which we will explore in later blog posts.

This post is about one thing: HNSW index creation time across two systems, at a fixed 99% recall.
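For readers wondering how a recall target like 99% is pinned down at query time: in pgvector this is done with the `ivfflat.probes` and `hnsw.ef_search` settings, which trade latency for recall. The values below are illustrative only, not the ones used in the post.

```
-- ivfflat: scan more lists per query to raise recall (at higher latency).
SET ivfflat.probes = 32;

-- HNSW: widen the candidate queue searched per query.
SET hnsw.ef_search = 100;

-- Recall is then measured against exact nearest neighbors at this setting.
SELECT * FROM sift ORDER BY v <-> '[...]' LIMIT 10;
```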