nerfborpit | 2 years ago

Using ivfflat is much faster than lantern for bulk index creation. There are a lot of trade-offs depending on each specific use case, but it seems like a pretty massive thing to leave out.

```
postgres=# CREATE INDEX ON sift USING ivfflat (v vector_l2_ops) WITH (lists=1000);
CREATE INDEX
Time: 65697.411 ms (01:05.697)
```

ngalstyan4|2 years ago

cofounder here.

You are right that there are many trade-offs between HNSW and IVFFLAT.

E.g. IVFFLAT requires a significant amount of data to be in the table before the index is created, and it assumes the data distribution does not change with additional inserts (since it chooses centroids during the initial creation and never updates them).
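A rough sketch of what that implies in practice with pgvector, reusing the `sift` table and `v` column from the parent comment. The index name `sift_v_idx` is an assumption (it is what Postgres's default naming would typically produce, so check `\d sift` for the real one). The idea is to load the data first, build the index, and rebuild it if the distribution drifts, since a rebuild is what re-runs the clustering.

```sql
-- Build only after the table holds representative data,
-- because the centroids are chosen from the rows present at build time.
CREATE INDEX ON sift USING ivfflat (v vector_l2_ops) WITH (lists = 1000);

-- After many inserts with a different distribution, rebuild the index
-- so the centroids are recomputed. The index name is assumed, not confirmed.
REINDEX INDEX sift_v_idx;
```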

We have also generally had a harder time getting high recall with IVFFLAT on vectors from embedding models such as ada-002.
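For context, the usual recall knob on the IVFFLAT side in pgvector is `ivfflat.probes`, the number of lists scanned per query (default 1). The sketch below is only illustrative: the value 32 is arbitrary, and the `id` column is a hypothetical primary key that the thread does not confirm exists on `sift`.

```sql
-- Scan more lists per query: higher recall, slower queries.
SET ivfflat.probes = 32;

-- Ten nearest neighbours by L2 distance to an existing row's vector.
-- "id" is an assumed column used only to pick a query vector.
SELECT id
FROM sift
ORDER BY v <-> (SELECT v FROM sift WHERE id = 1)
LIMIT 10;
```

Raising `probes` trades query speed for recall, which is why comparisons are usually made at a fixed recall target.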

There are trade-offs, some of which we will explore in later blog posts.

This post is about one thing: HNSW index creation time across two systems, at a fixed 99% recall.

nerfborpit|2 years ago

External index creation also requires that a significant amount of data be in the table for it to be worth it, along with all the other potential issues.