ngalstyan4 | 2 years ago

cofounder here.

You are right that there are many trade-offs between HNSW and IVFFLAT.

E.g. IVFFLAT requires a significant amount of data to be in the table before the index is created, and it assumes the data distribution does not change with additional inserts (it chooses centroids during the initial creation and never updates them).
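To make the centroid point concrete, here is a toy sketch of the idea in plain Python. It is not how any particular extension implements IVFFLAT; the class `ToyIVFFlat` and its helpers are made up for illustration. Centroids are picked with a few k-means rounds at build time, and later inserts are only routed to the nearest existing centroid.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, v):
    # Index of the centroid closest to v.
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], v))

def kmeans(points, k, iters=10, seed=0):
    # Plain Lloyd iterations; a stand-in for whatever clustering a real index uses.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(centroids, p)].append(p)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if its bucket is empty
                centroids[i] = tuple(sum(c) / len(bucket) for c in zip(*bucket))
    return centroids

class ToyIVFFlat:
    """Hypothetical toy index: centroids are chosen once, at build time."""

    def __init__(self, points, k):
        self.centroids = kmeans(points, k)
        self.lists = [[] for _ in range(k)]
        for p in points:
            self.insert(p)

    def insert(self, v):
        # Inserts only append to the nearest list; centroids are never recomputed.
        self.lists[nearest(self.centroids, v)].append(v)

    def search(self, q, probes=1):
        # Scan only the `probes` lists whose centroids are closest to q.
        order = sorted(range(len(self.centroids)),
                       key=lambda i: sq_dist(self.centroids[i], q))
        candidates = [v for i in order[:probes] for v in self.lists[i]]
        return min(candidates, key=lambda v: sq_dist(v, q)) if candidates else None
```

If the index is built on data clustered around one region and later inserts come from somewhere else, the new vectors pile into whichever lists happen to be closest, so the partitioning no longer reflects the data.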

We have also generally had a harder time getting high recall with IVFFLAT on vectors from embedding models such as ada-002.

There are trade-offs, some of which we will explore in later blog posts.

This post is about one thing: HNSW index creation time across two systems, at a fixed 99% recall.
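For readers unfamiliar with the metric, recall here is usually measured as the fraction of the true nearest neighbors that the index returns. A minimal sketch, assuming the common recall@k definition (the function name and argument layout are illustrative, not taken from the post's benchmark code):

```python
def recall_at_k(approx_ids, exact_ids, k):
    # approx_ids[i]: ids returned by the index for query i.
    # exact_ids[i]: ids of the true nearest neighbors for query i (brute force).
    hits = sum(len(set(a[:k]) & set(e[:k])) for a, e in zip(approx_ids, exact_ids))
    return hits / (k * len(exact_ids))
```

A 99% recall at k=10 would mean that, averaged over queries, about 9.9 of the 10 true neighbors appear in the index's answer.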

nerfborpit|2 years ago

External index creation also requires that a significant amount of data be in the table for it to be worth it, along with all the other potential issues.