(no title)
nerfborpit | 2 years ago
``` postgres=# CREATE INDEX ON sift USING ivfflat (v vector_l2_ops) WITH (lists=1000); CREATE INDEX Time: 65697.411 ms (01:05.697) ```
nerfborpit | 2 years ago
``` postgres=# CREATE INDEX ON sift USING ivfflat (v vector_l2_ops) WITH (lists=1000); CREATE INDEX Time: 65697.411 ms (01:05.697) ```
ngalstyan4|2 years ago
You are right that there are many trade-offs between HNSW and IVFFLAT.
E.g. IVFFLAT requires there be significant amount of data in the table, before the index is created, and assumes data distribution does not change with additional inserts (since it chooses centroids during the initial creation and never updates them)
We have also generally had harder time getting high recall with IVFFLAT on vectors from embedding models such as ada-002.
There are trade-offs, some of which we will explore in later blog posts.
This post is about one thing - HNSW index creation time across two systems, at a fixed 99% recall.
nerfborpit|2 years ago