I added an edit note to the bottom of the blog post.
The original post and its experiments were written before pgvector 0.5.1 was out, and we had not realized that significant work had gone into optimizing index creation time in that release.
We reran the pgvector benchmarks with pgvector 0.5.1.
pgvector index creation is now on par with, or 10% faster than, Lantern on a single core. Lantern still allows up to 30x faster index creation by leveraging additional cores.
kiwicopple|2 years ago
https://x.com/pgvector/status/1711910075416432785?s=46
Do you have the code you used so that we can reproduce these results?
diqi|2 years ago
The original post and its experiments were written before pgvector 0.5.1 was out, and we had not realized that significant work had gone into optimizing index creation time in that release.
We reran the pgvector benchmarks with pgvector 0.5.1. pgvector index creation is now on par with, or 10% faster than, Lantern on a single core. Lantern still allows up to 30x faster index creation by leveraging additional cores.
Index creation times:
Wiki: pgvector 36m, Lantern 43m, Lantern external indexing (32 CPU) 2m 15s
Sift: pgvector 12m30s, Lantern 7m, Lantern external indexing (32 CPU) 25s
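For anyone who wants to check the ratios implied by these times, here is a minimal Python sketch. The durations are copied from the numbers above; the helper names (`to_seconds`, `speedups`) and the dictionary keys are made up for illustration and are not part of either project.

```python
import re

def to_seconds(duration: str) -> int:
    """Convert a duration such as '36m', '12m30s', '2m 15s' or '25s' to seconds."""
    match = re.fullmatch(r"\s*(?:(\d+)m)?\s*(?:(\d+)s)?\s*", duration)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return minutes * 60 + seconds

# Index creation times as reported in the comment (keys are illustrative labels).
RESULTS = {
    "Wiki": {"pgvector": "36m", "lantern": "43m", "lantern_external": "2m 15s"},
    "Sift": {"pgvector": "12m30s", "lantern": "7m", "lantern_external": "25s"},
}

def speedups(results):
    """Return, per dataset, how many times slower each run is than the others."""
    out = {}
    for dataset, times in results.items():
        secs = {name: to_seconds(value) for name, value in times.items()}
        external = secs["lantern_external"]
        out[dataset] = {
            "pgvector_over_lantern": secs["pgvector"] / secs["lantern"],
            "pgvector_over_external": secs["pgvector"] / external,
            "lantern_over_external": secs["lantern"] / external,
        }
    return out

if __name__ == "__main__":
    for dataset, ratios in speedups(RESULTS).items():
        print(dataset, {name: round(value, 2) for name, value in ratios.items()})
```

With these numbers, the 32 CPU external indexing run works out to 16x faster than pgvector on Wiki and 30x faster on Sift.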
The DB parameters for the above results (both Lantern and pgvector): shared_buffers=12GB, maintenance_work_mem=5GB, work_mem=2GB.
The DB parameters for the previous results were the defaults for both Lantern and pgvector.
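One way to apply the tuned settings when reproducing the runs is through `ALTER SYSTEM`. The sketch below only generates the statements; the values come from the comment, while the function name and the choice of `ALTER SYSTEM` over editing postgresql.conf are assumptions, not necessarily how the benchmark machine was configured.

```python
# Values quoted in the comment; how they were actually applied is not stated there.
SETTINGS = {
    "shared_buffers": "12GB",
    "maintenance_work_mem": "5GB",
    "work_mem": "2GB",
}

def alter_system_statements(settings):
    """Build one ALTER SYSTEM statement per setting (hypothetical helper)."""
    return [f"ALTER SYSTEM SET {name} = '{value}';" for name, value in settings.items()]

if __name__ == "__main__":
    # Paste the output into a superuser psql session.
    print("\n".join(alter_system_statements(SETTINGS)))
```

Note that a change to shared_buffers only takes effect after a server restart, whereas maintenance_work_mem and work_mem can be picked up with a configuration reload.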
Index creation was timed with psql timing on a 32 CPU / 64GB RAM machine (Linode Dedicated 64).
Feel free to reach out if you need anything for benchmarks.