Interested to hear more about your experience here. At Halcyon, we have trillions of embeddings and found Postgres to be unsuitable at several orders of magnitude less than we currently have.
On the iterative scan side, how do you prevent this from becoming too computationally intensive with a restrictive pre-filter, or simply not working at all? We use Vespa, which means effectively doing a map-reduce across all of our nodes; the effective number of graph traversals to do is smaller, and the computational burden mostly involves scanning posting lists on a per-node basis. I imagine to do something similar in postgres, you'd need sharded tables, and complicated application logic to control what you're actually searching.
How do you deal with re-indexing and/or denormalizing metadata for filtering? Do you simply accept that it'll take hours or days?
I agree with you, however, that vector databases are not a panacea (although they do remove a huge amount of devops work, which is worth a lot!). Vespa supports filtering across parent-child relationships (like a relational database) which means we don't have to reindex a trillion things every time we want to add a new type of filter, which with a previous vector database vendor we used took us almost a week.
for sure people are running pgvector in prod! i was more pointing at every tutorial
iterative scans are more of a bandaid for filtering than a solution. you will still run into issues with highly restrictive filters. you still need to understand ef_search and max_search_tuples. strict vs relaxed ordering, etc. it's an improvement for sure, but the planner still doesn't deeply understand the cost model of filtered vector search
there isn't a general solution to the pre- vs post-filter problem—it comes down to having a smart planner that understands your data distribution. question is whether you have the resources to build and tune that yourself or want to offload it to a service that's able to focus on it directly
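The selectivity tradeoff described above can be sketched with a toy planner. Everything below is illustrative, not any real planner's logic: 1-D "vectors", a made-up 10% selectivity threshold, and a fixed 4x oversampling factor stand in for a real cost model.

```python
import random

def pre_filter(items, query, pred, k):
    """Apply the predicate first, then rank survivors exactly.
    Cheap when the filter is selective; a full scan when it is not."""
    survivors = [it for it in items if pred(it)]
    return sorted(survivors, key=lambda it: abs(it["vec"] - query))[:k]

def post_filter(items, query, pred, k, oversample=4):
    """Rank first (what an ANN index gives you), then filter.
    Cheap when the filter is permissive; may return fewer than k rows."""
    nearest = sorted(items, key=lambda it: abs(it["vec"] - query))[:k * oversample]
    return [it for it in nearest if pred(it)][:k]

def plan(items, query, pred, k, threshold=0.1, rng=random):
    """Toy planner: estimate filter selectivity on a sample, then choose."""
    sample = rng.sample(items, min(200, len(items)))
    selectivity = sum(pred(it) for it in sample) / len(sample)
    strategy = pre_filter if selectivity < threshold else post_filter
    return strategy.__name__, strategy(items, query, pred, k)

items = [{"id": i, "vec": i / 1000.0, "tenant": i % 100} for i in range(1000)]
rare = lambda it: it["tenant"] == 7     # ~1% of rows pass
common = lambda it: it["tenant"] < 50   # ~50% of rows pass
```

A restrictive filter (`rare`) pushes the toy planner toward pre-filtering; a permissive one (`common`) toward post-filtering with oversampling. A real planner has to get this right per-query, per-distribution.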
- We're IVF + quantization, and can support 15x more updates per second compared to pgvector's HNSW. Inserting or deleting an element in a posting list is a very light operation compared to modifying a graph (HNSW).
- Our main branch can now index 100M 768-dim vectors in 20 minutes with 16 vCPUs and 32 GB of memory. This lets users index and reindex very efficiently. We'll have a detailed blog about this soon. The core idea is that k-means is just a description of the distribution, so we can do a lot of approximation to accelerate the process.
- For reindexing, Postgres actually supports `CREATE INDEX CONCURRENTLY` and `REINDEX CONCURRENTLY`. Users won't experience any data loss or inconsistency during the whole process.
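The "k-means is just a description of the distribution" idea can be shown in miniature: centroids fitted on a small subsample land close to centroids fitted on the full data. This toy 1-D Lloyd's-algorithm sketch is my own illustration, not VectorChord's actual algorithm.

```python
import random

def kmeans_1d(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on scalars; real systems use d-dim vectors."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda j: abs(p - centers[j]))].append(p)
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sorted(centers)

# Full data: two well-separated clusters around 0.0 and 10.0.
rng = random.Random(1)
data = [rng.gauss(0.0, 0.5) for _ in range(5000)] + \
       [rng.gauss(10.0, 0.5) for _ in range(5000)]

full = kmeans_1d(data, k=2)                      # fit on all 10,000 points
approx = kmeans_1d(rng.sample(data, 200), k=2)   # fit on a 2% subsample
```

The subsampled fit does roughly 50x less work and recovers nearly the same centroids, which is the intuition behind approximating the clustering step during index builds.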
The author understates the complexity of synchronizing an existing database with a specialized vector database, as well as of performing joint queries across them. This is also why we see most users choosing a vector solution on PostgreSQL.
> The problem is that index builds are memory-intensive operations, and Postgres doesn’t have a great way to throttle them.
maintenance_work_mem begs to differ.
> You rebuild the index periodically to fix this, but during the rebuild (which can take hours for large datasets), what do you do with new inserts? Queue them? Write to a separate unindexed table and merge later?
You use REINDEX CONCURRENTLY.
> But updating an HNSW graph isn’t free—you’re traversing the graph to find the right place to insert the new node and updating connections.
How do you think a B+tree gets updated?
This entire post reads like the author didn’t read Postgres’ docs, and is now upset at the poor DX/UX.
sure, but the knob existing doesn't solve the operational challenge of safely allocating GBs of RAM on prod for hours-long index builds.
> REINDEX CONCURRENTLY
this is still not free: it takes longer, needs 2-3x the disk space, and still impacts performance.
> HNSW vs B+tree
it's not that graph updates are uniquely expensive. vector workloads have different characteristics than traditional OLTP, and pg wasn't originally designed for them
my broader point: these features exist, but using them correctly requires significant Postgres expertise. my thesis isn't "Postgres lacks features"—it's "most teams underestimate the operational complexity." dedicated vector DBs handle this automatically, and are often going to be much cheaper than the dev time put into maintaining pgvector (esp. for a small team)
HNSW indices are big. Let's suppose I have an HNSW index which fits in a few hundred gigabytes of memory, or perhaps a few terabytes. How do I reasonably rebuild this using maintenance_work_mem? Double the size of my database for a week? What about the knock-on impact on the performance of the rest of my database - presumably I'm relying on that memory for shared_buffers and caching? This seems like the type of workload being discussed here, not a toy 20GB index or something.
> You use REINDEX CONCURRENTLY.
Even with a bunch of worker processes, how do I do this within a reasonable timeframe?
> How do you think a B+tree gets updated?
Sure, the computational complexity of insertion into an HNSW index is sublinear, but the constant factors are significant and do actually add up. That said, I find this the weakest of the author's arguments.
I've seen a decent amount of production use of pgvector HNSW from our customers on GCP, but as the author noted it's not without flaws, and deployments are typically in the smallish range (0-10M vectors) because of the system characteristics he pointed out - i.e. build times and memory use. The tradeoffs to consider are whether you want to ETL data into yet another system and deal with operational overhead, eventual consistency, and application logic to join vector search with the rest of your operational data. Whether the tradeoffs are worth it really depends on your business requirements.
And if one needs the transactional/consistency semantics, hybrid/filtered-search, low latencies, etc - consider a SOTA Postgres system like AlloyDB with AlloyDB ScaNN which has better scaling/performance (1B+ vectors), enhanced query optimization (adaptive pre-/post-/in-filtering), and improved index operations.
Full disclosure: I founded ScaNN in GCP databases and currently lead AlloyDB Semantic Search. And all these opinions are my own.
I'm still stuck on whether or not vector search (regardless of vendor) is actually the right way to solve the kinds of problems that everyone seems to believe it's great at.
BM25 with query rewriting & expansion can do a lot of heavy lifting if you invest any time at all in configuring things to match your problem space. The article touches on FTS engines and hybrid approaches, but I would start there. Figure out where lexical techniques actually break down and then reach for the "semantic" technology. I'd argue that an LLM in front of a traditional lexical search engine (i.e., tool use) would generally be more powerful than a sloppy semantic vector space or a fine tuning job. It would also be significantly easier to trace and shape retrieval behavior.
Lucene is often all you need. They've recently added vector search capabilities if you think you really need some kind of hybrid abomination.
I'm currently building RAG for our product (using Lucene). What I've found is that embeddings alone don't help much. With hybrid search (BM25+HNSW) they gave me only like +10% boost compared to BM25 alone (on average). In my evaluation datasets, the only case where they helped tremendously was for cases like "a user asks a question in French but the documents are all in English", it went from 6% retrieval to 65% on some datasets.
I got a significant boost (from 65% on average to over 80%) by adding a proper reranker and query rewriting (3 additional phrases to search for).
I think embeddings are overrated in that blog posts often make you believe they are the end of the story. What I've found is that they should be rather treated as a lightweight filtering/screening tool to quickly find a pool of candidates as a first stage, before you do the actual stuff (apply a reranker). If BM25 already works as well as a pre-filtering tool, you don't even need embeddings (with all the indexing headaches).
I like Lucene and have used it for many years, but sometimes a conceptually close match is what you want. Lucene and friends are fantastic at word matching, fuzzy searches, stem searches, phonetic searches, faceting and more, but have nothing for conceptually or semantically close searches (I understand they recently added document vector search). Also, vector searches almost always return something, which is not ideal in a lot of cases. I like Reciprocal Rank Fusion myself, as it gives the best of both worlds. As a fun trick, I use DuckDB to do RRF over 5 million+ documents and get low double-digit ms response times even under load.
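For reference, RRF itself is tiny. A minimal sketch (k=60 is the constant from the original RRF paper; the document IDs and rankings are made up):

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["d1", "d2", "d3", "d4"]   # lexical ranking
vec  = ["d3", "d1", "d5", "d2"]   # semantic ranking
fused = rrf([bm25, vec])
```

Documents that rank well in both lists float to the top, while documents seen by only one retriever still survive further down, which is exactly the "best of both worlds" behavior described above.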
My default is basically YAGNI. You should use as few services as possible, and only add something new when there’s issues. If everything is possible in Postgres, great! If not, at least I’ll know exactly what I need from the New Thing.
The post is a clear example of when YAGNI backfires, because you think YAGNI but then, you actually do need it. I had this experience, the author had this experience, you might as well - the things you think you AGN are actually pretty basic expectations and not luxuries: being able to write vectors real-time without having to run other processes out of band to keep the recall from degrading over time, being able to write a query that uses normal SQL filter predicates and similarity in one go for retrieval. These things matter and you won't notice that they actually don't work at scale until later on!
As others have commented, all the mentioned issues have been resolved, so I'd favour using pgvector.
If Postgres can be a good choice over Kafka for delivering 100k events/sec [1], then why not pgvector over Chroma or other specialized vector search engines (unless there is a specific requirement that can't be solved with minor code/config changes)?

[1] https://news.ycombinator.com/item?id=44659678
Redis Vector Sets, my work for the last year, I believe address many of these points:
1. Updates: I wrote my own implementation of HNSW with many changes compared to the paper. The result is that the data structure can be updated while it receives queries, like the other Redis data types. You add vectors with VADD, query for similarity with VSIM, delete with VREM. Deleting vectors does not just perform a tombstone deletion: the memory is actually reclaimed immediately.
2. Speed: the implementation is fast, with fully threaded reads and partially threaded writes. Even insertion easily sustains a few hundred ops/sec, and VSIM queries run at around 50k ops/sec on normal hardware.
3. Trivial: you can reimplement your use case in 10 minutes, including learning how it works.
Of course it costs some memory, but less than you may guess: it supports quantization by default, transparently, and for a few millions of elements (most use cases) the memory usage is very low, totally affordable.
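The memory/accuracy tradeoff that quantization buys can be sketched generically. This is not Redis's actual implementation, just the idea: symmetric int8 quantization stores 1 byte per dimension instead of 4 for float32, while cosine similarity barely moves.

```python
import array
import math

def quantize_int8(vec):
    """Symmetric int8 quantization: one float scale per vector plus one
    byte per dimension, instead of four bytes per dimension for float32."""
    scale = max(abs(x) for x in vec) / 127.0 or 1.0  # guard all-zero vectors
    return scale, array.array("b", (round(x / scale) for x in vec))

def dequantize(scale, quantized):
    return [scale * v for v in quantized]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

vec = [math.sin(i * 0.7) for i in range(64)]  # a stand-in embedding
scale, q = quantize_int8(vec)
recovered = dequantize(scale, q)
```

Per-dimension error is bounded by half the scale, so similarity between the original and the recovered vector stays extremely high even at 4x compression.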
Bonus point: if you use vector sets you can ask my help for free. At this stage I support people using vector sets directly.
P.S. the README contains a stale mention of the replication code being not really tested. I filled that gap later: added tests, fixed bugs, and so forth.
When using vectors / embeddings models, I think there's a lot of low hanging fruit to be had with non-massive datasets - your support documentation, your product info, a lot of search use cases. For these, the interface I really want is more like a file system than a database - I want to be able to just write and update documents like a file system and have the indexes update automatically and invisibly.
So basically, I'd love to have my storage provider give me a vector search API, which I guess is what Amazon S3 vectors is supposed to be (https://aws.amazon.com/s3/features/vectors/)?
Curious to hear what experience people have had with this.
> Post-filter works when your filter is permissive. Here’s where it breaks: imagine you ask for 10 results with LIMIT 10. pgvector finds the 10 nearest neighbors, then applies your filter. Only 3 of those 10 are published. You get 3 results back, even though there might be hundreds of relevant published documents slightly further away in the embedding space.
Is this really how it works? That seems like it’s returning an incorrect result.
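That is how the quoted pre-0.8.0 post-filtering behavior is described. A toy simulation (1-D distances stand in for embedding distances; the batch sizes and data are made up) shows the truncation, and how an iterative scan in the style of pgvector 0.8.0 avoids it:

```python
# 1-D stand-ins: `dist` plays the role of distance to the query embedding,
# `published` is the metadata filter. Every 4th document is published.
docs = [{"id": i, "dist": i * 0.1, "published": i % 4 == 0} for i in range(1000)]

def naive_post_filter(docs, k):
    # Fetch only the k nearest, THEN filter: can return fewer than k rows.
    nearest = sorted(docs, key=lambda d: d["dist"])[:k]
    return [d for d in nearest if d["published"]]

def iterative_scan(docs, k, batch=10, max_scan=200):
    # Iterative-scan style: keep pulling further neighbors until k rows
    # survive the filter (up to a max_scan safety cap).
    ranked = sorted(docs, key=lambda d: d["dist"])
    out, i = [], 0
    while len(out) < k and i < min(max_scan, len(ranked)):
        out.extend(d for d in ranked[i:i + batch] if d["published"])
        i += batch
    return out[:k]
```

With `LIMIT 10` the naive version returns only the 3 published docs among the 10 nearest, even though plenty of published docs sit slightly further away; the iterative version keeps scanning until it fills the limit.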
Good article - the most common use cases i see for pgvector are typically "chat over their technical docs":
- small corpus
- doesn’t change often / can rebuild the index
- no multi-tenancy avoids much of the issues with post-filtering
Chroma implements SPANN and SPFresh (to avoid the limitations of HNSW), pre-filtering, hybrid search, and has a 100% usage-based tier (many bills are around $1 per month).
The author (human or LLM) flips between performance ("millions of vectors") and semantic accuracy ("only 3 match your filter") to push its point, depending on what needs to look worse. An AI framing switch that was probably introduced by RLHF on humans that don't think critically but want somewhat convincing answers.
For pre-filtering, "You're still searching millions of vectors" isn't a valid argument, because the author doesn't relate it to any alternative, and post-filtering is even worse.
Author is a human :). Performance and semantic accuracy are both important. The point about pre-filtering _you're still searching millions of vectors_ is important because once you apply a filter you can no longer use your vector index, and doing a full scan on millions of vectors is quite expensive.
> What bothers me most: the majority of content about pgvector reads like it was written by someone who spun up a local Postgres instance, inserted 10,000 vectors, ran a few queries, and called it a day.
I get this taste from most posts about Postgres that don't come from a "how we scaled Postgres to X" experience. It seems a lot of writers are trying to ride the wave of popularity, creating a ton of noise that can end up as tech debt for readers.
I don't have much experience with dedicated vector databases - I've only used pgvector - so pardon me if there's an obvious answer to this, but how do people do similarity search combined with other filters and pagination with a separate vector DB? It's a pretty common use case, at least in my circles.
For example: give me product listings that match the search term (by vector search) and are made by company X (companies being a separate table), sorted by vector similarity to the search term, top 100.
We have even largely moved away from ElasticSearch to Postgres where we can, because it's just so much easier to implement with new complex filters without needing to add those other tables' data to the index of e.g. "products" every time.
Edit: Ah I guess this is touched a bit in the article with "Pre- vs. Post-Filtering" - I guess you just do the same as with ElasticSearch, predict what you'll want to filter with, add all of that to metadata and keep it up to date.
It's not a module, it is part of every new Redis version now. Well, actually: it is written in the form of a module and with the modules API in order to improve modularity of the Redis internals, but it is a "merged module", a new implementation/concept I implemented in Redis exactly to support the Vector Sets use case. Thank you for mentioning this.
Is there a comprehensive leaderboard like ClickBench but for vector DBs? Something that measures both the qualitative (precision/recall) and quantitative aspects (query perf at 95th/99th percentile, QPS at load, compression ratios, etc.)?
ANN-Benchmark exists but it’s algorithm-focused rather than full-stack database testing, so it doesn’t capture real-world ops like concurrent writes, filtering, or resource management under load.
Would be great to see something more comprehensive and vendor-neutral emerge, especially testing things like: tail latencies under concurrent load, index build times vs quality tradeoffs, memory/disk usage, and behavior during failures/recovery
This aligns closely with our observations at Milvus. Recently, we helped several users migrate from pgvector as their workload grew substantially.
It’s worth recognising the strengths of pgvector:
• For small-to-medium scale workloads (e.g., up to millions of vectors, relatively static data), embedding storage and similarity queries inside Postgres can be a simple, familiar architecture.
• If you already use Postgres and your vector workloads are light (low QPS, few dimensions, little metadata filtering / low concurrency), then piggy-backing vector search on Postgres is attractive: minimal added infrastructure.
• For teams that don’t want to introduce a separate vector service, or want to keep things within an existing RDBMS, pgvector is a compelling choice.
From our experience helping users scale vector search in production, several pain-points emerge when scaling vector workloads inside a general-purpose RDBMS like Postgres:
1. Index build / update overhead
• Postgres isn’t built from the ground-up for high-velocity vector insertions plus large-scale approximate nearest neighbour (ANN) index maintenance, for example, lacking RaBitQ binary quantization supported in purpose built vector db like Milvus.
• For large datasets (tens/hundreds of millions or beyond), building or rebuilding HNSW/IVF indices inside Postgres can be memory- and time-intensive.
• In production systems where vectors are continuously ingested, updated, deleted, this becomes operationally tricky.
2. Filtered search
• Many use-cases require combining vector similarity with scalar/metadata filters (e.g., “give me top 10 similar embeddings where user_status = ‘active’ AND time > X”).
• You need to understand the planner at a low level to juggle pre-filtering and post-filtering, and the planner's cost model wasn't built for vector similarity search. For a system not designed primarily as a vector DB, this gets complex. Users shouldn't have to worry about such low-level details.
3. Lack of support for full-text search / hybrid search
• A purpose-built vector DB such as Milvus has mature full-text search / BM25 / sparse vector support.
"Turbopuffer starts at $64/month with generous limits."
Yup, I think this here explains the popularity of pgvector. If $64/month seems like a lot to you, just use pgvector. If it seems cheap, then your usage is complex enough to want a proper vector DB.
xfalcox | 4 months ago:
We do at Discourse, in thousands of databases, and it's leveraged in most of the billions of page views we serve.
> Pre- vs. Post-Filtering (or: why you need to become a query planner expert)
This was fixed in version 0.8.0 via Iterative Scans (https://github.com/pgvector/pgvector?tab=readme-ov-file#iter...)
> Just use a real vector database
If you are running a single service that may be an easier sell, but it's not a silver bullet.
xfalcox | 4 months ago:
- halfvec (16-bit float) for storage
- bit (binary vectors) for indexes
Which makes the storage cost and on-going performance good enough that we could enable this in all our hosting.
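The binary-vectors-for-indexes trick amounts to a coarse/fine two-pass search. A sketch of the idea (mine, not Discourse's actual code; the corpus, dimensions, and rerank factor are made up): sign bits drive a cheap Hamming-distance pass, and exact vectors re-rank the survivors.

```python
import random

def to_bits(vec):
    """Sign-bit quantization: 1 bit per dimension, ~32x smaller than float32."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (x > 0)
    return bits

def hamming(a, b):
    return bin(a ^ b).count("1")

def search(query, corpus, k, rerank=4):
    """Coarse Hamming pass over the bit index, exact re-rank of survivors."""
    qbits = to_bits(query)
    coarse = sorted(corpus, key=lambda it: hamming(qbits, it["bits"]))[:k * rerank]
    exact = lambda it: sum((q - x) ** 2 for q, x in zip(query, it["vec"]))
    return sorted(coarse, key=exact)[:k]

rng = random.Random(42)
corpus = []
for i in range(200):
    v = [rng.uniform(-1, 1) for _ in range(64)]
    corpus.append({"id": i, "vec": v, "bits": to_bits(v)})

query = [x + 0.01 for x in corpus[0]["vec"]]  # near-duplicate of doc 0
```

The bit index does almost all the work, and the small exact pass recovers most of the accuracy lost to quantization, which is what makes the storage and ongoing cost workable at fleet scale.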
jascha_eng | 4 months ago:
In theory iterative scans can be more efficient than plain pre/post filtering.
VoVAllen | 4 months ago:
- We support both pre-filtering and post-filtering. Check https://blog.vectorchord.ai/vectorchord-04-faster-postgresql...
- We support hybrid search with BM25 through https://github.com/tensorchord/VectorChord-bm25
ayende | 4 months ago:
That kills the indexing process; you can't let it run with a limited amount of memory.
> How do you think a B+tree gets updated?
In a B+tree, you need to touch O(log N) pages. In an HNSW graph, you need to touch literally thousands of vectors once your graph gets big enough.
antirez | 4 months ago:
I'll link here the documentation I wrote, as it is a bit hard to find - you know, a README inside the repository, in 2025, so odd: https://github.com/redis/redis/blob/unstable/modules/vector-...
jeffchuber | 4 months ago:
Chroma is also Apache 2.0 licensed - fully open source.
epolanski | 4 months ago:
From what I've seen it's fast, has an excellent API, and is implemented by a brilliant engineer in the space (antirez).
But since I haven't used these things beyond local tests, I can't really hold opinions against those using these systems in production.
tjwebbnorfolk | 4 months ago:
Speaking of "production" -- in what world is "10+ GB" a lot of RAM for a database server?
I have to agree: the author should definitely not use Postgres or pgvector in production...