(no title)
shri_krishna|2 years ago
Now imagine you are using that database for transactions and other day-to-day business operations. It will still store millions of records, but with small indexes, and ideally that needs only a single DB instance plus a replica for redundancy. Once you add vectors to the equation, you have to scale this DB both horizontally and vertically just to maintain decent query/write performance (which would have been extremely fast without embeddings in the mix). You will eventually separate the embeddings out, because it makes no sense to scale the entire DB just for the sake of your embeddings. And I am not even accounting for index generation for these vectors, which can use nearly 100% of all CPU cores while the index is being built (depending on the type of ANN you are using), which in turn slows your DB to a crawl.
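To put rough numbers on "small indexes" versus embeddings, here is a back-of-envelope sketch. Every figure in it is an assumption chosen for illustration (768-dimensional float32 embeddings, about 16 bytes per entry for an ordinary key index, 5 million rows), not a measurement from any particular database, and it ignores the extra overhead of the ANN index itself.

```python
def vector_footprint_bytes(num_rows: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw size of the embeddings alone, before any ANN index overhead."""
    return num_rows * dims * bytes_per_value


def scalar_index_footprint_bytes(num_rows: int, bytes_per_entry: int = 16) -> int:
    """Rough size of a small key index (assumed: 8-byte key plus 8-byte row pointer)."""
    return num_rows * bytes_per_entry


rows = 5_000_000  # assumed table size
vectors = vector_footprint_bytes(rows, dims=768)  # assumed embedding dimension
index = scalar_index_footprint_bytes(rows)

print(f"embeddings: {vectors / 1e9:.1f} GB")  # 15.4 GB
print(f"key index:  {index / 1e9:.2f} GB")    # 0.08 GB
print(f"ratio:      {vectors / index:.0f}x")  # 192x
```

Under these assumptions the embeddings alone are a couple of orders of magnitude larger than the index the transactional workload needs, which is the imbalance that forces the whole instance to scale.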
beoberha|2 years ago
Someone gives this example in another comment, but it’s analogous to OLTP vs. OLAP.
Foobar8568|2 years ago
totetsu|2 years ago
shri_krishna|2 years ago
Clustering, load balancing, aggregating queries, etc. are quite different for a vector database compared to traditional OLTP databases.
It's the same as the difference between OLAP and OLTP. The two have different underlying architectures, which makes them incompatible when run in an integrated fashion.
For instance, in a traditional DB the index is maintained and rebuilt alongside data storage, and for scaling you can separate it into read and write nodes. The write nodes typically focus only on building indexes, while the read nodes serve queries against eventually consistent indexes (eventual consistency is achieved by broadcasting only the changed rows rather than sending the entire index).
It's similar in vector DBs too. You can separate the indexer from the query nodes (which access an eventually consistent index). However, the load is far higher than in a regular DB: the index is huge and takes a long time to build, and sharing it with the query nodes is also more time-consuming and resource/network intensive, because you are sharing not a few rows but the entire index itself. It requires a totally different strategy to get all query nodes to be eventually consistent.
The only advantage of traditional DBs also implementing vector extensions is familiarity for the end user. If you are already familiar with Postgres, you won't want to leave your comfort zone. However, scaling a traditional DB is different from scaling a vector DB; you'll encounter those pain points only in production and will be forced to switch to a proper vector database anyway.
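The replication point above (shipping a few changed rows versus shipping the entire index to every query node) can be sketched as a toy cost model. This is only an illustration under stated assumptions: the row size, embedding dimension, node count, and the `replication_cost` helper are all hypothetical, and the snapshot figure counts only raw vectors, so a real ANN index with its graph or cluster structure would be larger.

```python
from dataclasses import dataclass


@dataclass
class ReplicationCost:
    row_delta_bytes: int       # traditional DB: ship only the changed rows
    index_snapshot_bytes: int  # vector DB as described above: ship the rebuilt index


def replication_cost(
    total_vectors: int,
    changed_rows: int,
    dims: int = 768,           # assumed embedding dimension
    bytes_per_value: int = 4,  # assumed float32 components
    row_bytes: int = 200,      # assumed average size of an ordinary row
    query_nodes: int = 3,      # assumed number of read/query replicas
) -> ReplicationCost:
    """Total bytes sent to all query nodes for one sync, under each strategy."""
    per_node_delta = changed_rows * row_bytes
    per_node_snapshot = total_vectors * dims * bytes_per_value
    return ReplicationCost(
        row_delta_bytes=per_node_delta * query_nodes,
        index_snapshot_bytes=per_node_snapshot * query_nodes,
    )


cost = replication_cost(total_vectors=1_000_000, changed_rows=10_000)
print(f"row deltas:     {cost.row_delta_bytes / 1e6:.0f} MB")       # 6 MB
print(f"index snapshot: {cost.index_snapshot_bytes / 1e9:.1f} GB")  # 9.2 GB
```

With these made-up numbers, one sync moves megabytes in the row-delta case and gigabytes in the full-index case, which is why getting the query nodes to eventual consistency needs a different strategy.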
samlambert|2 years ago
redwood|2 years ago