aiappreciator | 2 years ago
But a question for true DB experts here:
1. Is there any real advantage to building a dedicated vector DB from scratch?
2. Is a vector DB something that can just be 'tacked on' to a normal DB with no major performance penalty?
We know from history that data warehouses are genuinely different from databases, and that cloud data warehouses are overwhelmingly superior to on-prem ones. So that emerged as a distinct, enduring category with Snowflake/Databricks/BigQuery.
jamesblonde|2 years ago
Most vector databases use one of a few vector indexing libraries: FAISS, hnswlib, and ScaNN (Google only) are popular. The newer vector DBs, like Weaviate, have introduced their own indexes, but I haven't seen any performance difference.
Reference: https://ann-benchmarks.com/
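For context on what those indexing libraries speed up: the baseline is an exact linear scan over every stored vector, and indexes like HNSW trade a little recall to avoid that scan. Below is a minimal sketch of the exact baseline in plain Python. It is not how FAISS, hnswlib, or ScaNN are implemented; the names `cosine_similarity`, `exact_top_k`, and the toy `vectors` data are made up for illustration.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Return the k (id, score) pairs most similar to query via a full scan.

    This is O(n) per query; approximate indexes exist to avoid exactly this.
    """
    scored = ((vid, cosine_similarity(query, vec)) for vid, vec in vectors.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical toy data: three 2-dimensional "embeddings".
vectors = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.0, 1.0],
    "doc_c": [0.9, 0.1],
}
top = exact_top_k([1.0, 0.0], vectors, k=2)
print(top)
```

Any approximate index should return roughly the same ids as this scan, only faster, which is what the benchmark site measures as recall versus queries per second.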
andris9|2 years ago
redwood|2 years ago
charcircuit|2 years ago
Some advantages of having a separate index are that it can work with different backends, it can be scaled independently, and it can index data for more than one database server.
Some disadvantages are increased latency, increased complexity, and distributed system problems.
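A rough sketch of that trade-off, assuming a toy in-memory setup: one standalone index holds vectors for rows that live in two different databases, and every lookup takes two hops (index, then the owning database). `ExternalVectorIndex`, the `backends` dict, and all the data are hypothetical stand-ins, not any real product's API.

```python
import math


class ExternalVectorIndex:
    """Toy standalone index keyed by (backend_name, row_id)."""

    def __init__(self):
        self._entries = {}  # (backend_name, row_id) -> vector

    def add(self, backend_name, row_id, vector):
        self._entries[(backend_name, row_id)] = vector

    def nearest(self, query):
        """Return the key of the closest vector by Euclidean distance (full scan)."""
        return min(self._entries, key=lambda key: math.dist(query, self._entries[key]))


# Two hypothetical database servers, modeled as plain dicts of row_id -> record.
backends = {
    "orders_db": {1: "order #1", 2: "order #2"},
    "docs_db": {7: "design doc"},
}

index = ExternalVectorIndex()
index.add("orders_db", 1, [0.0, 0.0])
index.add("orders_db", 2, [5.0, 5.0])
index.add("docs_db", 7, [1.0, 1.0])

# Hop 1: ask the index. Hop 2: fetch the record from whichever database owns it.
backend_name, row_id = index.nearest([0.9, 1.2])
record = backends[backend_name][row_id]
print(backend_name, row_id, record)
```

The second hop is where the extra latency comes from, and keeping the index in sync with each backend's writes is where the distributed-system problems show up.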
hobs|2 years ago
The operational pains if you need to self-host this stuff are real: split brain, backup/restore not really considered (compared to a normal database's features), and things like replication and sharding _exist_ but are often a buggy mess.
OLAP is definitely distinct from OLTP, and most of these vector queries have some aspects of both - they are similar to OLAP in that they need a decent amount of preprocessing to be useful (inference), and they are similar to OLTP in that they are often used for serving point queries or tiny lookups.