(no title)
avthar | 1 year ago
You are correct in saying that that you can store embeddings and source data together in many vectordbs. We actually point this out in the post. The main point is that they are not linked but merely stored alongside each other. If one changes, the other one does not automatically change, making the relationship between the two stale.
The idea behind Pgai Vectorizer is that it actually links embeddings with underlying source data so that changes in source data are automatically reflected in embeddings. This is a better abstraction and it removes the burden of the engineer to ensure embeddings are in sync as their data changes.
jeffchuber|1 year ago
spmurrayzzz|1 year ago
cevian|1 year ago
In addition it seems that embeddings happen at ingest time. So, if, for example, the OpenAI endpoint is down the insert will fail. That, in turn means your users need to use a retry mechanism and a queuing system. All the complexity we describe in our blog.
Obviously, I am not an expert in Chroma. So apologies in advance if I got anything wrong. Just trying to get to the heart of the differences between the two systems.