patresh | 2 years ago

The high-level API seems very smooth for quickly iterating on RAG testing. It seems great for prototyping; however, I have doubts about whether it's a good idea to hide the LLM-calling logic in a DB extension.

Error handling would be problematic when you get rate limited, your token has expired, or your input exceeds the context length. And from a security point of view, it requires your DB to call OpenAI directly, which can also be risky.
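To make the concern concrete, here is a minimal sketch of the kind of application-side handling those failure modes usually need (backoff on rate limits, re-auth on expired credentials, truncation on over-long input). The exception names and the `call_llm` function are illustrative stand-ins, not from any real SDK; the point is that this control flow is awkward to express inside a DB extension.

```python
import time

# Hypothetical error types standing in for a provider SDK's exceptions
# (names are illustrative, not from any real library).
class RateLimitError(Exception): pass
class TokenExpiredError(Exception): pass
class ContextLengthError(Exception): pass

MAX_INPUT_CHARS = 1000  # stand-in for a real context-length limit

def call_llm(prompt, token):
    """Fake LLM call that raises the failure modes mentioned above."""
    if token == "expired":
        raise TokenExpiredError("credential needs refreshing")
    if len(prompt) > MAX_INPUT_CHARS:
        raise ContextLengthError("prompt too long")
    return f"echo: {prompt[:20]}"

def call_with_handling(prompt, token, refresh_token, retries=3):
    """Application-side handling of rate limits, expired tokens, and long input."""
    for attempt in range(retries):
        try:
            return call_llm(prompt, token)
        except RateLimitError:
            time.sleep(2 ** attempt)            # exponential backoff, then retry
        except TokenExpiredError:
            token = refresh_token()             # re-authenticate, then retry
        except ContextLengthError:
            prompt = prompt[:MAX_INPUT_CHARS]   # truncate, then retry
    raise RuntimeError("giving up after retries")
```

For example, `call_with_handling("hi", "expired", lambda: "fresh")` recovers by refreshing the token on the first attempt and succeeding on the second.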

Personally I haven't used that many Postgres extensions, so perhaps these risks are mitigated in some way I don't know about?

chuckhend | 2 years ago

We are working on a 'self-hosted' alternative to OpenAI. The project already supports that for embeddings: you specify an open-source model from Hugging Face / sentence-transformers, and API calls get routed to a service you're self-hosting in a container next to Postgres. This is how the docker-compose example in the project README is set up. We'll be doing the same pattern for chat completion models.

On Tembo cloud, we deploy this as part of the VectorDB and RAG Stacks. So you get a dedicated Postgres instance, and a container next to Postgres that hosts the text-to-embeddings transformers. The API calls/data never leave your namespace.
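The sidecar pattern described above might look roughly like this as a compose file. This is a hypothetical sketch, not the project's actual docker-compose example; the service names, image names, and environment variables are all illustrative assumptions.

```yaml
# Hypothetical sketch: Postgres plus a sidecar container serving a
# sentence-transformers model. Names are illustrative only.
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: postgres
  embeddings:
    # Self-hosted text-to-embeddings server; the extension routes
    # embedding calls here instead of to OpenAI, so data stays local.
    image: my-org/sentence-transformers-server:latest
    environment:
      MODEL_NAME: sentence-transformers/all-MiniLM-L6-v2
```

Because both services share one compose network/namespace, the embedding traffic never leaves it, which is the security property the comment is pointing at.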

infecto | 2 years ago

I would agree with you. Similar to LangChain in my mind: some interesting ideas, but it's a lot more powerful to implement it on your own. I would much rather use pgvector directly.