Pinecone integrates AI inferencing with vector database

24 points | jimminyx | 1 year ago | blocksandfiles.com

18 comments

[+] tejaskumar_|1 year ago|reply
This title was a little misleading IMO because (maybe my skill issue) I associated "inferencing" with "generation".

After reading the article, it seems Pinecone has just now added in-DB vectorization, a feature also offered by:

- DataStax Astra DB: https://www.datastax.com/blog/simplifying-vector-embedding-g... (since May 2024)

- Weaviate: https://weaviate.io/blog/introducing-weaviate-embeddings (as of yesterday)

[+] jeadie|1 year ago|reply
This is a common feature now. If anything, for being so early to vector databases, Pinecone was rather late to integrate embeddings.

Timescale added it most recently, but yes, a bunch of others have it: Weaviate, Spice AI, Marqo, etc.

[+] bobismyuncle|1 year ago|reply
The Astra DB link seems to just be a tutorial showing how to generate embeddings using another service.

Weaviate seems to have added a similar capability — kind of wild that they announced on the same day.

Looks like Pinecone also includes reranking as part of the same process — did Weaviate add that as well?

[+] bobismyuncle|1 year ago|reply
This post has some more technical info: https://www.pinecone.io/blog/integrated-inference/

Makes a lot of sense to me to combine embedding, retrieval, and reranking. I can imagine this being a way for them to differentiate themselves from the popular databases that have added support for vector search.

[+] kingkongjaffa|1 year ago|reply
Can someone please explain how this works?

I assumed that a specific flavour of LLM was needed, an “embedding model” to generate the vectors. Is this announcement that pinecone is adding their own?

For example, is it better or worse than the models listed here? https://ollama.com/search?c=embedding

[+] llm_nerd|1 year ago|reply
Normally you take your content and run it through an embedding model, inserting the resulting vectors into the vector DB. At query time, you run the query through the same embedding model and ask the vector database for the hits most similar to the resulting embedding vector. Similarly, reranking is when you take the broad hits from the embedding similarity search and/or BM25, and a reranker uses the looked-up source material to rank the results more finely.

This announcement builds that step into the vector DB: you send it raw content and the embedding happens server-side.
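
Roughly, the classic two-step flow looks like this (untested sketch; assumes a pre-created Pinecone index whose dimension matches the model, with sentence-transformers standing in for whatever embedding model you use):

    # Classic two-step flow: the application owns the embedding step.
    from pinecone import Pinecone
    from sentence_transformers import SentenceTransformer

    pc = Pinecone(api_key="YOUR_API_KEY")
    index = pc.Index("docs")                         # pre-created, dimension 384
    model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings

    docs = ["Pinecone is a managed vector database.",
            "BM25 is a lexical ranking function."]

    # Embed client-side, then upsert the raw vectors with the text as metadata.
    vectors = model.encode(docs)
    index.upsert(vectors=[(f"doc-{i}", vec.tolist(), {"text": doc})
                          for i, (vec, doc) in enumerate(zip(vectors, docs))])

    # The query has to go through the *same* model before hitting the DB.
    query_vec = model.encode("what is a vector db?").tolist()
    hits = index.query(vector=query_vec, top_k=3, include_metadata=True)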

Seems silly. It's like bundling a stove with cookware. But cookware fits specific niches and has different life cycles. I get that it might cater to some "drop-in solution" targets, but it seems of no value for most engineered, long-term solutions.

[+] tejaskumar_|1 year ago|reply
There's more technical detail here: https://www.pinecone.io/blog/integrated-inference/

> Is this announcement that pinecone is adding their own?

TLDR: they trained their own embeddings model and rely on Cohere for ranking. Pinecone (the database) uses this model automatically to generate and store embeddings.
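
As a sketch, the integrated flow collapses that into calls against Pinecone's hosted inference endpoints. Method names follow recent versions of their Python SDK; treat the model names as illustrative placeholders, not necessarily what this announcement ships:

    from pinecone import Pinecone

    pc = Pinecone(api_key="YOUR_API_KEY")

    # Embedding is hosted: send raw text, get vectors back.
    query_embedding = pc.inference.embed(
        model="multilingual-e5-large",   # placeholder hosted model
        inputs=["What is a vector database?"],
        parameters={"input_type": "query"},
    )

    # Reranking is hosted too (per the article, backed by Cohere).
    reranked = pc.inference.rerank(
        model="cohere-rerank-3.5",       # placeholder model name
        query="What is a vector database?",
        documents=["Pinecone is a vector DB.", "BM25 is a lexical ranker."],
        top_n=2,
    )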

> I assumed that a specific flavour of LLM was needed, an “embedding model” to generate the vectors.

You're mostly right, with one caveat: embedding models aren't really LLMs in that they're not very large. They just map semantic meaning into a numerical space.
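
To make that concrete, here's a toy sketch (the model name is just an example) showing how closeness in the embedding space tracks closeness in meaning:

    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # ~22M params, tiny next to an LLM
    a, b, c = model.encode(["a cup of coffee", "an espresso drink", "a flat tire"])

    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cos(a, b))  # higher: related meanings
    print(cos(a, c))  # lower: unrelated meanings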

> For example, is it better or worse than the models listed here? https://ollama.com/search?c=embedding

This is the golden question. As far as I know, there is no appropriate benchmarking/eval data about this. I think the real value is the first-class integration between their model and their service.

[+] tech2trees|1 year ago|reply
Nothing new; Marqo has been doing this for a while now with their all-in-one platform to train, embed, retrieve, and evaluate.

I've played around with Weaviate & Astra DB but Marqo is the best and easiest solution imo.

[+] dmezzetti|1 year ago|reply
txtai (https://github.com/neuml/txtai) has had inline vectorization since 2020. It supports Transformers, llama.cpp and LLM API services. It also has inline integration with LLM models and a built-in RAG pipeline.
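
A minimal sketch with a recent txtai release (the model path is just an example):

    # Inline vectorization: txtai embeds at index time and at query time.
    from txtai import Embeddings

    embeddings = Embeddings(path="sentence-transformers/all-MiniLM-L6-v2")
    embeddings.index(["US tops 5 million confirmed virus cases",
                      "Canada's last fully intact ice shelf has suddenly collapsed"])

    # The query is embedded with the same model under the hood.
    print(embeddings.search("public health news", 1))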