pamelafox | 4 months ago

At Microsoft, that's all baked into Azure AI Search: hybrid search does BM25, vector search, and re-ranking, just by setting booleans to true. It also has a new agentic retrieval feature that does the query rewriting and parallel search execution.
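
(For anyone curious what that looks like in practice, here's a minimal sketch of a hybrid query with the azure-search-documents Python SDK; the index name, field names, semantic configuration name, and the embed() helper are illustrative assumptions, not from this thread:)

    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient
    from azure.search.documents.models import VectorizedQuery

    # Index name, field names, and the semantic configuration name
    # below are illustrative assumptions, not real defaults.
    client = SearchClient(
        endpoint="https://<service>.search.windows.net",
        index_name="docs",
        credential=AzureKeyCredential("<api-key>"),
    )

    query = "how do I rotate my keys?"
    query_vector = embed(query)  # your embedding call (assumed helper)

    results = client.search(
        search_text=query,          # keyword (BM25) leg of the hybrid query
        vector_queries=[VectorizedQuery(
            vector=query_vector,
            k_nearest_neighbors=50,
            fields="embedding",     # vector leg, over the same index
        )],
        query_type="semantic",      # turn on the semantic re-ranker
        semantic_configuration_name="default",
        top=5,
    )
    for doc in results:
        print(doc["@search.score"], doc["content"][:80])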

Disclosure: I work at MS and help maintain our most popular open-source RAG template, so I follow the best practices closely: https://github.com/Azure-Samples/azure-search-openai-demo/

So few developers realize that you need more than just vector search that I still spend many of my talks emphasizing the FULL retrieval stack for RAG. It's also possible to build it on top of other DBs like Postgres, but it takes more effort.

jankovicsandras | 4 months ago

"It's also possible to do it on top of other DBs like Postgres, but takes more effort."

Shameless plug: plpgsql_bm25: BM25 search implemented in PL/pgSQL (The Unlicense / PUBLIC DOMAIN)

https://github.com/jankovicsandras/plpgsql_bm25

There's an example notebook, Postgres_hybrid_search_RRF.ipynb, in the repo that shows hybrid search with Reciprocal Rank Fusion (plpgsql_bm25 + pgvector).
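
(The fusion step itself is tiny; here's a minimal Python sketch of Reciprocal Rank Fusion over two ranked id lists, using the conventional k=60 constant, where each document scores the sum of 1/(k + rank) across lists. The doc ids are made up for illustration:)

    # Reciprocal Rank Fusion: merge ranked result lists (e.g. one from
    # BM25, one from vector search) by summing 1 / (k + rank) per list.
    def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
        scores: dict[str, float] = {}
        for ranking in rankings:
            for rank, doc_id in enumerate(ranking, start=1):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    bm25_hits = ["doc3", "doc1", "doc7"]    # best-first, e.g. from plpgsql_bm25
    vector_hits = ["doc1", "doc9", "doc3"]  # best-first, e.g. from pgvector
    print(rrf([bm25_hits, vector_hits]))    # docs ranked by both float to the top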

cipherself | 4 months ago

I'm working on search too, though for text-to-image retrieval. Nevertheless, I'm curious: by "that's all baked into Azure AI Search", did you also mean the synthetic query generation from the grandparent comment? If so, what's your latency for it? And do you extract structured data from the query? If so, do you use LLMs for that?

Moreover, I'm curious why you use BM25 over SPLADE.

pamelafox | 4 months ago

Yes, AI Search has a new agentic retrieval feature that includes synthetic query generation: https://techcommunity.microsoft.com/blog/azure-ai-foundry-bl... You can customize the model used and the max number of queries to generate, so latency depends on those factors, plus the length of the conversation history passed in. The model is usually gpt-4o, gpt-4.1, or their -mini variants, so the latency is standard for those models. A more recent version of that feature also uses the LLM to dynamically decide which of several indices to query, and executes the searches in parallel.
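
(Not the actual agentic retrieval internals, just a sketch of the general pattern: an LLM rewrites the conversation into standalone queries, then the searches fan out concurrently. The prompt, model choice, and run_search stub are all assumptions:)

    import asyncio
    import json

    from openai import AsyncOpenAI

    llm = AsyncOpenAI()

    async def rewrite_queries(history: list[dict], max_queries: int = 3) -> list[str]:
        # Ask the model to turn the chat history into standalone search queries.
        resp = await llm.chat.completions.create(
            model="gpt-4o-mini",
            response_format={"type": "json_object"},
            messages=[
                {"role": "system", "content":
                 f"Rewrite this conversation into at most {max_queries} standalone "
                 'search queries. Reply as JSON: {"queries": ["..."]}'},
                *history,
            ],
        )
        return json.loads(resp.choices[0].message.content)["queries"]

    async def run_search(query: str) -> list[dict]:
        ...  # stub: plug in your hybrid search call here (assumption)

    async def retrieve(history: list[dict]) -> list[list[dict]]:
        queries = await rewrite_queries(history)
        # Fan the rewritten queries out and execute the searches in parallel.
        return await asyncio.gather(*(run_search(q) for q in queries))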

That query generation approach does not extract structured data. I do maintain another RAG template for PostgreSQL that uses function calling to turn the query into a structured query, such that I can construct SQL filters dynamically. Docs here: https://github.com/Azure-Samples/rag-postgres-openai-python/...
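
(A condensed sketch of that function-calling pattern; the tool schema, column names, and example query are illustrative, not the template's actual ones:)

    import json

    from openai import OpenAI

    client = OpenAI()

    # Illustrative tool schema: the model fills in structured filter values.
    tools = [{
        "type": "function",
        "function": {
            "name": "search_products",
            "description": "Search the product catalog with optional filters",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "max_price": {"type": "number"},
                    "brand": {"type": "string"},
                },
                "required": ["query"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "waterproof hiking boots under $100"}],
        tools=tools,
        tool_choice={"type": "function", "function": {"name": "search_products"}},
    )
    args = json.loads(resp.choices[0].message.tool_calls[0].function.arguments)

    # Build a parameterized WHERE clause; never interpolate values directly.
    clauses, params = ["TRUE"], []
    if "max_price" in args:
        clauses.append("price <= %s")
        params.append(args["max_price"])
    if "brand" in args:
        clauses.append("brand = %s")
        params.append(args["brand"])
    where_sql = " AND ".join(clauses)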

I'll ask the search team about SPLADE, not sure.

alansaber | 4 months ago

That is concerning, given that pure vector search is terrible outside of abstractions.

pamelafox | 4 months ago

I know :( But I think vector DBs and vector search got so hyped that people thought you could switch entirely over to them. Lots of APIs and frameworks also used "vector store" as the shorthand for "retrieval data source", which didn't help.

That's why I write blog posts like https://blog.pamelafox.org/2024/06/vector-search-is-not-enou...

catmanjan | 4 months ago

I'd love to work with Azure AI Search, but because Copilot with external items has been made so cheap, it's hard to justify...

osigurdson | 4 months ago

Are you using Elasticsearch behind the scenes?

pamelafox | 4 months ago

I believe that Azure AI Search currently uses Lucene for BM25, hnswlib for vector search, and the Bing re-ranking model for semantic ranking. (So no, it does not use Elasticsearch, though the features are similar.)