item 41386149


CuriousJ | 1 year ago

OP's cofounder here. For us, OpenAI embeddings worked best. When building a system with many points of failure, I like to start with the highest-quality components (even if they're expensive or lack privacy) just to get an upper bound on how good the system can be. Then I replace pieces one by one and measure how much quality I lose.
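That "replace and measure" loop can be sketched as a tiny fixed-benchmark harness. This is a minimal sketch under assumptions: `embed` below is a toy hashed bag-of-words embedder standing in for a real model (an OpenAI embedding call, BGE, etc.), and all names are illustrative, not the OP's actual code.

```python
# Sketch: hold queries + gold answers fixed, swap the embedder, compare recall.
import math
import zlib

def embed(text: str, dims: int = 64) -> list[float]:
    # Placeholder embedder: stable-hash each token into a bucket, L2-normalize.
    # In practice this would wrap the expensive model or a cheaper candidate.
    vec = [0.0] * dims
    for tok in text.lower().split():
        vec[zlib.crc32(tok.encode()) % dims] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def recall_at_k(embed_fn, queries, corpus, gold, k=3):
    # Fraction of queries whose known-relevant doc lands in the top k by
    # cosine similarity (vectors are unit-normalized, so dot product suffices).
    doc_vecs = {doc_id: embed_fn(text) for doc_id, text in corpus.items()}
    hits = 0
    for query, relevant_id in zip(queries, gold):
        qv = embed_fn(query)
        ranked = sorted(doc_vecs,
                        key=lambda d: -sum(a * b for a, b in zip(qv, doc_vecs[d])))
        hits += relevant_id in ranked[:k]
    return hits / len(queries)
```

Usage would be: run `recall_at_k` once with the expensive embedder to get the upper bound, then with each cheaper candidate; the gap between the two numbers is the quality you're trading away.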

P.S. I worked on BERT at Google and have PTSD from how hard we tried to make it work for retrieval; it never really did well. Don't have much experience with BGE though.


peterldowns | 1 year ago

Understood, thanks for the clear answer. Very cool that you worked on BERT at Google — thank you (and your team) for all of the open source releasing and publishing you've done over the years.

I'm using OpenAI embeddings right now in my own project, and I'm asking because I'd like to evaluate other embedding models that I can run in, or adjacent to, my backend server, so I don't have to wait 200ms to embed the user's search query. I'm very impressed by your project, and I thought I might save myself some trouble if you had already done clear evals and decided OpenAI is far-and-away better :)

xrd | 1 year ago

I wish you could tell the stories of how you eval'ed BERT at Google. Sounds meaty.

CuriousJ | 1 year ago

Retrieval is rarely evaluated in isolation. Academics evaluate it indirectly by how much it improves question answering. The really cool thing at Google was that there were so many products and use cases (beyond the academic QA benchmarks) that would indirectly tell you whether retrieval is useful. That's much harder for smaller companies with fewer products and smaller user bases.