
mpercy | 2 years ago

How are people actually using vector databases?

The closest thing to a use-case architecture I've seen recently is https://mattboegner.com/knowledge-retrieval-architecture-for... - it basically describes doing knowledge retrieval (keyword parsing) on LLM queries: feed the parsed keywords to a vector DB, do a similarity search to get the top K most similar documents, then feed that list back into the LLM as potentially useful documents it can reference in its response. It's neat, but it seems a bit hacky. Is that really the killer app for these things?
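For concreteness, that pipeline is roughly the following. This is a toy sketch, not the linked article's implementation: `embed()` is a character-bigram hash standing in for a real embedding model, and a plain list stands in for the vector DB's index (real systems use approximate-nearest-neighbor indexes).

```python
from math import sqrt

# Toy stand-in for a real embedding model (normally an API or model call);
# maps text to a fixed-size vector via hashed character-bigram counts.
def embed(text, dim=64):
    vec = [0.0] * dim
    t = text.lower()
    for a, b in zip(t, t[1:]):
        vec[(ord(a) * 31 + ord(b)) % dim] += 1.0
    return vec

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = sqrt(sum(x * x for x in u)) or 1.0
    nv = sqrt(sum(x * x for x in v)) or 1.0
    return dot / (nu * nv)

# "Vector DB": a list of (document, embedding) pairs.
docs = ["how to return an order", "shipping times by region", "warranty policy"]
index = [(d, embed(d)) for d in docs]

def top_k(query, k=2):
    # Similarity search: rank stored documents by cosine similarity to the query.
    qv = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(qv, pair[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

# Retrieved documents get stuffed into the prompt as reference material.
context = top_k("when will my order ship?")
prompt = ("Answer using these documents:\n" + "\n".join(context)
          + "\n\nQuestion: when will my order ship?")
```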


CharlieDigital|2 years ago

We used it in an e-commerce application.

Apparently, one of the hardest things to do is to match a product name + description to a product taxonomy.

There are multiple taxonomies. Here's Google's for example: https://www.google.com/basepages/producttype/taxonomy.en-US....

Amazon has their own. Walmart has their own. Target has their own.

Given a list of tens of thousands of products, how can you automatically match the product to a merchant's taxonomy?

I started with a "clever" SQL query to do this, but it turns out that it's way easier to use vector DBs to do this.

    1. Get the vector embedding for each taxonomy path and store this 
    2. Get the vector embedding for a given product using the name and a short description
    3. Find the closest matching taxonomy path using vector similarity
It's astonishingly good at doing this, and it solved a big problem for us: building a unified taxonomy from the various merchant taxonomies.
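The three steps above can be sketched as follows. This is a toy version with made-up taxonomy paths and a character-bigram `embed()` standing in for a real embedding model; a production version would call an actual model and store the path vectors in a vector DB rather than a list.

```python
from math import sqrt

# Toy embedding (hashed character-bigram counts) in place of a real model.
def embed(text, dim=64):
    vec = [0.0] * dim
    t = text.lower()
    for a, b in zip(t, t[1:]):
        vec[(ord(a) * 31 + ord(b)) % dim] += 1.0
    return vec

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = sqrt(sum(x * x for x in u)) or 1.0
    nv = sqrt(sum(x * x for x in v)) or 1.0
    return dot / (nu * nv)

# Step 1: embed each taxonomy path once and store it.
paths = [
    "Electronics > Audio > Headphones",
    "Apparel & Accessories > Shoes",
    "Home & Garden > Kitchen & Dining > Cookware",
]
stored = [(p, embed(p)) for p in paths]

# Steps 2-3: embed name + short description, take the closest path.
def classify(name, description):
    pv = embed(name + " " + description)
    return max(stored, key=lambda s: cosine(pv, s[1]))[0]

category = classify("Wireless headphones", "over-ear bluetooth headphones")
```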

You can use the same technique to match products with high confidence across merchants by storing the product embedding from step 2. Now you have a way to determine that product A on Target.com is the same as product A' on Walmart.com is the same as product A'' on Amazon.com by comparing vector similarity.
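A minimal sketch of that cross-merchant matching, again with a toy bigram `embed()` in place of a real model; the product names and the 0.8 similarity cutoff are made up for illustration.

```python
from math import sqrt

# Toy bigram embedding standing in for a real model.
def embed(text, dim=64):
    vec = [0.0] * dim
    t = text.lower()
    for a, b in zip(t, t[1:]):
        vec[(ord(a) * 31 + ord(b)) % dim] += 1.0
    return vec

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = sqrt(sum(x * x for x in u)) or 1.0
    nv = sqrt(sum(x * x for x in v)) or 1.0
    return dot / (nu * nv)

# Stored product embeddings for one merchant's catalog.
walmart = {
    "wm-77": "Acme ProBlend 5000 1200W blender",
    "wm-98": "Garden hose, 50 ft",
}
walmart_vecs = {sku: embed(desc) for sku, desc in walmart.items()}

def match(description, vecs, threshold=0.8):
    # Return the closest SKU, or None if nothing clears the similarity cutoff.
    qv = embed(description)
    sku = max(vecs, key=lambda s: cosine(qv, vecs[s]))
    return sku if cosine(qv, vecs[sku]) >= threshold else None
```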

castlecrasher2|2 years ago

Could this strategy work to match products across retailers? If so, any tips on getting started with vector databases? I've heard of them but have yet to try one out.

ultra_nick|2 years ago

Yes. GPU-poor people are just using top-k semantic search to try to fix the issues with low-RAM, low-knowledge LLMs. It's OK for some applications, but other methods need to be investigated.