top | item 40072733


suprgeek | 1 year ago

Great project, and an excellent initiative for learning about embeddings. Here are two possible avenues to explore further. Your system's backend can be thought of as two parts, one on each side of the vector store: [Icons -> Embedder] -> PGVector -> [Retriever -> Display Result]

1. In the embedder, try out different embedding models and/or vector dimensions to see whether Recall@K and Precision@K improve for your dataset (icons). The choice of model makes a surprising difference to result quality. The MTEB Leaderboard is a good place to find candidate models.

2. In the retriever, you can try a couple of approaches:

a. After retrieving from PGVector, see whether a reranker such as Cohere's gives better results: https://cohere.com/blog/rerank

b. Try a "fusion ranking" similar to the one you already use, but weighted so that 50% goes to a plain old keyword search over the metadata and 50% to the embedding-based search.
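To compare models as suggested in point 1, you need a small labeled set: for each test query, the icons you consider relevant. Below is a minimal sketch of Recall@K and Precision@K under that assumption. The function names and the toy icon IDs are hypothetical, and Precision@K here divides by K even if fewer than K results come back (one common convention, not the only one).

```python
from typing import Dict, List, Sequence, Set, Tuple


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved icons that are relevant (divides by k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for icon in retrieved[:k] if icon in relevant)
    return hits / k


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant icons that appear in the top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0  # undefined without relevant items; treated as 0 here
    hits = sum(1 for icon in retrieved[:k] if icon in relevant)
    return hits / len(relevant)


def evaluate(results: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int) -> Tuple[float, float]:
    """Mean Precision@K and Recall@K over every labeled query in qrels."""
    queries = list(qrels)
    p = sum(precision_at_k(results.get(q, []), qrels[q], k) for q in queries) / len(queries)
    r = sum(recall_at_k(results.get(q, []), qrels[q], k) for q in queries) / len(queries)
    return p, r


if __name__ == "__main__":
    # Hypothetical results for one query and its hand-labeled relevant icons.
    retrieved = ["cat", "dog", "house", "car"]
    relevant = {"cat", "car", "tree"}
    print(precision_at_k(retrieved, relevant, 2))  # 1 hit out of 2
    print(recall_at_k(retrieved, relevant, 4))     # 2 of 3 relevant found
```

Running the same `evaluate` call against results produced by each candidate model (or vector dimension) gives a like-for-like comparison on your own data rather than on a generic benchmark.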
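One way the 50/50 fusion in point b could look, as a sketch only: keyword counts and cosine similarities live on different scales, so each score set is min-max normalized to [0, 1] before the weighted sum. The toy keyword scorer (query-term overlap with the metadata), the 2-d vectors, and all function names are hypothetical stand-ins; in a real setup the keyword side might come from something like Postgres full-text search (`ts_rank`) and the embedding side from PGVector.

```python
import math
from typing import Dict, List, Sequence


def keyword_scores(query: str, metadata: Dict[str, str]) -> Dict[str, float]:
    """Toy keyword score: number of distinct query terms found in each icon's metadata."""
    terms = set(query.lower().split())
    return {icon: float(len(terms & set(text.lower().split()))) for icon, text in metadata.items()}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal there is no signal, so all get 0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fused_ranking(kw: Dict[str, float], emb: Dict[str, float],
                  keyword_weight: float = 0.5, top_k: int = 10) -> List[str]:
    """Weighted sum of normalized keyword and embedding scores (default 50/50)."""
    kw_n, emb_n = min_max(kw), min_max(emb)
    ids = set(kw_n) | set(emb_n)
    fused = {i: keyword_weight * kw_n.get(i, 0.0) + (1 - keyword_weight) * emb_n.get(i, 0.0)
             for i in ids}
    # Sort by fused score descending, breaking ties by icon id for determinism.
    return sorted(fused, key=lambda i: (-fused[i], i))[:top_k]


if __name__ == "__main__":
    # Hypothetical icon metadata and precomputed embeddings.
    metadata = {
        "arrow-left": "arrow left back navigation",
        "home": "home house main page",
        "trash": "trash delete bin remove",
    }
    icon_vecs = {"arrow-left": [0.0, 1.0], "home": [0.6, 0.8], "trash": [0.9, 0.1]}
    query_vec = [1.0, 0.0]

    kw = keyword_scores("delete file", metadata)
    emb = {icon: cosine(query_vec, vec) for icon, vec in icon_vecs.items()}
    print(fused_ranking(kw, emb))
```

Exposing `keyword_weight` makes it easy to sweep values other than 0.5 and check them with the Recall@K/Precision@K numbers. Reciprocal rank fusion, which combines ranks instead of raw scores, is a common alternative that avoids the normalization step.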

Finally, something more interesting to noodle on: what if the embeddings were computed from the icon images themselves, and the model knew how to search for textual descriptions in that same latent space?
