nxa | 9 months ago

Thank you! I actually had a hard time finding prior work on this, so I appreciate the references.

The dictionary is based on https://wordnet.princeton.edu/, not word2vec. It's just a plain lookup among precomputed embeddings (generated with mxbai-embed-large). And yes, I'm excluding words that are present in the query, since they would otherwise trivially top the results.

It would be interesting to see how other models perform. I tried one (forgot the name) that was focused on coding, and it didn't perform nearly as well (in terms of human joy from the results).

kaycebasques | 9 months ago

(Question for anyone) How could I go about replicating this with Gemini Embedding? Generate and store an embedding for every word in the dictionary?

nxa | 9 months ago

Yes, that's pretty much what it is. Watch out for homographs.
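For what it's worth, here's a minimal sketch of that pipeline in Python, assuming the google-genai SDK and NLTK's WordNet lemma list. The model name, batch size, and the embed/lookup helpers are illustrative assumptions, not the implementation behind the original tool:

    import numpy as np
    from google import genai               # pip install google-genai
    from nltk.corpus import wordnet as wn  # pip install nltk; nltk.download("wordnet")

    client = genai.Client()  # reads the API key from the environment

    # WordNet's lemma list stands in for "every word in the dictionary".
    # (Multiword lemmas use underscores, e.g. "hot_dog".)
    words = sorted(set(wn.all_lemma_names()))

    def embed(texts):
        # Hypothetical helper: batch-embed strings into an (n, d) array.
        resp = client.models.embed_content(
            model="gemini-embedding-001", contents=texts)
        return np.array([e.values for e in resp.embeddings])

    # Precompute once, normalize, and persist (e.g. np.save) for later lookups.
    BATCH = 100  # keep each request under the API's per-call limits
    matrix = np.vstack([embed(words[i:i + BATCH])
                        for i in range(0, len(words), BATCH)])
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)

    def lookup(query, k=10):
        # Nearest neighbors by cosine similarity, excluding words in the query.
        q = embed([query])[0]
        q /= np.linalg.norm(q)
        scores = matrix @ q  # rows are unit vectors, so this is cosine similarity
        exclude = set(query.lower().split())
        ranked = np.argsort(-scores)
        return [words[i] for i in ranked if words[i] not in exclude][:k]

    print(lookup("a feeling of joy after long effort"))

The homograph caveat above is about this scheme giving each surface form a single vector, so senses like "bass" (fish vs. instrument) collapse into one embedding; embedding per-sense glosses instead of bare words would be one way around that.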