top | item 38205044

(no title)

Thanks for the insight!

A use case could be that anyone can provide a piece of content with the accompanying embedding and this can be used for semantic search. I.e. the search engine does not have to compute the embedding of everything, just the query.

discuss

seanhunter|2 years ago

The problem with this line of thinking is that the specific embedding chosen has a big impact on task performance[1], and the person producing the piece of content doesn't necessarily know what you're going to be wanting to use that content for a lot of the time. The second thing is that certain embedding formats are very specific to particular model architectures.

From a practical perspective there are some standard embedding formats that people use a lot so if you're performing a normal sort of task there's probably a standard format for that particular task (eg worth checking out spacy's embeddings library which works with a lot of different libraries[2])

[1] For example see this paper for a comparison of performance for different code embeddings https://arxiv.org/pdf/2109.07173.pdf

[2] https://spacy.io/usage/embeddings-transformers/