Extending that idea to the web, or at least to the blogosphere and information / knowledge web-sites, seems useful.
I wonder if there is a web service which has calculated vector embeddings for some of the web, and supports vector search, e.g. given a URL, find URLs with similar embeddings.
Inverting that, websites could annotate their pages with embeddings via JSON-LD, which search engines could then use. Both ideas might be impractical: the cost of an HTTP GET for the vector might be similar to the cost of calculating the embedding, and the embedding would only be comparable with embeddings from the same model (which would be recorded in the JSON-LD), so it would age quickly. It would also be subject to SEO gaming, like meta tags.
A quick search didn't find either of these; the closest was this paper, which used JSON-LD to record a vector reduced to 2 dimensions using t-SNE:
https://hajirajabeen.github.io/publications/Metadata_for_Eme...
Metadata standards for the FAIR sharing of vector embeddings in Biomedicine
Şenay Kafkas et al.
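To make the JSON-LD annotation idea concrete, here is a rough sketch of what a page-level embedding record might look like. The vocabulary URL, the `PageEmbedding` type, the field names, and the model name are all made up for illustration; I'm not aware of any agreed standard for this. The `comparable` helper encodes the point that vectors are only worth comparing when they come from the same model.

```python
import json


def embedding_jsonld(url, model, vector):
    """Build a JSON-LD style record for a page embedding.

    The @context vocabulary and all property names are hypothetical.
    """
    return {
        "@context": {"@vocab": "https://example.org/embedding#"},  # hypothetical vocabulary
        "@id": url,
        "@type": "PageEmbedding",  # hypothetical type
        "model": model,  # recorded so consumers know which vectors are comparable
        "dimensions": len(vector),
        "vector": [round(x, 6) for x in vector],
    }


def comparable(a, b):
    """Two records are only comparable if they share model and dimensionality."""
    return a["model"] == b["model"] and a["dimensions"] == b["dimensions"]


# Toy 3-dimensional vector; a real embedding would have hundreds of dimensions.
doc = embedding_jsonld("https://example.com/post", "example-model-v1", [0.12, -0.5, 0.33])
print(json.dumps(doc, indent=2))
```

A real vector with hundreds or thousands of floats would make this block fairly large, which is part of the HTTP cost concern above.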
tomhazledine|2 years ago
Once you start increasing the granularity of what you're embedding (either by paragraph or sentence) then the old-fashioned search index has a big advantage.
Might be worth it in some scenarios because of the quality of the results. I bet there are places where an embedding search would be more effective by orders of magnitude.
jasonjmcghee|2 years ago
Took a crack at building “vector-store-in-a-cdn” last weekend.
https://github.com/jasonjmcghee/portable-hnsw
flir|2 years ago
Would be easy to mark a vector-providing site as a bad actor, though? Re-run a few of their pages; if you come up with different answers, don't trust them.
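A minimal sketch of that spot check, assuming the verifier can recompute embeddings with the same model the site claims to use. `recompute` is a stand-in for "fetch the page and embed it yourself", and the 0.99 cosine threshold and sample size of 3 are arbitrary choices for illustration, not tuned values.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def spot_check(published, recompute, sample_size=3, threshold=0.99, seed=None):
    """Re-embed a random sample of pages and compare with the published vectors.

    published: dict mapping URL -> vector the site claims for that page
    recompute: callable URL -> vector, computed independently by the verifier
    Returns True only if every sampled page matches closely enough.
    """
    rng = random.Random(seed)
    urls = rng.sample(sorted(published), min(sample_size, len(published)))
    if not urls:
        return False  # nothing to verify, so nothing to trust
    return all(cosine(published[u], recompute(u)) >= threshold for u in urls)


# Toy data: the "true" embeddings the verifier would compute itself.
honest = {"https://example.com/a": [1.0, 0.0], "https://example.com/b": [0.0, 1.0]}
# A site publishing vectors that don't match its actual content.
tampered = {"https://example.com/a": [0.0, 1.0], "https://example.com/b": [1.0, 0.0]}

print(spot_check(honest, lambda u: honest[u], seed=0))
print(spot_check(tampered, lambda u: honest[u], seed=0))
```

This only catches sites whose published vectors differ from what the page content produces; it wouldn't catch a site that games the content itself.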