snowstormsun|9 months ago

Nice idea, but this approach does not handle out-of-vocabulary (OOV) words well, which is one major motivation for using vector-based search. It might not perform significantly better than lexical matching such as TF-IDF or BM25, while being slower because every query is compared against every document (linear complexity). But cool regardless.
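For comparison, a minimal sketch of Okapi BM25 lexical scoring, assuming documents are already tokenized into lists of lowercase words. The function name `bm25_scores` and the defaults `k1=1.5`, `b=0.75` are my own illustrative choices (they are commonly used values), not anything from OP's project. Note it is also a linear scan over all documents, and it only matches exact tokens ("cats" does not match "cat").

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25.

    `docs` is a non-empty list of token lists; `query` is a token list.
    Like brute-force vector search, this scans all documents (linear time).
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # exact lexical match only
            # IDF variant with +1 inside the log so it never goes negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            norm = freq + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "the cat sat on the mat".split(),
    "dogs and cats are pets".split(),
    "search engines rank documents".split(),
]
print(bm25_scores("cat mat".split(), docs))
```

Only the first document gets a non-zero score here, since the others contain neither "cat" nor "mat" as exact tokens.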

netdevphoenix|9 months ago

It is supposed to be a simple search engine. Keyword: simple.

As long as it does what it is meant to do as a simple search engine, it seems fine.

snowstormsun|9 months ago

Using TF-IDF or BM25 would actually be simpler than a vector search.

I understand this is just for fun; I just wanted to point that out.

cosmicgadget|9 months ago

Or, since OP has both cosine-similarity matching and naive matching, use a heuristic combination of the two, since they address each other's weaknesses.
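One way such a heuristic combination might look, as a sketch only: min-max normalize each matcher's per-document scores to [0, 1], then take a weighted sum. The helper names (`min_max`, `hybrid_scores`, `rank`), the weight `alpha=0.5`, and the example scores are all hypothetical, chosen for illustration rather than taken from OP's code.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(vector_scores, keyword_scores, alpha=0.5):
    """Weighted blend of normalized semantic and lexical scores.

    alpha=1.0 uses only the vector (cosine) scores; alpha=0.0 only the keyword scores.
    """
    v = min_max(vector_scores)
    k = min_max(keyword_scores)
    return [alpha * vs + (1 - alpha) * ks for vs, ks in zip(v, k)]

def rank(scores):
    """Document indices, best score first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical scores for three documents:
cosine = [0.82, 0.79, 0.10]   # the semantic matcher likes docs 0 and 1 about equally
keyword = [0.0, 2.0, 0.0]     # only doc 1 contains the exact query term
print(rank(hybrid_scores(cosine, keyword)))  # [1, 0, 2]
```

The exact keyword hit lifts doc 1 above doc 0, while doc 0 still ranks above doc 2 on semantic similarity alone. Reciprocal rank fusion, which combines ranks instead of raw scores, is a common alternative that avoids picking a normalization.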

janalsncm|9 months ago

Vector-based approaches either don't handle OOV terms at all or handle them poorly, depending on the implementation. If you limit the vocabulary to alphanumeric trigrams, for example, you can technically cover every term, but possibly badly, depending on the training data.
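A rough sketch of why trigrams give full coverage: any alphanumeric word decomposes into character trigrams (here with fastText-style `<` and `>` boundary markers), so no word is ever out of vocabulary. This assumes plain, untrained trigram count vectors, so similarity only reflects shared spelling; in a learned model each trigram would get a trained embedding, which is where the quality starts to depend on the training data. Function names are hypothetical.

```python
import math
import re
from collections import Counter

def char_trigrams(word):
    """Character trigrams of a word, padded with '<' and '>' boundary markers.
    Non-alphanumeric ASCII characters are dropped first."""
    w = "<" + re.sub(r"[^0-9a-z]", "", word.lower()) + ">"
    return [w[i:i + 3] for i in range(len(w) - 2)]

def trigram_vector(word):
    """Sparse count vector keyed by trigram; defined for any word, so nothing is OOV."""
    return Counter(char_trigrams(word))

def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

print(char_trigrams("cat"))  # ['<ca', 'cat', 'at>']
print(cosine(trigram_vector("searching"), trigram_vector("searcher")))
```

"searching" and "searcher" share five of their trigrams (`<se`, `sea`, `ear`, `arc`, `rch`), so they come out similar even if neither word was ever seen before; but "car" and "automobile" would share nothing, which is the "badly" part.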

haasisnoah|9 months ago

How would you handle those in word2vec?

And isn't a big advantage that synonyms are handled correctly? This implementation still has that advantage.
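To make both points concrete, here is a toy sketch of averaged word vectors: words missing from a fixed vocabulary are simply skipped (a common fallback for word2vec-style models), while synonyms still score as close. The 3-d vectors and the names `EMBEDDINGS` and `embed` are hypothetical and hand-made; real word2vec vectors are learned and usually have hundreds of dimensions.

```python
import math

# Toy, hand-made embeddings (hypothetical values for illustration only).
EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.2, 0.95],
}

def embed(text):
    """Average the vectors of known words; OOV words are skipped.
    Returns None when every word is out of vocabulary."""
    vecs = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return None
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(a, b):
    """Cosine similarity of two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

print(cosine(embed("car"), embed("automobile")) > cosine(embed("car"), embed("banana")))  # True
print(embed("zxqv"))  # None
```

The synonym pair matches without sharing a single character, but a query made only of unknown words has no vector at all, so it needs some lexical fallback.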