I found that stemming the text before generating vectors helps increase recall, and the vectors still capture context. However, it does hurt precision, because stemming throws away some information. The more recent vector training algorithms are better able to capture semantic, syntactic, and contextual similarity without a lot of preprocessing. So I have found that vectors can replace all the nonsense that used to be needed to increase recall: stemming, manual synonym lists, etc.

However, vector similarity search only helps with the matching side of text search, not with ranking. TF-IDF, BM25, PageRank, learning-to-rank ML, etc. are still needed to rank documents. Whenever I find a new vector search engine, I always look to see what ranking features it has beyond vector similarity.
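To make the matching vs. ranking split concrete, here is a minimal sketch of the idea, not any particular engine's implementation: vector similarity picks the candidates, then a BM25 score is blended in to order them. The function names (`bm25_scores`, `cosine`, `hybrid_rank`), the `alpha` blend weight, and the default `k1`/`b` values are all assumptions for illustration, and the embeddings are assumed to come from whatever model you already use.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hybrid_rank(query_terms, query_vec, docs_tokens, doc_vecs, top_k=10, alpha=0.5):
    """Return document indices: vector search for recall, BM25 blend for ranking."""
    # Stage 1 (matching): keep the top_k documents by vector similarity.
    sims = [cosine(query_vec, v) for v in doc_vecs]
    candidates = sorted(range(len(doc_vecs)), key=lambda i: sims[i], reverse=True)[:top_k]
    # Stage 2 (ranking): blend similarity with BM25, scaled to the best candidate.
    bm25 = bm25_scores(query_terms, docs_tokens)
    max_bm25 = max(bm25[i] for i in candidates) or 1.0
    blended = {i: alpha * sims[i] + (1 - alpha) * bm25[i] / max_bm25 for i in candidates}
    return sorted(candidates, key=lambda i: blended[i], reverse=True)
```

In a real system the second stage is where PageRank-style priors or a learned ranking model would plug in; the linear blend here is just the simplest thing that shows the two stages are separate concerns.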
bryanrasmussen|2 years ago
In my experience this is more useful in complicated document searches.