I've been working on a personal search engine for some time, and one problem I faced was implementing an effective fuzzy search over all the diverse filenames/directories. All the approaches I could find were based on Levenshtein distance, which would have required storing the original strings/text content in the index, would not be practical for comparing larger strings, and would not be generic enough to handle all knowledge domains.
This led me to start looking at locality-sensitive hashing (LSH) approaches to measure the difference between any two strings in constant time. After some work I finally managed to complete an experimental fuzzy search engine (keyword search is just a special case!). In my analysis of 1 million Hacker News stories, it worked much better than Algolia search while running on a single core!
More details are provided in this post: https://eagledot.xyz/malhar.md.html . I tried to submit it here to gather more feedback, but it didn't work, I guess!
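To make the LSH idea concrete: one common scheme (not necessarily what my engine uses internally) is MinHash over character trigrams, where each string gets a fixed-size signature and the fraction of matching signature slots estimates Jaccard similarity, so comparing any two strings costs the same regardless of their length. A minimal sketch:

```python
import hashlib

def trigrams(s):
    """Set of overlapping 3-character shingles of a lowercased string."""
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

def minhash(grams, num_hashes=64):
    """Fixed-size signature: min of a seeded 64-bit hash over all trigrams,
    once per seed. Similar strings share many of these minima."""
    sig = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(16, "little")  # blake2b accepts a 16-byte salt
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(g.encode(), digest_size=8, salt=salt).digest(),
                "big")
            for g in grams
        ))
    return sig

def similarity(a, b):
    """Fraction of matching signature slots ~ Jaccard similarity of trigram sets."""
    sa, sb = minhash(trigrams(a)), minhash(trigrams(b))
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)
```

In a real index you would store only the signatures, never the original strings, which is exactly what makes this attractive for diverse filenames.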
iudqnolq|1 year ago
That often involves recording original trigram positions, but I think that's necessary to weight "I like happy cats" higher than "I like happy dogs but I don't like cats" in a search for "happy cats".
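A toy illustration of why positions matter (my own sketch, not the engine's actual scoring): store each trigram with its offset, then reward documents whose matched trigrams fall in a tight span, since a compact span means the query terms appear together rather than scattered.

```python
from collections import defaultdict

def trigrams(s):
    """Overlapping 3-character shingles with their offsets."""
    s = s.lower()
    return [(s[i:i + 3], i) for i in range(len(s) - 2)]

def build_index(docs):
    """Inverted index: trigram -> list of (doc_id, position)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for gram, pos in trigrams(text):
            index[gram].append((doc_id, pos))
    return index

def score(index, query):
    """Matched-trigram count divided by the span the matches cover:
    many hits in a tight window beat the same hits spread far apart."""
    hits = defaultdict(list)
    for gram, _ in trigrams(query):
        for doc_id, pos in index[gram]:
            hits[doc_id].append(pos)
    return {doc_id: len(ps) / (max(ps) - min(ps) + 1)
            for doc_id, ps in hits.items()}
```

Running it on the two example sentences, "I like happy cats" scores higher for the query "happy cats" because its matched trigrams are contiguous, while the other document's matches are split between "happy" and a distant "cats".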
warangal|1 year ago