
We relaunched similarity-search based on Y Combinator feedback. Thoughts?

4 points | eserorg | 17 years ago | eser.org

5 comments

[+] eserorg|17 years ago|reply
We received a lot of valuable feedback on our similarity-search engine, which we launched a few days ago.

Based on your feedback, we've made some major changes to improve recall. Specifically, we've begun to include data from our web-crawler.

We've also started to prune many of the similarity-search results in order to improve precision.

Finally, we cleaned up the UI to make it clearer what the website does. I think we still have some work to do in this area, however.

Unfortunately, many of the changes we've made to the algorithm have _dramatically_ slowed down searches. Most now take over a minute to complete!

We're hard at work on fixing that, though. Specifically, we're experimenting with multi-level counting Bloom filters, count-min Flajolet-Martin sketches, and quantile FM digests.
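For anyone unfamiliar with these structures, here is a minimal sketch of a plain count-min sketch, the simplest relative of the ones named above. It is only an illustration, not the ESer implementation: the class name, the `width` and `depth` defaults, and the use of salted BLAKE2b as the hash family are all assumptions made for the example.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter (illustrative, not ESer's code).

    Estimates can overcount because of hash collisions, but never undercount.
    """

    def __init__(self, width=1024, depth=4):
        # width and depth are arbitrary example sizes.
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Salting the hash with the row number gives one hash function per row.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            salt=row.to_bytes(8, "little"),
            digest_size=8,
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(item, row)] += count

    def estimate(self, item):
        # The smallest counter is the one least inflated by collisions.
        return min(
            self.rows[row][self._index(item, row)] for row in range(self.depth)
        )


sketch = CountMinSketch()
for term in ["bloom", "bloom", "bloom", "filter"]:
    sketch.add(term)
print(sketch.estimate("bloom"), sketch.estimate("filter"))
```

The appeal is that memory stays fixed at `width * depth` counters no matter how many distinct terms are seen, at the cost of occasional overestimates.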

We should have some major performance improvements up over the next few days.

We're also looking at launching a pre-alpha of a stand-alone software package that implements the ESer algorithm so that people can run similarity-searches on their own private data sets.

Please comment with your feedback.

Thanks again!

[+] eserorg|17 years ago|reply
BTW, I want to apologize for how slow searches are running! We're going to work on this until it's fixed.

It should probably take about a week to see an improvement -- we don't want to sacrifice recall or precision to improve performance.

So, we're going to try to have our cake and eat it, too.

[+] dkasper|17 years ago|reply
It starts out pretty good, but gets kind of bizarre by the last 20 or so results. Pretty cool idea though!
[+] eserorg|17 years ago|reply
Right, it's a contextual ranking algorithm -- it ranks links based on how "similar" they are to what you typed in.

So, the further down the list you go, the lower the rank is -- the less "similar" the links get.

We actually compute a numerical "similarity score" for each link. Perhaps we should show it?
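To make the idea concrete, here is a rough sketch of ranking by a numerical score and showing it next to each result. The scoring function is a stand-in: it uses cosine similarity over word counts, which is an assumption for illustration and not the ESer algorithm. The function names and sample strings are hypothetical as well.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two texts, using raw word counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return (score, doc) pairs, most similar first."""
    scored = [(cosine_similarity(query, d), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical result titles, just to show the output format.
results = rank(
    "bloom filter",
    ["bloom filter tutorial", "cake recipes", "bloom filter"],
)
for score, doc in results:
    print(f"{score:.2f}  {doc}")
```

Printing the score alongside each link would make it obvious where the list drops off into weakly related results.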

Thanks for trying it out!

[+] jakewolf|17 years ago|reply
Reminds me of the AdWords keyword tool. Fun. Good luck.