We received a lot of valuable feedback on our similarity-search engine, which we launched a few days ago.
Based on your feedback, we've made some major changes to improve recall. Specifically, we've begun to include data from our web crawler.
We've also started to prune many of the similarity-search results in order to improve precision.
Finally, we cleaned up the UI to make it clearer what the website does. I think that we still have some work to do in this area, however.
Unfortunately, many of the changes we've made to the algorithm have _dramatically_ slowed down searches. Most now take over a minute to complete!
We're hard at work on fixing that, though. Specifically, we're experimenting with multi-level counting Bloom filters, count-min Flajolet-Martin sketches, and quantile FM digests.
We should have some major performance improvements up over the next few days.
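For anyone unfamiliar with the sketch structures mentioned above, here is a minimal count-min sketch in Python. It is only an illustration of the general technique, not the ESer implementation: the class name, the `width` and `depth` defaults, and the choice of salted BLAKE2b hashing are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter in fixed memory (illustrative only)."""

    def __init__(self, width=1024, depth=4):
        # width and depth are arbitrary example values, not tuned settings.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One differently salted hash per row, so rows collide independently.
        data = item.encode("utf-8")
        for row in range(self.depth):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the smallest counter
        # across rows is the tightest available upper bound.
        return min(self.table[row][col] for row, col in self._cells(item))


sketch = CountMinSketch()
for term in ["bloom", "sketch", "bloom"]:
    sketch.add(term)
print(sketch.estimate("bloom"))  # never lower than the true count of 2
```

The trade-off is that estimates can overcount but never undercount, which is why structures like this can save memory and time without dropping items that really are frequent.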
We're also looking at launching a pre-alpha of a stand-alone software package that implements the ESer algorithm, so that people can run similarity searches on their own private data sets.
[+] [-] eserorg|17 years ago|reply
Please comment with your feedback.
Thanks again!
[+] [-] eserorg|17 years ago|reply
It should probably take about a week to see an improvement -- we don't want to sacrifice recall or precision to improve performance.
So, we're going to try to have our cake and eat it, too.
[+] [-] dkasper|17 years ago|reply
[+] [-] eserorg|17 years ago|reply
So, the further down the list you go, the lower the rank and the less "similar" the links get.
We actually compute a numerical "similarity score" for each link. Perhaps we should show it?
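Showing the score could look something like the sketch below. The function name, the `(url, score)` tuple layout, and the sample URLs and scores are hypothetical and made up for illustration; this is not how the site actually stores or renders results.

```python
def format_results(scored_links, limit=10):
    """Return display lines for links, most similar first."""
    # Sort by score, highest first, then keep only the top `limit` entries.
    ranked = sorted(scored_links, key=lambda pair: pair[1], reverse=True)
    return [
        f"{rank}. {url} (similarity {score:.2f})"
        for rank, (url, score) in enumerate(ranked[:limit], start=1)
    ]


# Made-up example data.
results = [("http://example.com/a", 0.5), ("http://example.com/b", 0.9)]
for line in format_results(results):
    print(line)
```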
Thanks for trying it out!
[+] [-] jakewolf|17 years ago|reply