top | item 32197988

(no title)

lorenzhs | 3 years ago

If you enjoyed XOR filters, you might also like ribbon filters, something that I had the pleasure of working on last year. They share the basic idea of using a system of linear equations, but instead of considering 3 random positions per key, the positions to probe are narrowly concentrated along a ribbon with a typical width of 64. This makes them far more cache-efficient to construct and query.

By purposefully overloading the data structure by a few per cent and bumping those items that cannot be inserted as a result of this overloading to the next layer (making this a recursive data structure), we can achieve almost arbitrarily small space overheads: <1% is no problem for the fast configurations, and <0.1% overhead with around 50% extra runtime cost. This compares to around 10% for XOR filters and ≥ 44% for Bloom filters.

In fact, I'm going to present them at a conference on Monday - the paper is already out: https://drops.dagstuhl.de/opus/volltexte/2022/16538/pdf/LIPI... and the implementation is at https://github.com/lorenzhs/BuRR/. I hope this isn't too much self-promotion for HN, but I'm super hyped about this :)

discuss

loxias|3 years ago

This made my day, thank you so much. :) Your explanation makes sense and it sounds brilliant/clever yet obvious. (like many good ideas are)

I'm reading the paper and looking at your github now, and look forward to "github/academia" stalking you in the future. Skimming your list of repositories and seeing a lot of stuff I understand and could possibly use. ;-)

(I find it to be a useful strategy to, when I find a clever concept in a paper, or in code on github, then look at all the other things done by the same person, or the works of people they co-author with. "collaborative filtering" for ideas.)