(no title)
qw | 10 months ago
The initial prefix check would probably reduce the amount of lookups necessary, as it would only be necessary to do a deeper search if the prefix matches.
qw | 10 months ago
The initial prefix check would probably reduce the amount of lookups necessary, as it would only be necessary to do a deeper search if the prefix matches.
Thorrez|10 months ago
There are 5 billion emails in at least 1 breach and 16 million prefixes. Almost all if not all prefixes have at least 1 email in a breach. So almost all prefixes match. I don't see why it's useful to spend a bunch of effort optimizing the very rare case of a prefix not matching.
Now, if the bloom filter checked emails instead of checking prefixes, that would be useful. However, a bloom filter of 5B elements with a 10% false positive rate would be 2.8 GB, which is prohibitively large.
https://hur.st/bloomfilter/?n=5g&p=10&m=&k=
smallpipe|10 months ago
Thorrez|10 months ago
Also regarding "you can get rid of a significant portion of requests at the edge with a bloom filter", Troy's existing design already gets rid of a significant portion of requests at the edge. That's why he says
>The response from each search was coming back so quickly that the user wasn’t sure if it was legitimately checking subsequent addresses they entered or if there was a glitch.