gorhill | 1 year ago

No competent content blocker tests "ten thousand regexp matches" against each request URL; that is not how it works.

To simplify, and speaking from uBO's perspective, consider that nine distinct tokens can be extracted from the URL in the address bar for the current webpage:

  https
  news
  ycombinator
  com
  reply
  id
  41758007
  goto
  item%3Fid%3D41757178%2341758007

To match such a URL against the tens of thousands of filters, it is only necessary to look up the filters associated with these nine tokens. For most of these tokens there won't be any filters to test, so in the end only a few filters, or none at all, are tested for any given URL. The majority of these filters are not regex-based; they are just plain string matching.
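To make the token lookup concrete, here is a minimal sketch in Python. It is not uBO's actual code: the `TokenIndex` class, the three sample filters, and the full URL are assumptions made up for illustration. The tokenizer assumes a token is a run of letters, digits, or `%`, which happens to reproduce the nine tokens listed above for a URL of that shape.

```python
import re
from collections import defaultdict

# Assumption: a token is a maximal run of letters, digits, or '%'.
TOKEN_RE = re.compile(r"[0-9A-Za-z%]+")

def tokenize(url):
    """Split a URL into lookup tokens."""
    return TOKEN_RE.findall(url)

class TokenIndex:
    """Hypothetical filter store: each filter is filed under one token."""

    def __init__(self):
        self.buckets = defaultdict(list)
        self.tests = 0  # number of filters actually tested so far

    def add(self, token, pattern):
        # File a plain-string filter under a token that appears in it.
        self.buckets[token].append(pattern)

    def match(self, url):
        # Only filters filed under one of the URL's tokens are tested.
        for token in tokenize(url):
            for pattern in self.buckets.get(token, ()):
                self.tests += 1
                if pattern in url:  # plain substring test, no regex
                    return pattern
        return None

index = TokenIndex()
index.add("doubleclick", "doubleclick.net")
index.add("ads", "/ads/")
index.add("goto", "goto=http")

# Assumed reconstruction of a URL that yields the nine tokens above.
url = ("https://news.ycombinator.com/reply?id=41758007"
       "&goto=item%3Fid%3D41757178%2341758007")
print(tokenize(url))
print(index.match(url), index.tests)
```

With these sample filters, only the one filed under `goto` is tested for that URL; the other two are never looked at because their tokens do not occur in it.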

This is the simplified overall explanation of how it really works. In reality it's a bit more complex, because there are a lot of other optimizations on top of this.

There is a built-in benchmark tool in uBO, accessible through the dashboard, _Support_ pane, _More_ button, _SNFE: Benchmark_ button[1].

When running the benchmark against a set of 230,364 URLs, I get an average of 11-12 µs per request to perform a match test against the default filter lists in uBO.
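For a sense of scale, the figures above can be turned into a total run time and a per-second rate. This is just arithmetic on the two numbers quoted, not output from the benchmark tool; the function names are made up for illustration.

```python
URL_COUNT = 230_364  # size of the benchmark URL set quoted above

def total_seconds(url_count, per_request_us):
    """Total matching time in seconds for the whole URL set."""
    return url_count * per_request_us / 1_000_000

def requests_per_second(per_request_us):
    """How many match tests fit into one second at this average cost."""
    return 1_000_000 / per_request_us

for us in (11, 12):
    print(f"{us} µs/request: {total_seconds(URL_COUNT, us):.2f} s total, "
          f"about {requests_per_second(us):,.0f} requests/s")
```

That works out to roughly 2.5 to 2.8 seconds for the whole set, or somewhere above 80,000 match tests per second.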

* * *

[1] https://github.com/gorhill/uBlock/wiki/Advanced-settings#ben...
