AdFlush

276 points| grac3 | 1 year ago |dl.acm.org

108 comments

pradn|1 year ago

What's fascinating here is AdFlush is a classical feature engineering approach: define a bunch of features on the data manually, and then use ML to figure out the most useful / impactful ones. This is not the "throw terabytes of data and see what happens" approach we see with LLMs. It's a bit funny to even point this out because I don't recall the last time a feature-engineered ML project made it to the HN front page.

Features can be brittle, but they are understandable. The paper's appendix [1] lists the 27 features used to decide whether a request/resource is "ad-related". These include interesting ones like JS AST depth, average JS identifier length, the "bracket to dot notation ratio in JS", and a number of graph measures for the graph of scripts.
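To make that kind of feature concrete, here is a minimal sketch of how three of them might be approximated. It is not the paper's extraction code: it works on raw text with regular expressions rather than a real JS parser, it does not skip string literals or comments, bracket/brace nesting stands in for true AST depth, and every function name is made up for illustration.

```python
import re

# Crude lexical proxies only. A real extractor would parse the script into an AST.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
JS_KEYWORDS = {"var", "let", "const", "function", "return",
               "if", "else", "for", "while", "new", "this"}


def avg_identifier_length(source: str) -> float:
    """Mean length of identifier-like tokens, skipping a few common keywords."""
    names = [t for t in IDENTIFIER.findall(source) if t not in JS_KEYWORDS]
    return sum(len(n) for n in names) / len(names) if names else 0.0


def bracket_to_dot_ratio(source: str) -> float:
    """Count of obj["key"]-style accesses divided by obj.key-style accesses."""
    brackets = len(re.findall(r"[\w\])]\s*\[", source))
    dots = len(re.findall(r"[\w\])]\s*\.\s*[A-Za-z_$]", source))
    # With no dot accesses at all, fall back to the raw bracket count.
    return brackets / dots if dots else float(brackets)


def max_nesting_depth(source: str) -> int:
    """Deepest nesting of (), [] and {}: a stand-in for AST depth, not the real thing."""
    depth = deepest = 0
    for ch in source:
        if ch in "([{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch in ")]}":
            depth = max(depth - 1, 0)
    return deepest


def extract_features(source: str) -> dict:
    """Bundle the three proxies into one feature dictionary."""
    return {
        "avg_identifier_length": avg_identifier_length(source),
        "bracket_to_dot_ratio": bracket_to_dot_ratio(source),
        "max_nesting_depth": max_nesting_depth(source),
    }


readable = "function greet(user) { console.log(user.name); }"
obfuscated = 'var a=b["c"]["d"](e["f"]);'
print(extract_features(readable))
print(extract_features(obfuscated))
```

The intuition is that minified or obfuscated scripts tend to have short identifiers and lean on bracket notation, so even shallow statistics like these can carry signal once a classifier weighs them together.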

And contrary to what comments in this thread are saying, they do compare against a blocklist-based adblocker: uBlock Origin. That's in section 5.5. They say they outperform uBlock Origin. But even they say they don't reduce overall page load time because their algorithm is expensive.

[1]: https://dl.acm.org/doi/pdf/10.1145/3589334.3645698

tofof|1 year ago

More specifically, page load time was 2.7 seconds without adblocker, decreased to 2.1 with uBlock Origin, but rose to roughly 2.4 times the baseline (6.6 seconds) with AdFlush, or increased to 3.4 seconds with AdFlush retaining prior predictions.
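The relative changes implied by those figures are easy to check. A small sketch, using only the load times quoted in this comment (the helper name `relative_change` is just for illustration):

```python
def relative_change(baseline: float, measured: float) -> float:
    """Signed change relative to the baseline, as a percentage."""
    return (measured - baseline) / baseline * 100.0


baseline = 2.7  # seconds, no ad blocker (figure quoted above)
load_times = {
    "uBlock Origin": 2.1,
    "AdFlush": 6.6,
    "AdFlush (retaining prior predictions)": 3.4,
}

for name, seconds in load_times.items():
    print(f"{name}: {relative_change(baseline, seconds):+.0f}%")
```

This works out to about -22% for uBlock Origin, +144% for AdFlush, and +26% for AdFlush with retained predictions. Put differently, 6.6 seconds is about 2.4 times the baseline, which is an increase of roughly 144%.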

The superior score was an F1 of 0.86 vs 0.84 for AdFlush vs uBlock Origin, and it's not clear to me that this is a statistically significant difference. They do not claim it is.
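For anyone who hasn't worked with the metric: F1 is the harmonic mean of precision and recall. A minimal sketch of the computation follows; the counts in the example are invented purely to show the arithmetic and are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, NOT from the paper: 86 ads caught,
# 14 non-ads wrongly blocked, 14 ads missed.
print(round(f1_score(tp=86, fp=14, fn=14), 2))
```

Because F1 folds false positives and false negatives into a single number, two blockers with nearly equal scores can still fail in different ways: one might overblock while the other underblocks.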

andirk|1 year ago

I like the strategy of using flags to say "look into this suspicious part of the code" over a hardcoded block list. And also block shitty JS via "JS AST depth, average JS identifier length" etc even if it's not an ad but just bad code.

For Brave browser users, you can see what hardcoded lists you're using at brave://adblock .

As for the whole cat and mouse game, how to detect an "ad" if it's served with the content fully server-side? Now _that_ needs some serious ML to decipher.

nomilk|1 year ago

AdFlush (F1 score: 0.98) seems to do better than some other adblockers: AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score: 0.84), but it raises the question: why not compare to the most popular adblockers, such as uBlock Origin and Adblock Plus?

I think the authors want to compare apples with apples, so they only compare their algorithm to other adblockers that use algorithms, as opposed to those which use crowdsourced lists. The paper somewhat acknowledges this:

> However, manual maintenance of these filter lists requires significant human effort

Seems like one of those tasks where crowdsourcing scales so nicely (only one person has to report an ad for it to go into a crowdsourced list that blocks it for millions of others) that it makes an algorithmic approach unnecessary.
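Part of why list-based blocking scales is that the lookup itself is trivial. Here is a rough sketch, assuming only the simplest Adblock-style rule form (`||domain^`) and ignoring everything else real filter lists support (exceptions, path patterns, cosmetic rules). The domains and function names are hypothetical.

```python
from urllib.parse import urlsplit


def parse_rules(lines):
    """Keep only '||domain^' rules; skip comments and anything more complex."""
    domains = set()
    for line in lines:
        line = line.strip()
        if line.startswith("||") and line.endswith("^"):
            domain = line[2:-1].lower()
            if domain:
                domains.add(domain)
    return domains


def is_blocked(url: str, blocked_domains: set) -> bool:
    """True if the URL's host, or any parent domain of it, is on the list."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Check "cdn.ads.example", then "ads.example", then "example".
    return any(".".join(labels[i:]) in blocked_domains for i in range(len(labels)))


# Hypothetical list contents: one reported rule serves every user of the list.
rules = parse_rules([
    "! comment lines are ignored",
    "||ads.example^",
    "||tracker.example^",
])

print(is_blocked("https://cdn.ads.example/banner.js", rules))
print(is_blocked("https://news.example/story", rules))
```

Each check is a handful of set lookups, so the per-request cost stays tiny. The expensive part is the human effort of keeping the list current, which is the cost the paper points at.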

Cthulhu_|1 year ago

The filter-based adblockers are at risk though, with Google's new extension thingy that (at least a few years ago; I haven't heard about it since) limited the number of rules. If there's a non-rule-based system that is 98% effective, then that would circumvent the arbitrary rule limits that Google set.

RamRodification|1 year ago

> only one person has to report an ad for it to go into a crowdsourced list that blocks it for millions of others

Is it that easy? Sounds very abusable

_al_|1 year ago

There is an entire section in the paper subtitled "Comparison with uBlock Origin".

1oooqooq|1 year ago

practical solutions don't get you published

YmiYugy|1 year ago

Without a comparison to the accuracy of crowdsourced blocklists, it's not that valuable. Maybe there is a group of hopelessly overworked blocklist maintainers/contributors that I'm not aware of. If so, their cries for help don't seem to make the HN front page. From a user perspective, blocking banner ads feels like a basically solved problem. I think the real pain point here is that for large chunks of the web, there is no distinction between ads and content.

JAlexoid|1 year ago

There will never be a solution to native ads. It's part of the content you choose to consume, that someone produced.

The only way to avoid native ads is to stop consuming content that relies on ads.

3abiton|1 year ago

> We tested AdFlush on a dataset of 10,000 real-world websites, achieving an F1 score of 0.98, thereby outperforming AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score: 0.84). Additionally, AdFlush significantly reduces computational overhead, requiring 56% less CPU and 80% less memory than AdGraph. We also assessed AdFlush's robustness against adversarial manipulations, demonstrating superior resilience with F1 scores ranging from 0.89 to 0.98

Neat results. I wonder how it compares to uBO or the different blacklists. I assume it self-updates with newer techniques and can detect certain patterns?

Mkengine|1 year ago

You can find the comparison to uBO under 5.5

dale_glass|1 year ago

The future is here.

If I recall, in Permutation City there's some part where somebody deals with spam with AI. The user tries to use a simulation to listen to potential spam to filter it, while the spam tries to figure out whether a real person is listening to it and only tries to spam when a real person is there.

Or something along those lines, it's been a long time since I read it.

karaterobot|1 year ago

Blocking image ads seems like a relatively well-solved problem. I mean, speaking as someone who can't stand ads, I don't see very many of them anymore when I'm on desktop.

The harder, more pernicious type of ads are the modals that pop up when your cursor moves toward the back button, or when you scroll down a certain distance on the page. "Wait! Before you go, take a moment to give us your email address!"

Those can be blocked, but by the time you've seen them, they've already done all the damage they can do—which is to say, they've annoyed you.

I wish somebody could come up with a way to detect and stop them. I spent an afternoon trying to come up with reusable techniques to detect these popups, but there are just too many possibilities.

Night_Thastus|1 year ago

Always a joy to see efforts in the ongoing battle against advertisements.

There are few things I feel radical about, and Ads are one of them. I believe they are a drain in several ways:

They waste computational resources and electricity on both ends.

They compromise the visual design and layout of webpages.

They distract and take mental energy away from the user.

They make the internet (and anywhere ads exist) more "ugly" and less aesthetically pleasing - which negatively impacts mental health.

They often sell low-quality services/products or outright scams, which harms those least educated and poorest individuals.

Death to advertisement! On billboards! On television! On the internet!

Ads are a parasite on the human mind that need to go away, forever.

Terr_|1 year ago

Ultimately it's about where we draw the line for hacking other people's brains.

It's a spectrum: Some level is an unavoidable part of communication ("I like dogs" forces you to think of dogs) some more is considered normal and traditional manipulation ("My food smells nice, that makes you hungry, wanna buy?") and then it goes on into grey-areas, scams, and eventually to potential extremes like "this image induces nausea" or "this sound knocks you out".

btbuildem|1 year ago

They are a scourge and a tell-tale sign that we've grown far beyond excess and into absurd territory where more effort is spent on bending our minds to consume a thing than it took to make the thing in the first place.

CatWChainsaw|1 year ago

Careful, apparently not wanting your mind polluted with psychological manipulation makes you a filthy communist.

p3rls|1 year ago

Death to small media companies! You should have gotten some VC money if you wanted to make products for people, you poor pieces of shit.

tjpnz|1 year ago

I use a combination of UBO, PiHole and AdGuard on my mobile devices. Can't say I've seen an ad in the last year. Is this trying to solve an existing problem or speculating on where things could go in future?

rgrmrts|1 year ago

I’m curious why you’re using 3 separate methods. Do you miss things with just one? AFAIK all 3 use similar block lists and are configurable.

I’m building a pi-hole type solution for myself and essentially want all the filtering and blocking to happen at my firewall and not on my client (phone, laptop, tablet).

infogulch|1 year ago

So AdFlush beats uBlock Origin with a marginal detection rate advantage of 0.86 vs 0.84, at the cost of significant performance overhead: median 2.7s load time (no ad block); 2.2s (uBO); 6.6s (AdFlush clean); 3.4s (AdFlush cached).

I'd like to see a tandem uBO+AdFlush extension that just enables uBO by default, with an "I still see ads!" button in the extension UI that refreshes with AdFlush enabled and auto-submits any missed ads to a new FlushList filter list.

jarbus|1 year ago

I didn't realize this was an active area of research, love this.

cimnine|1 year ago

So, this begs the question when we'll see ML put in place to avoid AdBlocker detection. Or ads as we know them just disappear from the web and are replaced with other kinds of ML-enabled ads. I imagine deep-fake models used for interchangeable product placement in videos or pictures or so.

h4kor|1 year ago

How does this compare to list based solutions? An overblocking/underblocking comparison would be great

gastonmorixe|1 year ago

Nice! I’d love to know if AI-based blocking of ads, tracking, telemetry, etc. could also be applied to MITM network-layer filtering, not just in the browser.

rpastuszak|1 year ago

Oh boy, that didn't take long. Just last year I made Butter https://butter.sonnet.io as an excuse to talk about this:

> This project is a half-serious, half-assed attempt to demonstrate that in the next few years the process of blocking this type of content could be almost entirely automated. Yes, it would be wasteful from a computational and human potential perspective, and otherwise completely unnecessary, but hey, more money would change hands!

Havoc|1 year ago

How realtime is this? Or at least, is it fast enough not to be noticeable while browsing?

mrbluecoat|1 year ago

I'd be okay with a hybrid approach: lists for real-time blocking and machine learning for passive analysis to augment the lists over time.
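Something like the following is what I have in mind, as a rough sketch only. The class name, the threshold, and the toy scoring function are all made up; a real setup would plug in an actual model and a proper list format.

```python
from collections import deque
from typing import Callable


class HybridBlocker:
    """List lookups on the hot path; a slower classifier reviews misses later."""

    def __init__(self, blocklist: set, classifier: Callable[[str], float],
                 threshold: float = 0.9):
        self.blocklist = set(blocklist)
        self.classifier = classifier  # returns a score in [0, 1]; hypothetical
        self.threshold = threshold
        self.pending = deque()

    def should_block(self, host: str) -> bool:
        """Fast path: a set lookup only. Unknown hosts are queued for review."""
        if host in self.blocklist:
            return True
        self.pending.append(host)
        return False

    def review_pending(self) -> list:
        """Slow path, run off the critical path: score queued hosts, grow the list."""
        added = []
        while self.pending:
            host = self.pending.popleft()
            if host not in self.blocklist and self.classifier(host) >= self.threshold:
                self.blocklist.add(host)
                added.append(host)
        return added


# Toy stand-in for a trained model: NOT a real classifier.
def toy_score(host: str) -> float:
    return 0.95 if host.startswith("ads.") else 0.1


blocker = HybridBlocker({"tracker.example"}, toy_score)
blocker.should_block("ads.example")   # not on the list yet, so allowed and queued
print(blocker.review_pending())       # background pass adds it to the list
print(blocker.should_block("ads.example"))
```

The point of the split is that page loads only ever pay for the set lookup, while the expensive model runs in the background and its findings show up as ordinary list entries.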

flakiness|1 year ago

This can be a Copilot+PC's killer feature :-)

seized|1 year ago

> We tested AdFlush on a dataset of 10,000 real-world websites, achieving an F1 score of 0.98, thereby outperforming AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score: 0.84).

... Has anyone even heard of these ad blockers before?

flakiness|1 year ago

These are all academic research projects.