
uBlock Origin: Address first-party tracker blocking

996 points | hokkos | 6 years ago | github.com

581 comments

[+] newscracker|6 years ago|reply
So Chrome doesn’t support the API that uBlock Origin could use to block such ads whereas Firefox does (with the user’s permission). [1] One more reason why Chrome will soon be seen as mostly useless for users of ad blocking extensions. Hope Firefox keeps alive the things that make uBlock Origin a necessity for a decent browsing experience.

[1]: https://github.com/uBlockOrigin/uBlock-issues/issues/780#iss...

[+] Santosh83|6 years ago|reply
This was always coming. Next we will have unblockable ads delivered through first party and using obfuscated techniques like canvas or webassembly. The endgame will be all/most websites embedding 1st party ads and tracking and the only way to not see them would be to not use the WWW at all. At that point we will have lost.
[+] pdkl95|6 years ago|reply
> unblockable ads delivered through first party and using obfuscated techniques like canvas or webassembly

When I said basically the same thing 4 years ago[1], most people seemed to think it wasn't a serious concern or could be easily bypassed. However, after observing how certain types of businesses use the World Wide Web over the past 3 decades, it's obvious what they want: to send an opaque binary blob to the user that nobody can investigate or modify, that gives them full control over what the user sees and is allowed to do. Just like TV.

The only safe response to this is to stop allowing documents (or docs with 3270-style forms) to embed software in a Turing complete language. Add functionality that is used declaratively, or the answer to "should this be blocked" is undecidable. Give them the ability to run a Turing complete language that renders to a canvas, and adblocking becomes a hard image recognition problem (or requires solving the Halting Problem).

[1] https://news.ycombinator.com/item?id=10211050

[+] dspillett|6 years ago|reply
> The endgame will be all/most websites embedding 1st party ads and tracking

The problem with first party tracking from the PoV of the advertisers is that the feedback they need goes through the site their ad is on so it is possible to be faked: "Yes Mr Advertiser, we really did send x000 ad impressions to {addresses} this day, honest guv'ner."

And from the site's point of view the adverts now become a little more admin to manage beyond just slapping in a reference to 3rd party JS and adding a <div> for that code to target to insert the advert.

[+] aequitas|6 years ago|reply
That will be the time we switch from ad-blockers to content-allowers. It can already be seen with distraction free modes.

Maybe there is a market for a proxy that converts any page you visit to bare and functional HTML with just content and navigation. No ads, distractions, dysfunctional scrolling, etc.
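The "content-allower" proxy described above can be sketched with the standard library's HTML parser. Everything here is illustrative: the set of dropped tags and the plain-text output format are assumptions, not any real product's behavior.

```python
from html.parser import HTMLParser

# Elements to strip entirely (scripts, styles, embedded frames/objects).
DROP = {"script", "style", "iframe", "object", "embed"}

class BareHTML(HTMLParser):
    """Reduce a page to its text and links, dropping everything in DROP."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a dropped element
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in DROP:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag == "a":
            # Keep navigation by noting the link target inline.
            href = dict(attrs).get("href", "")
            self.out.append(f"[link: {href}] ")

    def handle_endtag(self, tag):
        if tag in DROP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.out.append(data.strip() + " ")

def bare(html: str) -> str:
    p = BareHTML()
    p.feed(html)
    return "".join(p.out).strip()
```

A real proxy would also have to rewrite relative URLs and handle script-generated content, which is where this simple approach breaks down.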

[+] voidmain|6 years ago|reply
The next step after that will be machine vision based transformation of the rendered output. I think that as long as end users keep control of the browser and computing device, hope is not lost.

There is also the option of actively attacking the business model of the ad and tracking industry. Ad blockers could simulate lots of "fake" traffic to make ad analytics harder (though this is another arms race, against attempts to filter out the fake traffic).

[+] have_faith|6 years ago|reply
There's no losing if people vote with their attention instead. If the terms of exchanging data are unacceptable, then don't accept any data from that source. The digital equivalent of voting with your wallet.

Kind of like climate change, there are actions we all know would help but individually giving up those conveniences is difficult.

[+] cyborgx7|6 years ago|reply
I wasn't aware putting trackers on subdomains was a working adblocker workaround. Now that I know it, I'm surprised it hasn't been a massive problem since much earlier.
[+] walrus01|6 years ago|reply
Image recognition algorithms client side: accept and load the ad, render the page, then paint a white square over it.

If you're willing to get cpu/gpu intensive on the page rendering a lot can be done.

[+] squiggleblaz|6 years ago|reply
I guess the solution at that point will be to develop a free (foss) internet alternative.

The internet will partition: the corporate internet will be what you say, and the independent internet will be people writing and hosting their own content, like the internet of the olden days. Some of them might interact with corporate services via APIs.

The independent internet will be small, but it will be enough. If you want to buy something anonymously, go to a shop and pay cash with your phone turned off.

[+] CuriouslyC|6 years ago|reply
Nah. That's when we'll start serving scraped, mirrored content over a P2P network.
[+] syshum|6 years ago|reply
Thank the W3C for caving in to the corporations that want to remove the openness from the web standards, which allows for this kind of nonsense.

Google is the biggest offender; after all, their entire business is tracking and ads. But it seems they get a pass as an "offender" from most people.

[+] Quarrelsome|6 years ago|reply
But this is all we ever asked for, isn't it? That the content creator take direct responsibility for their advertising instead of shipping us off someplace else.
[+] zaarn|6 years ago|reply
WebAssembly can still be detected; the file path or subdomain can be blocked as normal (this would already work), and optionally you can run a fingerprinting method, similar to AVs, to detect scripts that resemble known trackers.
[+] JohnBooty|6 years ago|reply
Yeah, literally no way to stop tracking.

Suppose I have an apache module (or the equivalent in some modern http server) that's (a) injecting the necessary code (b) forwarding the traffic information to my nefarious third-party tracking provider. Or, heck, just a third-party solution that consumes my apache logs. It's doing all of this without using the browser itself as a middleman, like the clumsy CNAME masking discussed in the linked Github discussion.

This could never be stopped client-side since one's web browser would have no say.

Only reason this isn't more widespread already is because a lot of web properties don't have full control over their shared hosting environments.

First-party ads are theoretically kinda sorta maybe blockable via various levels of heuristics, which of course adblockers are already doing to various extents today.

But as far as preventing your information from being forwarded behind the scenes, there's no technical solution. Legislation is the only hope to curb it.
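The fully server-side pipeline described above can be sketched in a few lines: a process reads the web server's access log and ships each hit to a third party, with no request ever leaving the browser toward the tracker. The log format here is Apache's "combined" format; the endpoint URL is a made-up placeholder.

```python
import json
import re
from urllib import request

# Apache "combined" log format: ip, identd, user, time, request line,
# status, bytes, referer, user agent.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_hit(line: str):
    """Return a dict of fields for one access-log line, or None."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

def forward(hit: dict, endpoint="https://tracker.example/collect"):
    """Ship one hit to the third party. The browser never sees this
    request, so no extension can block it."""
    body = json.dumps(hit).encode()
    req = request.Request(endpoint, data=body,
                          headers={"Content-Type": "application/json"})
    request.urlopen(req)
```

This is exactly why the comment concludes that the only client-side defenses left are heuristics against the injected markup, not against the data flow itself.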

[+] Blaiz0r|6 years ago|reply
We have lost the WWW since corporate interests took over from personal interest.

Small community sites are still interesting, distributed web technologies may start to rise, or there's always gophernet...

[+] JoeSamoa|6 years ago|reply
It should be clear at this point that the WWW is not what it was. Honestly, the ads could still be 3rd party, but they would just be stored on the same web server as the site. Then the 3rd party would just collect the info from the site instead of directly through the user.

I'm ready to move on. In my eyes we already lost the www.

[+] worldsayshi|6 years ago|reply
We could start blacklisting such websites (with opt-in and toggle-able blacklists) and move to search engines that allow such blacklist opt-in features.

I'm already seeing a big use case for having such an (opt-in) domain-filtered search engine, since there are so many spammy and SEO-hacking websites out there.

[+] derefr|6 years ago|reply
Nah; presuming the page’s real content continues to just be plain HTML, the worst things will get is that you’ll have to browse some sites with JavaScript disabled.

Also, re: “inline” static images, there could totally be client-side ML-model cosmetic filters that recognize and remove known ad images, regardless of how they got to the page. The filter could even just throw a floating rectangle over them, so it wouldn’t even have to understand how they’re made in the DOM. This is the “thermonuclear backup plan” we’ve been expecting to need to pull out for a while now, though advertisers have been lazy about getting sneaky enough to necessitate it.

[+] IvanK_net|6 years ago|reply
The first thing is, how do you distinguish between the ad and a non-ad? If my friend mentions a famous brand in a chat with me, should there be some entity to remove the name of that brand from the chat, so that I don't see it?

There are many videos on Youtube recommending stuff. You never know if the author has been paid. Even if you try to detect popular ad networks and services, the ad techniques will also evolve.

If ads were the only source of my income, and my content was unique, I would show the user a "quiz" every 5 minutes, asking him to answer, what is shown in the ads at the moment :D and deny the access, if the answer is wrong.

[+] riffraff|6 years ago|reply
If canvas becomes a medium for nasty ad delivery, it will go the same way as <object>; I am not too worried. But as always, it will be an arms race.
[+] userbinator|6 years ago|reply
There will hopefully always be those personal/ad-free sites that are plain HTML and don't require any JS, but they might get a lot harder to find through the search engines...
[+] mgreenleaf|6 years ago|reply
Like in security, could work with a "block all" with an explicit whitelist of HTML that can be used. If it needs a canvas to display, then it isn't part of the trusted environment. That would mean you couldn't use sites that required both the ads and the content were in canvas/webassembly, but at that point they aren't really a trusted actor.
[+] kbenson|6 years ago|reply
Well, if it's first party you've at least mitigated some of the privacy concerns, and it's then about the privacy policy of the company in question.

If they want to make ads extremely hard to block but reduce privacy concerns, I'm all for that. It puts the discussion back on a more even level, and if you don't like the ads, don't use the service. The only reason I'm okay with running an ad blocker now is because of the privacy concerns. If those were eliminated (which isn't the case in this theoretical situation, they're just reduced) then I'm not sure how to justify running an ad-blocker. To my eyes, it's basically stealing cable or satellite service. I understand other people don't see it the same way though.

[+] lma21|6 years ago|reply
We can surely find ways to block canvas or WebAssembly from such websites, no?
[+] Cthulhu_|6 years ago|reply
I don't understand why that isn't already a thing, just set up a proxy on your own domain to the ad / tracking server and you'll already avoid existing blocklists.
[+] Normal_gaussian|6 years ago|reply
I'm imagining the horror of having to do it with output analysis. An advanced renderer scans the page to identify and plaster over adverts.
[+] cm2187|6 years ago|reply
Perhaps a solution is for websites to have to request authorization to run javascript/wasm on browsers (enforced by the browser). It will allow legitimate uses of client side code, while eliminating the vast majority of abuses.

I wonder if this is not also how GDPR should have been implemented: forcing browsers to implement a request for storing tracking data, which would avoid dark patterns in consent forms and would keep websites honest. It would also allow the browser to remember the decision. As it is, if you delete cookies when the browser closes, you get asked for the same consent you denied on every visit.

[+] burtonator|6 years ago|reply
> At that point we will have lost.

I'm actually hoping it paves the way for more independent publishers that focus on quality content and charging.

Ads have ALWAYS sucked and been a horrible solution to funding news, journalism, etc.

The problem is that ads basically cause other users to defect to platforms that are free (but sponsored by ads).

[+] rapind|6 years ago|reply
I run a SaaS with tens of thousands of users and am not even tempted to run ads or even add third party analytics. Firefox, Safari, and Chrome all work fine on it.

Just because there are a lot of bad actors and massive amounts of advertising doesn't mean the entire internet needs to go down that path.

[+] egdod|6 years ago|reply
Disable JavaScript.
[+] nkrisc|6 years ago|reply
If these first party ads are just showing me some sponsored message, then I'm totally ok with that. It's the tracking and arbitrary execution of third party code that drives me to block ads. And if a site has a truly awful first party ad experience and I can't block it? I'll probably stop using it, because I don't use sites that frustrate me if I don't have to.
[+] protomyth|6 years ago|reply
From the thread: "This would require uBO to send browsing history information to a remote server, this is anti-uBO."

This is why I love uBlock Origin; that basic principled approach is a great thing.

[+] founderling|6 years ago|reply
1st party ads are not unblockable. They only lack one aspect that helps identify them (the 3rd party hostname). But they still can be dealt with.

One way browsers try to take away that freedom is by limiting what extensions can do. If that continues, at some point we would need a new browser to accomplish it.

My favorite vision of the future would be if Debian would provide a version of Chrome or Firefox that: a) is stripped of all tracking and b) gives extensions full access to everything.

[+] ComodoHacker|6 years ago|reply
As gorhill mentioned in the comments, this isn't really 1st-party tracking, it's "3rd-party disguised as 1st-party". The tracking URL is on a subdomain of the 1st-party domain, but it resolves to a 3rd-party IP and the request goes to 3rd-party infrastructure.
[+] tyingq|6 years ago|reply
A good example of why taking away the blocking feature of chrome.webRequest cripples full featured ad blockers. Gorhill shows both a DOM based rule and a CNAME uncloak that kills this tracker. Neither will work post manifest V3.
[+] taf2|6 years ago|reply
I mean, this is nothing compared to what you can accomplish with a multi-origin CDN, or how about a scriptable edge CDN: Fastly, Cloudflare, or CloudFront with Lambda@Edge. Then you can effectively proxy everything on the same origin with zero IP or domain difference. Only content analysis could be effective, and even then it'll become a real arms race.
[+] 01acheru|6 years ago|reply
I did something like that some years ago to ensure that the tracking we were using on our website worked even if a user had an ad blocker. (It wasn't a tracker as in ad tracker, but a tracker nonetheless.)

It was fairly trivial: instead of asking for the tracking script directly, I put a small service of ours in front of it, so our website asked for foo.mycompany.com/stats.js, which fetched the correct script and changed all URLs inside of it from mycomp.mytrack.com to foo.mycompany.com. Our service then acted as a proxy to mycomp.mytrack.com.

A simple solution that worked out for a while, lost by now, in a server, somewhere in Ireland...
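The core of the proxy described above fits in a few lines. The hostnames mirror the comment's placeholders (mycomp.mytrack.com, foo.mycompany.com); nothing here refers to a real service, and a production version would also rewrite URLs in the tracker's responses, not just the bootstrap script.

```python
from urllib import request

UPSTREAM = "https://mycomp.mytrack.com"      # the real tracking vendor
FIRST_PARTY = "foo.mycompany.com"            # our own proxy hostname

def rewrite(script: str) -> str:
    """Point every embedded vendor URL at our own domain, so all
    subsequent requests look first-party to the browser."""
    return script.replace("mycomp.mytrack.com", FIRST_PARTY)

def serve_stats_js() -> str:
    """Handler for GET /stats.js on foo.mycompany.com: fetch the vendor
    script and return the rewritten version."""
    with request.urlopen(UPSTREAM + "/stats.js") as resp:
        return rewrite(resp.read().decode())
```

Because every request now targets the site's own eTLD+1 and resolves to the site's own server, neither hostname lists nor CNAME uncloaking can distinguish it from genuine first-party traffic.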

[+] danielrpa|6 years ago|reply
I can see a future where safe browsing of the web will only be possible through some sort of whitelisting. Search engines or crowdsourced lists will exist to tell you which places you should or shouldn't go on the web based on their ad or tracking features, just like you can find out the relatively few places that serve vegan or kosher food.

This isn't exactly a great future but we need to accept that nowadays most people don't care about tracking or ads. This might change one day, but it's the status quo.

I think that anti-ads and anti-tracking, if trying to work alongside the full feature set of the Internet, is fighting an honorable but eventually losing war. Even if someone sorts out this particular issue, you are still tracked by 1) your ISP and 2) other subtle client fingerprinting technologies. You can use Tor/VPN/disable JS, but all of these have downsides.

All we can do is to fight with our time and wallets by not visiting places that don't support our values. This is possible and not different from the world we live in already.

[+] cm2187|6 years ago|reply
I don't really have a problem with first party tracking, unless it can correlate my identity across websites. But otherwise I have no problem with website X knowing that I browse website X.

Can first party tracking do this sort of correlation other than through browser fingerprinting?

[+] danShumway|6 years ago|reply
One takeaway from this[0]:

> Can't this be "emulated" in Chromium by resolving the hostnames using DNS over HTTPS in JSON format?

> - This would require uBO to send browsing history information to a remote server, this is anti-uBO.

> - Chromium does not support non-blocking webRequest.onBeforeRequest listeners, so this can't work reliably when the result depends on an asynchronous operation such as a DNS lookup.

> - Chromium's webRequest.onBeforeRequest blocking ability is being deprecated with Manifest V3, so even without the above issues, this would be wasted development efforts.

I suspect (and suspected) that uBlock Origin may eventually be deprecated for Chrome once the V3 changes ship, the same way it was deprecated for Safari, although to the best of my knowledge Gorhill hasn't made any kind of official announcement about that -- so I want to be clear that it's purely conjecture on my end.

This thread reinforces that suspicion. I don't know if Gorhill has explicit plans to do anything with Chrome, but I suspect that in the future it will become more and more annoying to support the browser.

I also previously made a prediction[1] that within 2 years, Firefox would have clearly better tracker-blocking extensions than Chrome (65% likelihood). This is roughly in line with what I would expect to see with that prediction -- a few rare fringe-case scenarios that creators can quickly address in Firefox, but that require more extensive work and conversations around Chrome. If this kind of issue becomes more common in the future, then I'll feel more confident about that prediction.

[0]: https://github.com/uBlockOrigin/uBlock-issues/issues/780#iss...

[1]: https://news.ycombinator.com/item?id=21506330
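The "DoH in JSON format" workaround quoted above can be sketched against Google's public JSON resolver API; the endpoint format is real, but whether any extension should use it is exactly the privacy trade-off gorhill rejects (every lookup ships a hostname from your browsing history to a remote server).

```python
import json
from urllib import request

# Google Public DNS JSON API; Cloudflare offers a compatible endpoint.
DOH = "https://dns.google/resolve?name={name}&type=CNAME"

def cname_targets(doh_response: dict) -> list:
    """Extract CNAME targets from a DoH JSON response.
    DNS record type 5 is CNAME; 'data' holds the target hostname."""
    return [a["data"].rstrip(".")
            for a in doh_response.get("Answer", []) if a.get("type") == 5]

def lookup_cnames(name: str) -> list:
    """Resolve a hostname's CNAME chain via DoH (network access)."""
    with request.urlopen(DOH.format(name=name)) as resp:
        return cname_targets(json.load(resp))
```

Even setting the privacy objection aside, the quote's second point stands: without a blocking `onBeforeRequest` listener that can await this asynchronous lookup, Chromium has no reliable place to act on the answer.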

[+] cobbzilla|6 years ago|reply
Can’t an ad blocker do the CNAME resolution and then not load the URL if it resolves to a 3rd-party hostname?

Using an A/AAAA record would be harder, you’d need to have an IP blacklist, and the trackers would probably be constantly shifting IPs, using a low TTL on the record.

This might require giving your tracker programmatic access to your DNS, not sure if many first parties are ready to go there.
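The decision logic for that CNAME check is simple once the DNS lookup itself is abstracted away: block when the request's hostname, or the CNAME it aliases, falls outside the page's registrable domain. The two-label eTLD+1 below is a deliberate simplification; real blockers consult the Public Suffix List so that co.uk and friends are handled correctly.

```python
def etld_plus_one(host):
    """Naive registrable domain: last two labels. Illustration only."""
    return ".".join(host.rstrip(".").split(".")[-2:])

def is_cloaked_third_party(request_host, page_host, cname_target=None):
    """Decide whether a request should be blocked, given the CNAME the
    caller resolved for request_host (None if it has only A/AAAA records)."""
    if etld_plus_one(request_host) != etld_plus_one(page_host):
        return True                    # already plainly third-party
    if cname_target is None:
        return False                   # genuine first party
    # First-party hostname aliasing a foreign domain: CNAME cloaking.
    return etld_plus_one(cname_target) != etld_plus_one(page_host)
```

As the comment notes, trackers using A/AAAA records instead of CNAMEs slip past this entirely, and an IP blocklist is much harder to maintain against low-TTL rotation.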

[+] SeanMacConMara|6 years ago|reply
I'm happy to just block/not visit those entire domains.

There comes a point when the content is just not worth it.

Doesn't scale, obviously.

We're headed back to the "golden age" of TV advertising, except via HTTP instead of radio waves/cable.

[+] floatingatoll|6 years ago|reply
WebKit Intelligent Tracking Protection 2.0 and later seem to treat all subdomains as equal for tracking purposes.

Does Safari recognize these trackers?

https://webkit.org/blog/8311/intelligent-tracking-prevention...

> Does ITP differentiate between my subdomains?

> No. ITP captures statistics and applies its rules for the effective top-level domain plus one, or eTLD+1. An eTLD is .com or .co.uk so an example of an eTLD+1 would be social.co.uk but not sub.social.co.uk (eTLD+2) or just co.uk (eTLD).

https://webkit.org/blog/9521/intelligent-tracking-prevention...

> ITP 2.3 counteracts this by downgrading document.referrer to the referrer’s eTLD+1 if the referrer has link decoration and the user was navigated from a classified domain. Say the user is navigated from social.example to website.example and the referrer is https://sub.social.example/some/path/?clickID=0123456789. When social.example’s script on website.example reads document.referrer to retrieve and store the click ID, ITP will make sure only https://social.example is returned.
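The referrer downgrade quoted above can be modeled roughly like this; the two-label eTLD+1 is a simplification, since WebKit consults the Public Suffix List for multi-label suffixes like co.uk.

```python
from urllib.parse import urlsplit

def downgrade_referrer(referrer: str) -> str:
    """Strip a decorated referrer URL down to its registrable-domain
    origin, discarding path, query, and subdomains."""
    host = urlsplit(referrer).hostname or ""
    etld1 = ".".join(host.split(".")[-2:])
    return f"https://{etld1}"
```

Run against the blog post's own example, this yields https://social.example from the decorated sub.social.example URL, which is the behavior the quote describes.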

[+] scoutt|6 years ago|reply
I am glad to see that websites are struggling to ruthlessly track users in order to show ads. It means that the efforts so far are working. They have our attention.

Now it's time for standards creators to be responsible and to do something in favor of their users: the user should have the final decision about being tracked/shown ads.

But when the most used web browser is in the hands of the biggest ad vendor, then there is little hope...

I guess we can only expect for web pages to turn into a sort of dynamically-generated JPEG that cannot be parsed/analyzed.

[+] JakaJancar|6 years ago|reply
I don't see the problem.

If the publisher develops an analytics, user profile or whatever solution themselves, then it's 1st party - OK.

But outsource the development or hosting to someone, and suddenly it's a problem? How is this different to using AWS?

Now if the data from their different clients is merged to build a unified view of what you do on the web, that's different, but the place to prevent that is in the browser using cookie partitioning, not by caring about the tracker developer/hosting provider.

[+] iramiller|6 years ago|reply
So what we need is a DNS service that takes in all of the DNS record updates per normal DNS replication and flags these CNAME record entries into an easily consumable blocklist.
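That pipeline could be sketched as a filter over a feed of DNS records, emitting uBO-style network rules for hostnames that alias known tracker domains. The tracker list is illustrative; the `||host^` rule syntax is uBlock Origin's standard network-filter form.

```python
# Hypothetical set of known tracker registrable domains, maintained like
# an ordinary filter list.
TRACKER_DOMAINS = {"tracker.net", "mytrack.com"}

def ends_with_domain(host, domain):
    """True if host is domain or a subdomain of it."""
    return host == domain or host.endswith("." + domain)

def build_blocklist(records):
    """records: iterable of (name, rtype, target) tuples from a DNS feed.
    Emit one uBO rule per CNAME that points into a tracker domain."""
    rules = []
    for name, rtype, target in records:
        if rtype != "CNAME":
            continue
        target = target.rstrip(".")
        if any(ends_with_domain(target, d) for d in TRACKER_DOMAINS):
            rules.append(f"||{name}^")
    return rules
```

The hard part is not this filter but the feed itself: passive DNS vendors sell such data, and trackers can dodge the whole scheme by moving from CNAMEs to A/AAAA records on site-controlled IPs.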
[+] novaRom|6 years ago|reply
If I as a human can identify ads easily, machine learning and computer vision could be used instead of blacklisting domains, as in this case.