ClearURLs – automatically remove tracking elements from URLs

803 points| stanislavb | 4 years ago |github.com

307 comments

[+] jacobajit|4 years ago|reply
A particularly bad instance of link tracking I've found is in TikTok's link sharing feature.

If you share a link from the TikTok app, it gives you a vm.tiktok.com/[xyz] link to send/post elsewhere. It gives you no indication that this isn't a generic link to the post, nor does it give you an option to expose the generic link to the post.

Instead, when you share that link and someone clicks on it and does not have the app, it opens with a header saying "[First Last] is on TikTok." On the other hand, once you do click on that link (if and only if you don't have the app installed), you get redirected to the static link to the video and finally obtain it.

This is an anti-pattern that enables further tracking and potentially exposes user data without the user knowing when links are shared publicly. And there's no indication to the user that this is happening, since the link is structured as if it does not contain any tracking. I.e., a tool like this wouldn't be able to "strip out" the tracking, since it isn't tacked on in any way but embedded in the generated link itself.

[+] gonehome|4 years ago|reply
That’s pretty bad. I think TikTok’s risks are higher than people think. It’s better to avoid it.

https://stratechery.com/2020/the-tiktok-war/

Any company running out of mainland China is going to have serious privacy problems due to CCP influence and their need to comply with both local laws and the government’s interest in influencing public sentiment.

[+] userbinator|4 years ago|reply
With websites, at least you can just copy the URL from the address bar and clean it. Of course, people are being slowly dumbed down by browsers' attempts (mostly Chrome's, but Firefox seems to follow its stupid trends not long afterwards) at removing or hiding the URL, which is no surprise when you realise that herding the userbase to use dedicated "share" buttons (complete with tracking) is one of the reasons they're doing that.
[+] imiric|4 years ago|reply
Stack Overflow does something similar, and adds a user tracking ID to any shared link, though apparently it's possible to remove it without breaking the link[1].

I only noticed when I received a badge for how many times it was clicked, and even though it's not nefarious I'd still prefer it to be opt-in rather than done by default.

[1]: https://meta.stackoverflow.com/q/277769
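
For what it's worth, the share links I've seen have the shape /q/<post id>/<user id> (or /a/... for answers), so dropping the trailing numeric segment should be enough. A minimal sketch, assuming that URL shape; `stripShareUserId` is a made-up name, not something Stack Overflow provides:

```javascript
// Hypothetical helper: drops the trailing user ID from a Stack Exchange
// style share link. Assumes the path looks like /q/<postId>/<userId>
// or /a/<postId>/<userId>; anything else is returned unchanged.
function stripShareUserId(link) {
  const url = new URL(link);
  const match = url.pathname.match(/^\/(q|a)\/(\d+)\/\d+\/?$/);
  if (match) {
    url.pathname = `/${match[1]}/${match[2]}`;
  }
  return url.toString();
}
```

Links that don't match the pattern (for example the long /questions/... form) pass through untouched.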

[+] joshstrange|4 years ago|reply
Yes, I regularly warn people on Reddit that their full name is being leaked in the TikTok link they shared. I have an iOS shortcut that expands the URL and chops off the gross tracking stuff so I can share links in private/public without exposing my TikTok "name" (I don't link any accounts and my name is made up).
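
The expand-then-chop idea isn't specific to iOS Shortcuts. A rough sketch of the same two steps, assuming the short link answers a HEAD request with ordinary HTTP redirects and that everything in the final query string is disposable; `expandAndStrip` and the `fetchImpl` parameter are names invented for this example:

```javascript
// Sketch only: follows redirects for a short link, then drops the query
// string and fragment from wherever it lands. `fetchImpl` is injectable so
// the logic can be tried without network access; by default it uses the
// global fetch (browsers, Node 18+).
async function expandAndStrip(shortUrl, fetchImpl = fetch) {
  // redirect: "follow" makes fetch chase the redirect chain for us;
  // response.url is then the final address.
  const response = await fetchImpl(shortUrl, { method: "HEAD", redirect: "follow" });
  const finalUrl = new URL(response.url);
  finalUrl.search = ""; // assumes everything in the query is tracking
  finalUrl.hash = "";
  return finalUrl.toString();
}
```

In a browser page this will usually be blocked by CORS for third-party short links, so it is more of a script or server-side thing. Some servers also redirect differently depending on the user agent, so the result may not match what the app would open.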
[+] Breza|4 years ago|reply
VRBO is another egregious example. My friend asked what I thought about a house she was thinking of renting for a trip. VRBO wouldn't let me view the link on my phone unless I downloaded their app. I had her copy and paste the house's description which I then Googled to get to the right listing.
[+] milofeynman|4 years ago|reply
When twitter's snowflake was lengthened recently I was worried they might be doing this too. I'm afraid of the big ones moving to this. Spotify, instagram, twitter, etc
[+] space_fountain|4 years ago|reply
A fun/weird result of this is that the interface behind the link is in the language of whoever generated the link, not your browser’s language.
[+] 1vuio0pswjnm7|4 years ago|reply
Assuming any certificate pinning can be defeated, it is easy to manipulate URLs with a loopback-bound forward proxy. It would be great if someone provided an example of one of these TikTok URLs so we could investigate.
[+] jtbayly|4 years ago|reply
But this can be solved, too, can’t it? It’s effectively a Bitly link. Just need to auto-expand to the final destination, right?
[+] 3np|4 years ago|reply
Discord does something very similar
[+] vagrantJin|4 years ago|reply
This is needlessly alarmist.

A short video platform can hardly be expected to be a paragon of security and privacy. It has no utility whatsoever. I don't see where the concern comes from. A video of someone drinking coffee does not particularly invoke a point of concern.

What may be the real concern is China and the fact that the app is tied to it. That's more of a race/geopolitics/war-mongering issue than a privacy concern.

[+] ronjouch|4 years ago|reply
I'd love if Firefox's built-in Tracking Protection did without an addon the job ClearURLs does, so two months ago I created

Bug 1697982: "Firefox Tracking Protection should protect against URL/queryparam-based tracking (like ClearURLs/NeatURL addons do)" , https://bugzilla.mozilla.org/show_bug.cgi?id=1697982

Please vote for the bug if you'd like it too.

Also, I see a few interesting comments in this HN thread; this evening when the dust settles, I'll aggregate & bring them to the bug for consideration if/when fixing this bug is considered.

[+] eythian|4 years ago|reply
I don't really know how I feel about having the browser mess with URLs without the user engaging it deliberately. It feels to me like something that should be approached with caution. On the other hand, it does make sense. It's a tricky one.
[+] VortexDream|4 years ago|reply
I think the problem with this is that ClearURLs can break legitimate uses for URL params. I need to disable it when I do things like online payment. That's not intuitive for users and means an integrated solution needs to take laypersons into account who wouldn't know how to solve the problem (or even what the actual problem is). Is that realistically solvable?
[+] surround|4 years ago|reply
Mozilla itself is guilty of link tracking. Any external link on addons.mozilla.org looks like this:

  outgoing.prod.mozaws.net/v1/25c02fd4e609951729e0ec0b41fe5391d912511b45d2a02aeaa839872c8d9def/https%3A//gitlab.com/KevinRoebert/ClearUrls
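
Going by that one example, the real destination is just the percent-encoded tail after the long hex segment, so it can be recovered without ever hitting the redirector. A small sketch, assuming the /v1/<hex digest>/<encoded target> layout holds in general (I have only the link above to go on); `unwrapOutgoing` is a made-up name:

```javascript
// Sketch: recover the destination from an outgoing.prod.mozaws.net link.
// Assumes the layout seen above: /v1/<hex digest>/<percent-encoded target>.
function unwrapOutgoing(link) {
  const match = link.match(
    /^(?:https?:\/\/)?outgoing\.prod\.mozaws\.net\/v1\/[0-9a-f]+\/(.+)$/i
  );
  if (!match) return link; // not a wrapped link, leave it alone
  try {
    return decodeURIComponent(match[1]);
  } catch {
    return link; // malformed percent-encoding, give up
  }
}
```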
[+] wackget|4 years ago|reply
You should also suggest they remove their own garbage redirect tracking from the Firefox Addons site.

Any URLs in the addon description section are all tracked/redirected via `https://outgoing.prod.mozaws.net`

[+] daveoc64|4 years ago|reply
I am not a fan of making such functionality part of the browser.

I use the HTTPS only mode in Firefox - it breaks some sites, and telling Firefox to disable the mode for a specific site doesn't always work.

I feel like a plugin (HTTPS Everywhere) can deal with this a lot better than something that's integrated and reduced to a single checkbox in the settings.

[+] guilhas|4 years ago|reply
Firefox should worry about implementing standards as fast as possible and improve the browser speed

And stop trying to "re-implement" features for which there are already far more capable user extensions.

[+] nagarjun|4 years ago|reply
Instead of it being a default feature, I wonder if this makes more sense as a default in Incognito/Private browsing mode?
[+] codingdave|4 years ago|reply
I would love to see an 'educational' mode on this - rather than just removing the tracking elements, put some info on-screen that shows what was removed and why, so people can use this as a tool to learn more about what types of tracking exist online and how common it is. Hopefully that would lead to a more knowledgeable end user community online and we can have more nuanced discussions in the future about where tracking is benign, and where it is not.
[+] uo21tp5hoyg|4 years ago|reply
Not exactly what you requested but there's the ability to log all requests that are processed: if you click the extension icon and then under "Configs" enable logging, then at the bottom of the ui there's a button for checking the logs. This will show you the before and after processing urls, the rules that were triggered, and when.
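
If you only have the before and after URLs, a "what got removed" report is easy to derive yourself. A minimal sketch, not how the extension's own log works; `removedParams` is a name made up for this example:

```javascript
// Sketch: list the query parameters present in `before` but gone in `after`,
// i.e. a minimal "what did the cleaner remove?" report.
function removedParams(before, after) {
  const kept = new URL(after).searchParams;
  const removed = [];
  for (const [name, value] of new URL(before).searchParams) {
    if (!kept.has(name)) removed.push({ name, value });
  }
  return removed;
}
```

This only says which parameters vanished; explaining why each one counts as tracking would still need the rule that matched.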
[+] ycombinete|4 years ago|reply
I agree. I uninstalled this add-on precisely because I couldn’t quite figure out what it was doing or where it was doing it. Unlike an ad blocker, there’s very little tangible difference when it’s on or off.
[+] anticristi|4 years ago|reply
While I greatly value my privacy to the point where I donate to noyb.eu, removing utm campaign tags feels too much. Those do not commonly contain private information. I believe that marketers should feel free to use those to measure the effectiveness of their campaigns, instead of relying on more privacy-intrusive and opaque methods (e.g. cookies, fingerprinting, IP address collecting, etc.).
[+] matheusmoreira|4 years ago|reply
> I believe that marketers should feel free to use those to measure the effectiveness of their campaigns

I don't. I believe marketers should have exactly zero ways to measure the effectiveness of their mind hacking efforts. Any data they try and collect should have negative value by virtue of being completely randomized by the browser.

Actually I believe marketers shouldn't even exist. Nothing they say is trustworthy by virtue of conflict of interest. The internet would be much better off without these constant attempts to subvert it for their purposes.

[+] maple3142|4 years ago|reply
I usually remove those parameters manually when I want to share a URL with my friends, so it is quite useful for me.
[+] cyborgx7|4 years ago|reply
Every time any kind of measure to improve people's browsing experience is posted here someone comes along and explains how this one is too much. But they are always wrong. There is no "going too far" in optimizing the browser for the people who are using it.
[+] rplnt|4 years ago|reply
That's not the only issue. The IDs are then fed back into Facebook.

Facebook can use it to link contacts together. I get a share link, it gives it an ID, I send it to someone, they open it and now they have linked my account with their account. Same works if I click on a page and get the ID, share just that page, and someone clicks it (and there's some fb element on the page).

Now if several users a day share a link here on HN, facebook will know about us as belonging to a certain group.

[+] selfhoster11|4 years ago|reply
UTM tags are unsightly. I always strip everything but the core part of a URL before sharing.
[+] bottled_poe|4 years ago|reply
Nah, the industry started a war on consumers. That’s what they are getting.
[+] throwaway81523|4 years ago|reply
I always remove them. They're like referer headers. Where the visitor came from is just like any other info that might be useful to the site operator, but is really not any of their business unless the visitor voluntarily discloses it.
[+] DangerousPie|4 years ago|reply
I don't mind people stripping these tags manually for link sharing, but stripping them across the board would be a major issue for websites that finance themselves through affiliate links. Suddenly your referrals are no longer tracked and your main source of revenue dries up.
[+] marban|4 years ago|reply
Related, if you're looking to clean urls on the backend, here's my current pattern used on https://upstract.com and some other news aggregators I've built:

startswith: 'utm_', 'ga_', 'hmb_', 'ic_', 'fb_', 'pd_rd', 'ref_', 'share_', 'client_', 'service_'

or has: '$/ref@amazon.', '.tsrc', 'ICID', '_xtd', '_encoding@amazon.', '_hsenc', '_openstat', 'ab', 'action_object_map', 'action_ref_map', 'action_type_map', 'amp', 'arc404', 'affil', 'affiliate', 'app_id', 'awc', 'bfsplash', 'bftwuk', 'campaign', 'camp', 'cip', 'cmp', 'CMP', 'cmpid', 'curator', '[email protected]', 'efg', 'ei@google.', 'fbclid', 'fbplay', '[email protected]', 'feedName', 'feedType', '[email protected]', 'forYou', 'fsrc', 'ftcamp', 'ga_campaign', 'ga_content', 'ga_medium', 'ga_place', 'ga_source', 'ga_term', 'gi', '[email protected]', 'gs_l', 'gws_rd@google.', 'igshid', 'instanceId', 'instanceid', '[email protected]', 'maca', 'mbid', 'mkt_tok', 'mod', 'ncid', 'ocid', 'offer', 'origin', 'partner','[email protected]', 'print', 'printable', 'psc@amazon.', '[email protected]', 'rebelltitem', 'ref', 'referer', 'referrer', 'rss', 'ru', '[email protected]', 'scrolla', 'sei@google.', 'sh', 'share', '[email protected]', 'source', '[email protected]', 'sref', 'srnd', 'supported_service_name', 'tag', 'taid', 'time_continue', 'tsrc', 'twsrc', 'twcamp', 'twclid', 'tweetembed', 'twterm', 'twgr', 'utm', 'ved@google.', 'via', 'xid', 'yclid', 'yptr'

Edit: Will turn this into a Gist at some point.
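
Until then, roughly how the two lists can be applied. This is a sketch using a small subset of the entries above, not the actual backend code. Reading "name@host." as "only on hosts containing 'host.'" is an assumption, and `isTracking` / `cleanUrl` are names made up for this example:

```javascript
// Sketch of the prefix / exact-name rules above, using a small subset of
// the list. The reading of "name@host." as "only on hosts containing
// 'host.'" is an assumption.
const PREFIXES = ["utm_", "ga_", "fb_", "ref_", "share_"];
const EXACT = ["fbclid", "igshid", "mkt_tok", "yclid", "twclid", "psc@amazon.", "ved@google."];

function isTracking(name, hostname) {
  if (PREFIXES.some((prefix) => name.startsWith(prefix))) return true;
  return EXACT.some((rule) => {
    const [param, host] = rule.split("@");
    if (param !== name) return false;
    return host === undefined || hostname.includes(host);
  });
}

function cleanUrl(link) {
  const url = new URL(link);
  // Copy the keys first: deleting while iterating searchParams skips entries.
  for (const name of [...url.searchParams.keys()]) {
    if (isTracking(name, url.hostname)) url.searchParams.delete(name);
  }
  return url.toString();
}
```

Short generic names from the full list ('ab', 'source', 'tag', 'origin') are the ones most likely to collide with parameters a site actually needs, so they probably deserve per-host scoping too.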

[+] asymmetric|4 years ago|reply
Note that this addon requires the "Access your data for all websites" permission[0], which means:

> The extension can read the content of any web page you visit as well as data you enter into those web pages, such as usernames and passwords.

I'm sure the devs are super trustworthy, but there have been cases of legitimate extensions falling in the wrong hands, and this, coupled with automatic extension updates, could be a big security hole in your setup.

[0]: https://support.mozilla.org/en-US/kb/permission-request-mess...

PS: Ironically, the link above has utm elements.

[+] cies|4 years ago|reply
I use this add-on together with Firefox, Bitwarden, uBlock Origin, HTTPS Everywhere and EFF's Privacy Badger to improve my privacy online. Once in a blue moon (a few times per year) I have to switch them off to get a site to work.

Besides that I only have the Tree Style Tab add-on installed, which is much recommended.

[+] jordoh|4 years ago|reply
It should be noted that this extension strips ETag headers from all responses by default, which can break sites in surprising ways. As a developer of a web application that relies on ETag headers for vital functionality, I see not-infrequent support inquiries from ClearURLs users who don't understand the technical ramifications of this feature - nor do they understand why so many of the websites they use are so broken.
[+] JimDabell|4 years ago|reply
There’s lots of rules and patterns in this implementation, but it’s worth bearing in mind that you can normally get a clean URL by looking at the <link rel=canonical> element.

Sites put this in because they want search engines to index a single clean URL rather than many tracking URLs, so it’s pretty reliable.
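
Reading it takes only a few lines. A sketch, assuming the page actually declares a canonical link; `canonicalUrl` is a name made up here, and `doc` is anything with a `querySelector` method, such as the browser's `document`:

```javascript
// Sketch: prefer the page's own canonical URL when it declares one.
// `doc` is anything with querySelector (e.g. the browser's `document`).
function canonicalUrl(doc, fallback) {
  const link = doc.querySelector('link[rel="canonical"]');
  const href = link && link.getAttribute("href");
  if (!href) return fallback;
  // The href may be relative, so resolve it against the current address.
  return new URL(href, fallback).toString();
}
```

In a page or bookmarklet that would be called as `canonicalUrl(document, location.href)`.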

[+] account42|4 years ago|reply
That works if you want to get a clean URL to share with others. But if you have instead received a link, then not using built-in patterns means you would first need to retrieve the page with the tracking parameters to get to the canonical URL.
[+] BerislavLopac|4 years ago|reply
I wrote a little bookmarklet that serves me pretty well for similar purposes:

    javascript:window.location=window.location.href.replace(/\?([^#]*)/,function(_,s){s=s.split('&').filter(function(v){return(!/^utm_/.test(v))}).join('&');return(s?'?'+s:'')});
It's quite limited, as it only strips Google's utm_ parameters, but it works well enough for many cases.
[+] tomudding|4 years ago|reply
Lovely extension, some discussions about its functionality can be found in this thread [0] after the removal of the extension from Chrome's Web Store.

One thing I noticed is that it can be too aggressive from time to time. I encountered this "issue" when creating a Bitwarden account: I was unable to verify my e-mail address because ClearURLs was (unbeknownst to me) removing some of the parameters from the activation URL. While similar cases will most likely not be frequent, it can be really frustrating to determine why something does not work (this also applies to ad blockers).

[0]: https://news.ycombinator.com/item?id=26564638

[+] crazygringo|4 years ago|reply
I love this just for the usability alone, never mind being anti-tracking.

I'm tired of every time I want to share a product page or post a URL or something, of having to strip 300 friggin' nonsense characters from the end of it.

[+] l1am0|4 years ago|reply
Shameless self plug: My service https://unshort.link does this as well and also unshortens shortlinks to show you where they are pointing to :D

Open Source and Free to Use

[+] gdsdfe|4 years ago|reply
It's sad that nowadays we need at least a dozen add-ons just to have a decent browsing experience on the web.
[+] ignoramous|4 years ago|reply
Another way to browse sites you only visit once is through a mirror like https://archive.is/ (I exclusively use mirrors to view posts on content aggregators like Medium, Substack, Buzzfeed, Blogspot, Wordpress; annoying news websites that download a gazillion files; and file-hosting websites like imgur).

A caveat: When you submit a request to archive a url, archive.is sends the client-ip (X-Forwarded-For) to the destination server.

[+] slver|4 years ago|reply
This is one of those things that either few use and it works, or if many start using it, the tracking will just get obfuscated.

I already see many sites use something like ?arg={BASE64 STRING OF ALL THE THINGS}, and no automatic tool can decipher that, as it's a custom list of bytes.
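
To make that concrete, here is the same campaign data once as plain parameters and once packed into a single value. The payload and the example.com URLs are invented for illustration, and real sites may use an encoding that isn't even readable after Base64 decoding:

```javascript
// Illustration: the same campaign data, once as plain parameters and once
// packed into a single opaque value. The parameter name "arg" is taken from
// the comment above; the payload is made up.
const payload = "utm_source=newsletter&utm_campaign=spring";
const plain = new URL("https://example.com/?" + payload);
const packed = new URL("https://example.com/?arg=" + encodeURIComponent(btoa(payload)));

// A name-based filter has something to match in the first case...
const plainNames = [...plain.searchParams.keys()];
// ...but only sees one innocuous-looking name in the second.
const packedNames = [...packed.searchParams.keys()];

// The site itself can still unpack it trivially.
const recovered = atob(packed.searchParams.get("arg"));
```

A rule keyed on parameter names can't tell that `arg` from a parameter the page genuinely needs, which is the whole problem.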

[+] DangerousPie|4 years ago|reply
This is a neat extension but I think we should acknowledge that stripping parameters like these from affiliate links is going to cause major problems for websites that are financed through affiliate revenue, even if they are open and honest about it.
[+] pellias|4 years ago|reply
They really need to allow whitelisting. I uninstalled it because some sites cannot function with it and there is no way to whitelist them.