
My fight against CDN libraries

298 points| agateau | 9 years ago |peppercarrot.com | reply

141 comments

[+] mstaoru|9 years ago|reply
I only represent about 0.00000013% of all Chinese Internet users, but let me chime in: EVERY website that uses Google CDNs for js or fonts just doesn't work here. It just keeps loading and loading, and loading forever. In most cases it's jQuery, and in most cases it's in the <head> so the page just never shows. Cloudflare (cdnjs), Amazon CDNs, Akamai CDNs also occasionally get blocked and take entire Internet segments with them.

If you use 3rd party CDNs, please consider implementing client-side failover strategy so you don't leave out 50% of the Internet "population".

[+] this-dang-guy|9 years ago|reply
For my new site, what I'm doing is a full fallback - local, cdn font, served from my site, then regular font-family fallback.

Not sure if that works properly in China, if it just spins. It might never 'fail' and fall back. I'd need to test that.

Like so:

    @font-face {
      /* font-family and other descriptors omitted */
      src: local('Slabo 13px'),
           local('Slabo13px-Regular'),
           url(https://fonts.gstatic.com/s/slabo13px/v3/B9U01_cNwYDvIHK04hX...) format('woff2'),
           url(https://fonts.gstatic.com/s/slabo13px/v3/fScGOqovO8xyProgHUR...) format('woff2'),
           url("/fonts/Slabo13px-Regular.ttf");
    }

[+] en|9 years ago|reply
Maybe not the best solution, but for Google's JS CDN, I'm using a self-hosted mirror with this list: https://github.com/euh/googleapis-libraries-list

Unfortunately in HTTPS, I need to bypass HSTS protection and for Firefox it's annoying (I don't know for Chrome).

For the fonts I tried to create a self-hosted mirror too, but Google does not offer a download of exactly the fonts they host.

[+] 9248|9 years ago|reply
Funny thing, most if not all 'client-side failover' strategies you might find through google or the likes won't work either.

This is because the loaded resource will 'fail' anywhere between x seconds up to minutes, or never! In the meantime the user just sees a blank page, or best case, some 80-90% page that keeps trying to load something...

I've experienced this myself a couple times. Most probably my ISP messed up some stuff taking down whole chunks of 'internet' :)
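
One way around a load that never fails is to put an explicit deadline on every source and move on when it expires. A minimal TypeScript sketch, assuming each source is wrapped in a function that returns a promise; `Loader`, `withTimeout`, and `loadWithFallback` are hypothetical names for illustration, not from any library:

```typescript
// Hypothetical type: a function that starts loading one source.
type Loader<T> = () => Promise<T>;

// Reject if `work` has neither resolved nor rejected within timeoutMs.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Try each source in order; a source that hangs is treated like one that failed.
async function loadWithFallback<T>(loaders: Loader<T>[], timeoutMs: number): Promise<T> {
  let lastError: unknown = new Error("no loaders given");
  for (const load of loaders) {
    try {
      return await withTimeout(load(), timeoutMs);
    } catch (error) {
      lastError = error; // blocked, failed, or hung: move on to the next source
    }
  }
  throw lastError;
}
```

In a page, each loader would presumably append a `<script>` element and settle on its `load` or `error` event, with the CDN URL first and a same-origin copy last. This only helps if the loader itself is not stuck behind a blocking `<script src>` in the `<head>`.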

[+] danielrhodes|9 years ago|reply
Is this a DNS block or an IP block? For example: if you serve assets from your domain but CloudFlare sits in front, do the assets still load?
[+] andruby|9 years ago|reply
Do you know of a way for someone outside China to test a website from your end? Is there a VPN or browser testing site that we can use?
[+] saddestcatever|9 years ago|reply
I'm curious - do you know of any chrome plugins / client-side tools to help solve this problem? Seems like it could be very infuriating for the semi-competent Chinese internet user. Wouldn't be too bad to write a plugin that finds dependencies from "allowed"/"works-in-China" CDNs.
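
The core of such a plugin could be quite small. A rough TypeScript sketch, assuming the list of resource URLs has already been collected from the page and that an allowlist of reachable hosts exists somewhere; `flagUnlistedHosts` and the `.example` hosts are made up for illustration:

```typescript
// Return the resource URLs whose host is neither the page's own host
// nor on the (hypothetical) allowlist of hosts known to be reachable.
function flagUnlistedHosts(
  pageUrl: string,
  resourceUrls: string[],
  allowedHosts: string[],
): string[] {
  const pageHost = new URL(pageUrl).hostname;
  const allowed = new Set(allowedHosts);
  return resourceUrls.filter((raw) => {
    // Relative URLs are resolved against the page, so they count as same-host.
    const host = new URL(raw, pageUrl).hostname;
    return host !== pageHost && !allowed.has(host);
  });
}

const flagged = flagUnlistedHosts(
  "https://site.example/post",
  ["/js/app.js", "https://cdn-ok.example/lib.js", "https://cdn-other.example/jquery.js"],
  ["cdn-ok.example"],
);
console.log(flagged);
```

The hard part is the allowlist itself, which this sketch simply takes as input.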
[+] pritambarhate|9 years ago|reply
Thanks for sharing this. Is there a list of big domains, such as Google, which don't work in China? Does adding social logins like FB and G+ also make the login pages break?
[+] samhamilton|9 years ago|reply
I feel your pain. When building my sites, if I use a 3rd party js/css CDN, I use the only one I have found that has ICP and a damn good list of western nodes: jsdelivr.com
[+] herbst|9 years ago|reply
Really glad about your comment; I honestly haven't thought about this a single time yet. I guess I am missing out on a lot of traffic.
[+] throwawasiudy|9 years ago|reply
I don't like the censorship policies of the Chinese government. I'm not going to go out of my way to make sure my site is compatible with their censorship; use a VPN.
[+] a3n|9 years ago|reply
Firefox on Linux.

I use uBlock Origin, Ghostery and Disconnect, and Flash Control. peppercarrot.com is all zeroes for all three blockers, meaning nothing is blocked because there's nothing noticed that needs to be blocked. There are no Flash Control icons, meaning no video or audio noticed and blocked. Thanks for caring. :)

On the front page of theguardian.com, logged in as me, there's a V icon at the top, meaning that Flash Control has blocked video, probably for some gratuitous menu feature. I have zero trouble using and reading the site.

When I first opened theguardian a few minutes ago, uBlock was blocking 13 requests. It's steadily climbed in those minutes to 32 blocked requests. Ghostery is noticing/blocking 0 trackers. Disconnect is blocking two: nielsen and comscore. Disconnect is also blocking 1 from Facebook and 3 from Google. All three tools may be seeing and blocking some of the same things.

Without these four tools, except for low/no-commercial technical sites and public service sites like Wikipedia, my web is all but unusable. With them my web is fine.

I very rarely have any problems using any site. I had to enable my bank in uBlock to use their popup bill pay feature. I think I had trouble viewing a cartoon at The New Yorker; I forget what I did to view it. Youtube and Flash Control seem to be in a perpetual arms race, as was the case with Flashblock. Youtube is my main motivation for using Flash Control, to prevent automatic video playing.

And yep, I get that sites pay the bills with ads. I $ubscribe to three news sites, and I also get that that doesn't pay the whole bill. The web is either going to have to block me for using a blocker (I've been seeing that very rarely recently, or at least "Unblock us please") or figure out a less dangerous, intrusive and loadsome way to serve ads. (And yep, I just made up the word "loadsome." I can do anything!)

EDIT: I whitelist duckduckgo.com in uBlock.

https://duck.co/help/company/advertising-and-affiliates

https://duckduckgo.com/privacy

[+] JustSomeNobody|9 years ago|reply
> I use uBlock Origin, Ghostery and Disconnect, and Flash Control.

I just have to say, thank goodness for Moore's Law. Without it, we would never have so many wasted cycles![0]

[0] Not saying you're wasting, but the fact that we have to jump through sooo many hoops to stop all this crap is just disgusting.

[+] kalleboo|9 years ago|reply
Some of the trackers load more trackers - taking theguardian.com as an example again, with Ghostery on it blocks only 6 items. But whitelist the site and after it lets those 6 load, now it finds 18 trackers.
[+] Lev1a|9 years ago|reply
IIRC, Ghostery was scorned in the not so distant past for whitelisting sites for money. Is that still the case? Because ever since then I have used Privacy Badger (https://www.eff.org/privacybadger) which is made by the EFF which I know is financed mainly through donations and in recent years more and more through the Humble Bundle (https://www.humblebundle.com/) so not financed by companies that would want their trackers etc. in sites regardless of user choice or as opt-out.
[+] makecheck|9 years ago|reply
And YouTube curiously will trigger hundreds of blocked requests in uBlock Origin over just a few minutes when watching a video. I can’t imagine why it needs so many continuous “questionable” accesses.
[+] kasparsklavins|9 years ago|reply
Not sure if this is a feature of youtube or chrome, but when opening a video in a new tab, it does not play until I have that tab in focus.
[+] mark242|9 years ago|reply
From the post:

"Well a big one: Privacy of the readers of Pepper&Carrot."

Before even thinking about tossing things like Google Fonts or AddThis or whatever, the very first thing you need to do is turn on HTTPS. If you're concerned about privacy, or content injection, or MITM attacks, or name-your-poison-here, you must immediately only serve up pages via HTTPS with strong encryption.

[+] mpweiher|9 years ago|reply
These seem completely independent to me.

- HTTPS is for attacks.

- What the article describes is run-of-the-mill tracking by Google etc.

If I am not being attacked, the CDN resources will still allow Google to track me. If I am being attacked the CDN resources will still allow Google to track me.

If I don't have these Google resources (let's just use Google resources for now), I don't think that Google will MITM me.

[+] Klathmon|9 years ago|reply
The worst part is that HTTPS is there and works, all they need to do is add the HSTS header and they would instantly improve the security of every single visitor for free.

I generally hate when people point out things like "if you really cared about your users like you said you do, you'd implement [unrelated thing]", but in this case it's an extremely small change that would improve the privacy for every single one of their visitors.
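
For reference, the header in question is `Strict-Transport-Security`. A small TypeScript sketch that builds its value; `hstsHeaderValue` is a hypothetical helper, and the one-year max-age is only an example value, not something recommended in this thread:

```typescript
// Build the value of a Strict-Transport-Security response header.
function hstsHeaderValue(maxAgeSeconds: number, includeSubDomains: boolean): string {
  if (!Number.isInteger(maxAgeSeconds) || maxAgeSeconds < 0) {
    throw new RangeError("max-age must be a non-negative integer number of seconds");
  }
  const directives = [`max-age=${maxAgeSeconds}`];
  if (includeSubDomains) {
    directives.push("includeSubDomains");
  }
  return directives.join("; ");
}

// Example: one year, applied to subdomains as well (illustrative values).
console.log(hstsHeaderValue(31536000, true));
```

With Node's `http` module this would be sent as `response.setHeader("Strict-Transport-Security", hstsHeaderValue(31536000, true))`. Browsers only honor the header when it arrives over HTTPS.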

[+] fencepost|9 years ago|reply
Those are to a large extent different problems. For one you are eliminating requests to outside hosts from your own website and thus avoiding having those outside hosts track your users. For the other, adding encryption, you're preventing the carrier being used at either end or in between from tracking which pages on the site are visited but not so effectively whether the site was visited at all. Without the libraries being loaded Google and other CDN Library providers have no way of knowing whether I have visited that site unless they are also providing the underlying network connection that I am using.
[+] hhsnopek|9 years ago|reply
The only issue with going against the grain here is if you're not putting your site itself behind a CDN: download rates will vary across the globe. This was the intended use case for CDNs, but analytics are added so CDNs can improve.

You're correct with the fact that they are tracking us, but there's a trade off that comes with this that holds tremendous value. If that value of speed isn't a factor or low on your list of priorities then by all means, sever everything.

[+] cagenut|9 years ago|reply
This post and half the comments are killing me on conflating "third party javascript" with "CDN".
[+] pselbert|9 years ago|reply
Yes. While I completely agree with the author and their quest to eliminate third party scripts from their site, the problem isn't with CDNs. The problem is with third party scripts, most of which aren't coming from a typical CDN (cdnjs, for example).

It is entirely valid, and common, to front your own application code behind a CDN.

Love the sentiment, just wish the terminology was more accurate.

[+] vbezhenar|9 years ago|reply
CDNs are a common enough technique that they should be standardized in browsers. HTML should include a link to the resource hosted by the site, along with its checksum. The browser can then easily use a cached resource from any other site with the same checksum, or just download it from the site.

There are two reasons to use a CDN. The first is caching (different sites using the same resource from the same CDN download it only once); the second is speed (some browsers restrict the number of connections to a single domain, so hosting resources on different domains might improve download time). Caching is better solved by using the checksum as the key instead of the URL. Speed is not an issue with HTTP/2, because there's only one TCP connection. The only remaining advantage of a CDN might be geographically distributed servers, so a user in China would download the resource from a server in China instead of one in the US. I don't see an easy and elegant way to solve that, but I'm not sure it needs solving at all; HTTP/2 resource pushing should be enough.
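
To make the checksum-as-key idea concrete, here is a minimal TypeScript sketch for Node. It assumes a digest string in the style of the `integrity` attribute from Subresource Integrity; `integrityOf`, `ChecksumCache`, and the `download` callback are hypothetical names, and no browser is claimed to cache this way:

```typescript
import { createHash } from "node:crypto";

// SRI-style digest string: "sha384-" followed by the base64 SHA-384 hash.
function integrityOf(content: string): string {
  return "sha384-" + createHash("sha384").update(content, "utf8").digest("base64");
}

// Hypothetical cache keyed by checksum rather than by URL, so the same file
// referenced by different sites is downloaded only once.
class ChecksumCache {
  private readonly entries = new Map<string, string>();
  fetchCount = 0;

  // `download` stands in for a network fetch from whichever site links the resource.
  get(integrity: string, download: () => string): string {
    const cached = this.entries.get(integrity);
    if (cached !== undefined) {
      return cached; // hit: the URL the resource came from does not matter
    }
    const body = download();
    this.fetchCount += 1;
    if (integrityOf(body) !== integrity) {
      throw new Error("checksum mismatch, refusing to cache");
    }
    this.entries.set(integrity, body);
    return body;
  }
}

const library = "console.log('shared library');";
const tag = integrityOf(library);
const cache = new ChecksumCache();
cache.get(tag, () => library); // first site: downloaded and verified
cache.get(tag, () => library); // second site: served from the cache
console.log(cache.fetchCount);
```

The mismatch check matters: without it, any site could poison the shared cache by claiming a popular checksum for different content.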

[+] kakarot|9 years ago|reply
I use uMatrix and do not load external web fonts. I am stripping out CDN reliance in our stack at work as well. This practice of supporting secure protocols but still trading ease-of-development for end-user privacy & security must stop.
[+] blauditore|9 years ago|reply
Maybe I'm missing something crucial, but why not just host the content on your own server? I.e., just download that Google font, jquery.js or FontAwesome and serve it directly instead of using an external CDN.

The post seems to say "I don't like where some content is coming from, so I re-created said content by myself".

[+] CapacitorSet|9 years ago|reply
At first thought, there may be licenses in place preventing you from self-hosting the content.
[+] brianwawok|9 years ago|reply
Then you pay the bandwidth bill.

I know I would rather save a few bucks than make a site work for China. Many sites don't need to work in China.

[+] Fluxenein|9 years ago|reply
That leaves AddThis and Gravatar
[+] JoshTriplett|9 years ago|reply
Great to see someone paying attention to the problem of loading third-party <script>s, and talking about the work required to avoid them.
[+] pselbert|9 years ago|reply
Before I knew it was a comic site I was amazed they took the time to copy all of the icons they wanted as svg. Even knowing the author is an illustrator it is still admirable and impressive.
[+] gefh|9 years ago|reply
> the work required to avoid them

And that's the rub, it was a _lot_ of work. It's nice to see it can be done, but few sites will have the time or inclination.
[+] splitbrain|9 years ago|reply
It's awesome that nearly 10 years after I came up with MonsterID, it's still going strong. I love those cats.
[+] tscs37|9 years ago|reply
Why use alternatives?

You can download the Google Web Fonts and serve them from your host.

You can also download Font Awesome and serve it locally.

And there doesn't seem to be a reason why you can't do it with gravatar either.

I don't get this post honestly. It seems to be about replacing stuff with other stuff instead of replacing CDN with locally served content.

[+] madeofpalk|9 years ago|reply
Good. Another reason not to use these CDNs is they're additional risk and introduce the potential for downtime and breakage. It's an additional point of failure that just doesn't come with many benefits.

I'll happily use these services for quick POCs and throwaway demos, but once anything starts to become semi-permanent I'll make sure I control my uptime and host these assets myself.

[+] this-dang-guy|9 years ago|reply
I've started to leverage them with fallback, but I guess I'll see how that plays out. (For fonts - I don't use anything else from a CDN, aside from front caching with cloudflare)
[+] dillondoyle|9 years ago|reply
AddThis makes money by selling 3rd party audience segments to advertisers like me. I assume they get this data by tracking which users view which pages through their sharing buttons. Example segments I can buy to advertise to: http://i.imgur.com/JF6ZZPC.jpg

The author doesn't even mention the big players: every FB share or like button, on all that nasty porn you watch (even in incognito mode), straight to FB. They recently changed their policies and signaled that they are going to start using this data for ad targeting, probably in a push to expand FAN and be more competitive with Google.

Something as simple as a share button that some blogger copied and pasted into their blog turned into an ad tech/data company!

I personally love that story and think that's cool and innovative thinking from AddThis.

But I also think more data = better ads, at the expense of privacy (probably not a popular opinion around here).

[+] brianzelip|9 years ago|reply
Off topic, but the root site of this blog post is pretty awesome - "Pepper & Carrot: A free, libre and open-source webcomic supported directly by its patrons to change the comic book industry!"
[+] thinkMOAR|9 years ago|reply
I wonder if there will be a time when these CDNs pay you for the visitor data you 'share/leak' with them via the linked resources (to convince you to keep using them).
[+] WildGreenLeave|9 years ago|reply
I really like CDNs because of the ability to drop in a file and know it will be cached correctly. (Also, there is a high probability that your user already has a cached version of the file.) But I never thought about CDNs being able to track you.

Isn't there an alternative? A more transparent way to provide users with source files and still keep the 'cached items' aspect?

[+] ludwigvan|9 years ago|reply
In the case of Google fonts, is it legally possible to download the font and serve it from one's own server? The FAQ has a relevant section, but does not answer this question: https://developers.google.com/fonts/faq
[+] wanda|9 years ago|reply
IANAL but they would not appear to be able to construct a case against you for using the fonts on your own server, since at no point is it stated that such a practice would be in violation of the terms of use.

As you observe, they do not explicitly answer the question, but their reticence should be taken as an implicit green light, encased in a warning about loading times.

Most Google fonts are merely served from their hardware, and not created by them, so the license selected by the font's creator applies. Think of Google Fonts as an aggregator of free-to-use fonts.

There is also a list of fonts and their licenses available from Google Fonts here: https://fonts.google.com/attribution

If you're really concerned, check who created the font and see if they make the font available under a permissive license on their own website. Lato, for instance, is available from its creator's website and is published under the Open Font License.

[+] pmlnr|9 years ago|reply
As far as I'm aware, the fonts on Google are just fonts, not owned by Google.

Example: https://www.fontsquirrel.com/fonts/playfair-display Playfair Display - "Copyright (c) 2010-2012 by Claus Eggers Sørensen ([email protected]), with Reserved Font Name 'Playfair'" in the SIL licence right next to the font files.

Therefore yes, you should be able to download them and use them, according to the original licence. (Which, by the way, usually requires the font creator to be credited; Google only does that when you select the font, not in the served CSS, which I believe is not fair.)