
Public CDNs Are Useless and Dangerous

220 points | pmoriarty | 3 years ago | httptoolkit.tech

191 comments

[+] night-rider|3 years ago|reply
It always annoys me when a site hangs when it tries to load something from fonts.googleapis.com. Developers need to use a system font stack which loads fonts from the system itself instead of remote web fonts. On a fast connection you don’t notice it, but you will notice it on a slow connection, and the majority are on slow connections. Developers need to deliberately use slow connections and see just how responsive their site is. Don’t develop on a hyper fast connection because you assume the site loads super quick and overlook latency issues. I think you can emulate slow connections in browser devtools so there is that.
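A system font stack is a one-line CSS change; a common sketch (the exact names vary by platform, these are typical defaults, not a canonical list):

```css
body {
  /* Fonts already installed on the visitor's OS: zero network
     requests, zero render delay. Names are common platform defaults. */
  font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto,
               "Helvetica Neue", Arial, sans-serif;
}
```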
[+] lelandfe|3 years ago|reply
Use font-display, people.

Required reading, “More than you ever wanted to know about font loading on the web”: https://www.industrialempathy.com/posts/high-performance-web...

Edit:

> Developers need to use a system font stack which loads fonts from the system itself instead of remote web fonts

IMO the takeaway is "self-host fonts, make sure they're small files, load early, and don't block render," not "only use native fonts"
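For reference, font-display is set per @font-face rule; a minimal self-hosted sketch (font name and file path are hypothetical):

```css
@font-face {
  font-family: "Merriweather Sans";
  /* Hypothetical self-hosted path */
  src: url("/fonts/merriweather-sans.woff2") format("woff2");
  /* swap: render text immediately in the fallback font,
     then swap the web font in once it has loaded. */
  font-display: swap;
}
```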

[+] onion2k|3 years ago|reply
Developers need to use a system font stack which loads fonts from the system itself instead of remote web fonts.

The default font-display setting for Google Fonts is 'swap'. With font-display, the browser blocks rendering for a short period, uses the fallback font if the web font hasn't loaded by then, and swaps the web font in when it eventually loads.

But, and here's the important bit, the block period for 'font-display: swap' is 0ms. In other words, Google Fonts already works the way you think it should, so long as the developer has included a fallback system font in the font-family.

I suspect you're attributing a blocking problem to Google Fonts when really it's caused by something else.
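Concretely, the behaviour described above takes both the display parameter and a fallback in the stack; a sketch:

```html
<!-- display=swap adds font-display: swap to the generated @font-face rules -->
<link href="https://fonts.googleapis.com/css2?family=Roboto&display=swap"
      rel="stylesheet">
<style>
  /* The system fonts after "Roboto" are what render during the 0ms
     block period, so text is never invisible while the font loads. */
  body { font-family: Roboto, Arial, sans-serif; }
</style>
```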

[+] Zardoz84|3 years ago|reply
You can serve the font from your own web server, and avoid any problem with fonts.googleapis.com
[+] eloisius|3 years ago|reply
I disabled web fonts by default in uBlock Origin. The internet is so much better without them. I selectively enable them for sites I use frequently that have special icon fonts that mess up the interface if they can't load.
[+] usrn|3 years ago|reply
The problem is custom fonts that don't share metrics with built-in fonts. When the metrics differ, the page content jumps around as the fonts load. Custom fonts were a terrible idea; I wish browsers would just drop support for them.

EDIT: Unless there's a way to declare metrics outside the font itself, there really isn't a way to prevent reflow when it loads, other than blanking the entire page until then.
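There is now a partial answer to this in CSS: the @font-face metric-override descriptors let a local fallback approximate the custom font's metrics, shrinking the reflow. A sketch; the font names are hypothetical and the override percentages are illustrative (they must be computed per font pair):

```css
/* A metric-adjusted local fallback so text barely shifts when the
   custom font arrives. Override values here are illustrative only. */
@font-face {
  font-family: "Adjusted Fallback";
  src: local("Arial");
  size-adjust: 106%;
  ascent-override: 98%;
  descent-override: 27%;
  line-gap-override: 0%;
}
body {
  font-family: "My Custom Font", "Adjusted Fallback", sans-serif;
}
```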

[+] WhyNotHugo|3 years ago|reply
+1

It's not even clear if using Google Fonts is GDPR compliant (since you're leaking the fact that your visitor has visited your website): https://github.com/google/fonts/issues/1495

Even if you do want to use your own hand-picked font on your site, just self host it, or bundle it with something like https://fontsource.org/

They provide the tools to bundle the fonts and serve them yourself: https://fontsource.org/fonts/merriweather-sans

EDIT: The original comment mentioned "fortawesome" instead of "fontsource". I mixed up the names, my bad!

[+] WorldMaker|3 years ago|reply
> Developers need to use a system font stack which loads fonts from the system itself instead of remote web fonts.

Browsers have always loaded a font from the system over a web font if the system has a font of that name.

We could just normalize having a lot more free-as-in-beer fonts installed on the average user's machine.

The unfortunate emphasis there is "average user's machine", as font loading is already a deanonymization/fingerprinting vector. Way back in the day, IE made a big deal about bundling a group of fonts and calling them "Web Safe", and we need an initiative like that again. Take the top X fonts from Google Fonts and just bundle them with browsers or operating systems, and spread those as widely as possible, so that installed fonts stop being a useful deanonymization vector.

[+] antihero|3 years ago|reply
Ah yes, we should entirely limit typographic creativity because it causes slight irritation for the lowest common denominator. Guess we should stick to web-safe colours and 90%-quality JPEGs too.

Self-hosted web fonts are fine.

[+] oliwarner|3 years ago|reply
I've made a few websites. This idea might work for your blog but it doesn't swing with designers. Brand control is everything, and that extends well into typography.

Given that about 65% of web browsing happens in a Google client, I'm surprised Google Fonts aren't installed directly into the system fonts on first load.

[+] AtNightWeCode|3 years ago|reply
What is even more annoying with fonts.googleapis.com is that Android has such a poor set of default fonts.
[+] xwdv|3 years ago|reply
It’s not the developers fault it’s the god damned designers who insist we need custom fonts for everything. These people have no technical sense.
[+] littlecranky67|3 years ago|reply
An often-overlooked issue is TCP congestion window scaling. For HTTP/1.1 it can be beneficial to serve all resources from a single host, as you will likely have "hot" (= larger congestion window) TCP connections to that host. In contrast, if you serve resources from 3-4 different hosts, the TCP congestion window needs to scale up for each individual host, which takes a couple of round trips before maximum throughput is achieved. There is no silver bullet here, though; whether this is faster or slower depends on a myriad of connection/link parameters.
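When third-party hosts are unavoidable, resource hints can at least overlap the per-host handshakes with HTML parsing; a sketch:

```html
<!-- Open the TCP+TLS connection to a third-party host early, so the
     handshake (and the start of the congestion-window ramp-up) happens
     while the HTML is still being parsed. -->
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<!-- Cheaper fallback hint for browsers that skip the preconnect -->
<link rel="dns-prefetch" href="https://fonts.gstatic.com">
```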
[+] jerf|3 years ago|reply
This also strikes me as another "You're Not Google" example. With modern computers, where even the smallest instance currently available on AWS would have been a beefy system when this advice was first diffused into the community, CDNs are probably solving a problem you don't have in 2022. If you have 10MB of static files to serve, and they're decently cached, you're looking at the ability to serve on the order of 50-100 new users per second with very realistic setups nowadays that would still be generally considered "entry level" in the modern environment, while in the meantime consuming almost no resources on the server because serving static files is really easy. And that's just straight off the "main server".

If you've run the numbers, and that's a problem, hey, great, more power to you! There's web sites where that's a problem. But, you know, remember that 100 new users per sec is a scale of nearly 10,000,000 per day. Is that really your scale? There's an awful lot of sites, even busy, productive, and profitable sites, that are looking more at the "1 new user loading a fresh copy of these things per second", if not one per minute.

I remember working with "servers" in the low hundreds of megahertz, on software stacks a lot less optimized than today. Shifting around your static serving could do something then. Today? Choking even a single CPU server's capability to serve static content takes some doing. By the time that's your core problem, you'll know.

(There's still a lot of "best practices" banging around the community from the "low hundreds of megahertz" period of time. At the risk of goring some sacred oxen, I also consider the obsession with total statelessness to date from this era. While one must still be careful with it, judicious application of state in the modern era can be very helpful. It's less panic-inducingly terrifying when I can afford the equivalent of entire servers dedicated to each individual user, where "server" is defined as "a server-class machine from the time this advice first percolated out". The software engineering considerations around state are relevant and must be considered, but the solution of "zero! none! never! not any!" is no longer the only viable or best choice.)
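The back-of-envelope figures above can be checked directly (assumed from the comment: 100 new users/second, 10 MB of static files per new user):

```javascript
// Back-of-envelope check of the comment's figures.
const newUsersPerSec = 100;                    // upper end of the quoted range
const secondsPerDay = 24 * 60 * 60;            // 86,400
const usersPerDay = newUsersPerSec * secondsPerDay;
console.log(usersPerDay);                      // 8640000 -- "nearly 10,000,000 per day"

// Peak bandwidth if every one of those users fetched the full bundle
// uncached (repeat visitors with warm caches fetch far less):
const bytesPerUser = 10 * 1024 * 1024;         // 10 MB
const gbitPerSec = newUsersPerSec * bytesPerUser * 8 / 1e9;
console.log(gbitPerSec.toFixed(1));            // ~8.4 Gbit/s absolute worst case
```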

[+] toast0|3 years ago|reply
> If you have 10MB of static files to serve, and they're decently cached, you're looking at the ability to serve on the order of 50-100 new users per second with very realistic setups nowadays that would still be generally considered "entry level" in the modern environment, while in the meantime consuming almost no resources on the server because serving static files is really easy. And that's just straight off the "main server".

If your web page needs 10MB of static files to load, you're going to need a lot of round trips to get that to your users. Your server and clients may be on 10G ethernet, but that 10MB download is still going to be limited by 'slow start' congestion control unless the round-trip time is low. That's why people want CDNs: for the latency, more than for reducing the resource cost of serving static files (which is pretty low, as you mention).
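A rough illustration of the slow-start cost, assuming a typical initial congestion window of 10 segments and per-round-trip doubling (real stacks vary):

```javascript
// Rough model of TCP slow start for a 10 MB transfer.
const mss = 1460;                  // typical maximum segment size, bytes
const target = 10 * 1024 * 1024;   // 10 MB of static files
let cwnd = 10 * mss;               // typical initial window: ~14.6 KB
let delivered = 0;
let rtts = 0;
while (delivered < target) {
  delivered += cwnd;               // one window of data per round trip
  cwnd *= 2;                       // slow start doubles the window
  rtts++;
}
console.log(rtts);                 // 10 round trips just to ramp up
// At a 100 ms round-trip time, that's about a second of pure ramp-up,
// which is exactly the latency a nearby CDN edge node cuts down.
```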

[+] sleepydog|3 years ago|reply
While I agree we should not assume a cdn is necessary, as far as user traffic is concerned, not all seconds are created equal. Those pesky users have a habit of making requests at the same time, like they're in cahoots with one another. Whether it's because they're all in the same time zone and they just got home from work, or the clock struck midnight and your fire sale has begun, you don't provision for average qps, you provision for peak qps.

This does not negate your argument, in fact for many use cases doing your own caching could be a better way to meet increased demand (certainly a cheaper way). CDN-related issues can be incredibly difficult to troubleshoot, too, especially without a support contract.

[+] marcosdumay|3 years ago|reply
> I also consider the obsession with total statelessness to date from this era

People were less obsessed with statelessness back then. There was some push for statelessness (peaking around the time of the C10K problem), but it was much less than now.

I believe the current wave is entirely caused by the publicity of the "we rent IaaS, but not as commodity computers" cloud, and doesn't have any technical origin.

[+] jjav|3 years ago|reply
> With modern computers, where even the smallest instance currently available on AWS would have been a beefy system when this advice was first diffused into the community

Cloud VM instances are for the most part very underpowered (and, for the performance, overpriced). You'd have to go back a lot of years to reach an era where a "beefy system" of the time was as weak as the smallest AWS instance (t2.nano?).

[+] PaulHoule|3 years ago|reply
I have never been sure that public CDNs were a win, even back in the day.

The issue is this: many sites would download files from 10+ different domains, and a single 'long-tail' DNS query (increasingly likely the more domains you use) will cause more delay than you could possibly save by using multiple servers.

[+] Gigachad|3 years ago|reply
It was common knowledge that CDNs were great because users would already have predownloaded copies in their cache from other sites. But one group tested it and found it's so infrequent that a user has the exact version of jQuery from the exact same CDN cached that you may as well ignore the chance.
[+] MattIPv4|3 years ago|reply
Two of the things that this article calls out are security & privacy. It's worth keeping in mind that you can and should be using SRI when loading remote resources like this, which will go a long way to protect you. And, you can set attributes like referrerpolicy to ensure that privacy is maintained as much as possible.

(cdnjs, that I maintain and is referenced in the article, does both of these by default if you copy a script/link tag from our site.)
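Both protections fit in the tag itself; a sketch (the URL and hash are placeholders, not a real cdnjs entry):

```html
<!-- integrity pins the exact file contents (hash is a placeholder);
     crossorigin is required for SRI on cross-origin requests;
     no-referrer keeps the visited page out of the CDN's logs. -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/example/1.0.0/example.min.js"
        integrity="sha384-...base64-hash-goes-here..."
        crossorigin="anonymous"
        referrerpolicy="no-referrer"></script>
```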

[+] wccrawford|3 years ago|reply
Like many other foot-guns, this is still great for novices to get started and learn the ropes without dealing with a lot of production-level problems. In an ideal world, everyone would learn best practices from the start and never create a bug in their code.

But that isn't how humans work, and there needs to be some grease easing the process at the start. So they still serve a function, even if it's not the one that the CDNs imagined at the start.

[+] _nhynes|3 years ago|reply
The comment on IPFS is a bit idealistic.

> There is no single service that can go down [...]

> Each piece of content is loaded entirely independently, [...]

Both of these statements can be true, but in practice what ends up happening is someone puts their site on a pinning service like Piñata or Infura, and then it's right back to being centralized and trackable. Filecoin helps decentralize who pins, but even then, one ends up using a friendly interface like web3.storage, since running the real service is too much overhead for pushing a single page.

I'm personally a big fan of IPFS, but I'd use it to make an entire site reliable for users who care to self-pin.

[+] oehpr|3 years ago|reply
>and then it's right back to being centralized and trackable.

I haven't been following IPFS very closely recently, but this doesn't track for me at all.

So people pin some content on a big node.

Once you download it, now it's on YOUR node, and your friend can still get it off you. Your node is just as authoritative as the big one. They disappear and IPFS can carry on.

I remember people criticizing NFTs using IPFS links as "not solving anything" and I was truly, truly baffled at why anyone would think that. You could easily hold on to your image on a thumb drive, or back it up by absolutely any other method. If every single copy on the internet got nuked, you could just publish it again from your USB drive and suddenly it's back up, available for all to see. More to the point, anyone could do that. So anyone (including you) with even a slight motivation to preserve that data could authoritatively do so.

disclaimer: I do not like NFTs. I don't like them practically, I don't like them abstractly, and I don't like them culturally. It's just that this is one criticism thrown at NFTs that I don't agree with; IPFS makes this aspect work.

[+] skrebbel|3 years ago|reply
Public CDNs are super handy when you're quickly whipping something up. I feel like the DX is under-appreciated in discussions about this.

E.g. Skypack is just awesome: it lets you import any npm package onto a website as if it were a proper ES6 module. Google Fonts is amazing in similar ways. Dangerous, fine, but I emphatically disagree with calling that "useless".

[+] cbg0|3 years ago|reply
Overly sensationalized title, as the author points out in the article as well. Your site will be fast and your users will be safe as long as you use SRI with any content you're adding from a public CDN.
[+] zinekeller|3 years ago|reply
Huh? The advice is clearly written: if it's beneficial, compile your JS, and if not, host the libraries yourself (especially if you're already using a CDN for your own website anyway).
[+] outloudvi|3 years ago|reply
1. Anything beyond your control may cause problems. For the security part: add SRI to whatever you care about, please.

2. Could we, in 2022, get rid of the troublesome Referer header?
[+] i5heu|3 years ago|reply
Well, I like Cloudflare Pages a lot.

A lot of performance for the time invested.

[+] ctur|3 years ago|reply
This article has a lot of good points and suggestions for site creators, but for consumers there is something you can use: the Decentraleyes browser addon (https://decentraleyes.org/). It basically intercepts calls to well-known centralized CDNs and serves the files locally from the browser, without issuing any network calls whatsoever.
[+] Hizonner|3 years ago|reply
"Host your dependencies yourself"? How about "don't put so much goddamned bloat in your Web pages"?
[+] patates|3 years ago|reply
Are you suggesting that all dependencies are bloat or am I misinterpreting your comment?
[+] elashri|3 years ago|reply
I agree; most websites could get rid of many of their JS files without becoming any less functional.
[+] billpg|3 years ago|reply
"...cached content is no longer shared between domains. This is known as cache partitioning..."

But why?

[+] red_trumpet|3 years ago|reply
IIRC, privacy considerations. A website could detect a cache hit/miss by measuring the time until the resource becomes available.
[+] Gigachad|3 years ago|reply
IIRC it was a privacy issue: you could determine which sites someone had visited by timing the download.
[+] Avamander|3 years ago|reply
You could infer browsing habits from that using timing data.
[+] Neil44|3 years ago|reply
Most websites and web apps spend so much server-side CPU time per page load that offloading a few static requests makes next to no difference. The CDN thing is way down the list of changes you can make to improve performance in most cases.
[+] weird-eye-issue|3 years ago|reply
Not true, especially for the CSS and JS required for the page to even render.
[+] xigoi|3 years ago|reply
I'm developing a markup language that compiles to HTML and (among other things) allows easily inserting LaTeX equations. Currently they're rendered using KaTeX, which requires custom fonts. Is there a way to make it work without a CDN? I don't want to sacrifice the convenience of being able to quickly write down some math and having it compiled into a single HTML file, so requiring the user to put the fonts somewhere and link them separately is unfortunately not an option.
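One option that preserves the single-file output: have the compiler base64-embed the KaTeX fonts as data: URIs in the generated CSS. A truncated sketch ("KaTeX_Main" is one of KaTeX's font families; the payload here is a placeholder):

```css
/* Embed the font bytes directly in the HTML output, so no external
   font files or CDN are needed. The base64 payload is truncated here;
   the compiler would inline the real woff2 contents. */
@font-face {
  font-family: "KaTeX_Main";
  src: url("data:font/woff2;base64,d09GMgABA...") format("woff2");
}
```

The trade-off is file size: the fonts inflate every generated HTML file, so this fits best when documents are distributed individually.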
[+] vaylian|3 years ago|reply
> When talking about caches, I'm primarily suggesting a paid caching reverse-proxy service, like Cloudflare, Fastly, Cloudfront, Akamai, etc

I'm confused. Where is the difference between using these companies as CDN providers versus using them as cache providers? The only minor difference that I can see is that my server would provide the original copy for every cached item, but this does not mitigate the risk of cache poisoning and the privacy risks that you also encounter with CDNs.

[+] jshawl|3 years ago|reply
>Meanwhile, if your caching reverse-proxy goes down, you have the option to immediately put a different caching service in front of your site, or temporarily serve static content from your servers directly, and get things working again with no code changes or backend deployments required

I think I'd prefer deploying a new CDN link to updating DNS. "No code changes or backend deployments" makes the assumption that DNS is a manual change (not terraform, etc.)

[+] advisedwang|3 years ago|reply
Interestingly the Google Hosted Libraries ToS[1] says:

> Our systems are designed to remove HTTP referer information before logging, to avoid associating requests with any individual website using the Google Hosted Libraries.

So they, at least, are claiming the CDN is not used to snoop on traffic patterns.

[1] https://developers.google.com/speed/libraries/terms

[+] ARandomerDude|3 years ago|reply
> Most importantly: cached content is no longer shared between domains. This is known as cache partitioning and has been the default in Chrome since October 2020...

TIL.