Ironic that their "please don't share our links" post shared on HN also caused their website to crash. It's a 100% static blog post. Use a damn cache, or CDN, or a hundred other ways of handling ~unlimited traffic for free. We are talking about a few thousand hits and tens of megabytes of total data transfer. It isn't 1998. This scale should not be a problem for anyone.
I’m entirely baffled why someone savvy enough to produce monitoring graphs, and who claims to already use Cloudflare, can be brought to their knees by HN (maybe Reddit too, or something?) traffic on a frigging blog. I believe you need to actively sabotage your Cloudflare settings to achieve this.
Could Mastodon federate the link preview as well as the link?
Some of the OP advice to "get a damn CDN" seems to be a response to the symptom rather than a correction of the issue.
Even poorly designed websites shouldn't face DDoS-scale access just because a social network shared a link. The social network should mitigate its massive parallelization.
Maybe it’s time for a new web spec: something that lets federated platforms request link previews with minimal resources. Realistically you need max two requests (one for the structured text, one for the preview media) and those can be heavily cached.
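As a rough illustration of the "structured text" half of that idea, here is a minimal sketch — a hypothetical helper, not any existing spec — that pulls out the Open Graph tags a preview card typically needs from an HTML document:

```python
from html.parser import HTMLParser


class OGExtractor(HTMLParser):
    """Collect Open Graph <meta property="og:..."> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        d = dict(attrs)
        prop = d.get("property", "")
        if prop.startswith("og:") and "content" in d:
            self.og[prop] = d["content"]


def extract_og(html: str) -> dict:
    """Return a dict like {"og:title": ..., "og:image": ...} from raw HTML."""
    parser = OGExtractor()
    parser.feed(html)
    return parser.og
```

A federating server could pass this small dict (plus one cached copy of the `og:image`) along with the post, instead of every instance re-fetching the page.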
It's not the 15k followers of ItsFOSS who are generating traffic; it is the servers of those 15k accounts' own followers, who see shares, boosts, or re-shares of content. Given that Mastodon, and the larger Fediverse of which it is only a part (though a large share of total activity), are, well, federated (as TFA notes pointedly), sharing links requires each instance to make a request.
> Have you considered switching to a Static Site Generator? Write your posts in Markdown, push the change to your Git repo, have GitHub/GitLab automatically republish your website upon push, and end up with a very cacheable website that can be served from any simple Nginx/Apache. In theory this scales a lot better than a CMS-driven website.
Admin's response:
> That would be too much of a hassle. A proper CMS allows us to focus on writing.
Maybe it's because it's deemed "too difficult" to change it?
Years ago, I was looking into some very popular C++ library and wanted to download the archive (a tar.gz or .zip, I can't remember). At that time, they hosted it on SourceForge for download.
I was looking for a checksum (MD5, SHA-1, or SHA-256) and found a mail in their mailing-list archive where someone asked for said checksums to be provided on their page.
The answer? It's too complicated to put creating checksums into the process, and SourceForge is safe enough. (Paraphrased, but that was the gist of the answer.)
That said, for quite a few years now they have provided checksums for their source archives, but they kind of lost me with that answer years ago.
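For what it's worth, generating checksums is a one-liner that fits into any release process (the file name here is hypothetical):

```shell
# Generate a SHA-256 checksum for a release archive, then verify it
sha256sum release.tar.gz > release.tar.gz.sha256
sha256sum -c release.tar.gz.sha256
```

The `-c` verification step prints `release.tar.gz: OK` on success and exits nonzero on mismatch, so it slots straight into CI.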
> - why is there a random 'grey.webp' loading from this michelmeyer.whatever domain?
This got me wondering, and the reason is that they embed a "card" for a link to a similar blog post on michaelmeyer.com (and grey.webp is the image in the card). There's a little irony there, I think.
Just as a reminder: the site sees plenty of traffic from various other platforms. It's quite popular; I'm sure Mastodon is among the least of their traffic concerns. If they can handle viral/popular traffic loads from many other platforms, the server is capable enough (even without proper caching).
Not every site configuration is perfect, and blaming the site's configuration while ignoring Mastodon's inherent issue is borderline impractical.
Dear nincompoops: if you trust the original poster and original server to send you text and images in a toot, and federated instances to pass that around without modification... then you can trust them equally to send you a URL and an image preview. It's arrogance and idiocy that lead you to believe you can trust their images but can't trust their web preview images and you have to verify that yourself by having the Fediverse DDoS the host. This problem will only get worse as the Fediverse expands. Fix it now, don't ignore it because it makes a problem for someone else.
> if you trust the original poster and original server to send you text and images in a toot, and federated instances to pass that around without modification... then you can trust them equally to send you a URL and an image preview
I don't think it's that straightforward: normally, if I follow @[email protected], you and I both need to trust example.com to accurately report what you say. But if you put in a link to news.example and mastodon.example scrapes it and includes a preview, I now need to trust mastodon.example to accurately report what news.example is saying. And I might well not!
While I agree with other commenters that a server should be able to gracefully handle the relatively small amount of traffic being generated here, I'm sympathetic to the notion that in this specific instance the problem is exacerbated by all the servers making their requests at almost exactly the same instant, i.e. the Thundering Herd Problem https://en.wikipedia.org/wiki/Thundering_herd_problem . If Mastodon wanted to address this, servers could add a small amount of random delay before fetching a link preview.
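A minimal sketch of that jitter idea (not Mastodon's actual implementation; `fetch_preview` is a hypothetical callback standing in for the real preview request):

```python
import random
import threading


def schedule_preview_fetch(fetch_preview, max_jitter_s=60.0):
    """Spread preview fetches over a random window instead of firing instantly.

    fetch_preview: zero-argument callable that performs the actual HTTP request.
    Returns the Timer so callers can cancel it (e.g. if the post is deleted).
    """
    delay = random.uniform(0.0, max_jitter_s)
    timer = threading.Timer(delay, fetch_preview)
    timer.start()
    return timer
```

With thousands of independent instances each picking a random delay in, say, a 60-second window, the stampede flattens into a steady trickle.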
I noticed this last year when an article I wrote was reposted by the creator of Mastodon. Visits from individual instance crawlers vastly outnumbered visits from people reading the content.
I had prepared the content for potential virality (hand-written HTML & well-optimised images) but it was still an unwelcome surprise when I checked the server logs and saw all that noise.
They can enable a setting that forces users to complete a captcha before accessing the website. That should prevent all the mastodon requests from getting through.
Previews only use the HTML head (to get OG metadata) and usually an image.
Never thought about this before. One of my sites (a single-page application) is 45k HTML and 40k image. Hence about 50% of what is served for previews is wasted.
It would be nice if there was a way to recognise an "only html head needed" type of request. Don't think there is?
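There's no standard request type for that — an HTTP HEAD request returns only response headers, not the HTML `<head>` element. The closest client-side approximation is to stream the response and stop reading once `</head>` has arrived; a rough sketch, assuming the server sends the head first:

```python
def head_only(chunks):
    """Accumulate streamed HTML chunks, stopping as soon as </head> is seen.

    chunks: an iterable of str fragments (e.g. from a streaming HTTP response).
    Returns everything up to and including </head>, skipping the rest of the body.
    """
    buf = ""
    for chunk in chunks:
        buf += chunk
        end = buf.find("</head>")
        if end != -1:
            # Stop consuming the stream here; the body is never downloaded.
            return buf[:end + len("</head>")]
    return buf
```

This only saves bytes if the client aborts the connection after the match, and it still costs the server a full page render unless the server short-circuits too.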
That seems like a problem of It’s Foss’ website, not Mastodon. If you’re hosting a presumably static website, a news website no less, it should be able to handle a spike of viewer influx if you make a viral article. Seems like they’re pinning the blame on Mastodon rather than fixing their site.
It sounds like the author's website/cache/whatever isn't configured properly, but it does sound like a terrible design for the preview data to not be included in the federated data.
Several thousand, yeah. That many in a single batch is unlikely, though. Favouriting and boosting are actions that amplify how far a Mastodon post is going to spread across the spectrum of connected instances.
But a popular account (15k followers, as in this case) will definitely spread the link to thousands of instances instantly and cause thousands of individual GET requests.
It’s fun to tail access.log and see it happen in real time.
> Does it mean that posting a link on Mastodon generates 36704 requests to that URL?
No, it's a ratio of bytes of traffic. They're counting the size of one request to post to Mastodon ("a single roughly ~3KB POST") versus the total size of content served from GETting the URL in that post.
IDK how valid this metric is, but that's what they're saying.
So how does being linked to on Twitter, Reddit, and Facebook not do the same? Those sites combined clearly would generate an order of magnitude more traffic. Maybe several orders of magnitude more. There has to be more than 15k itsfoss readers between these sites.
A link preview with an image is over 100 MB? That sounds insane. And if they mean the total traffic in 5 minutes was 100MB that cannot possibly be bringing Cloudflare to its knees. That is an indictment of Cloudflare’s CDN then!
I think what’s being implied here is that when you share a link on Facebook, Facebook will access the page to generate a link preview, downloading a tiny bit of HTML and an image. But when you share a link on Mastodon, that link immediately gets propagated to many other Mastodon servers, which then propagate it to others, so suddenly many thousands of Mastodon instances are simultaneously downloading a little bit of HTML and an image, and the cumulative effect of that in this instance was 100MB over a minute or two.
It does seem like a typical static website ought to not have a problem serving that, especially if it’s behind Cloudflare. It seems odd that a single EC2 instance would have a hard time serving that.
But given that more than one person is complaining about it, it also seems like each Mastodon instance could very easily delay propagation of the story by a few minutes to soften the blow here.
It probably does. The issue is proxying: this is the downside of TLS and Let's Encrypt, because man-in-the-middle is a thing. Being a fediverse application, the proper approach would be for the clients to P2P the data amongst themselves and lighten the load on the originator. However, in this case a link probably should not do that, so each server needs to go get the data itself. It is a DDoS where basically one link can magnify to thousands of other clients, and until those clients are done they will stomp the site. Perhaps there could be a side-channel ask from the client: "hey site, are you cool with me showing the user a preview of this?" Or: "hey site, is it ok if I share a preview of your page?" That would not totally stop the rush of requests, but it would at least avoid a full render of the page for a preview. This method is half-baked and something I just came up with, and it seems wrong, but auto-following links is not exactly in spec either.
Each of those caches the preview etc., so they hit it once in a while, and that works for all their users. Each Mastodon server is independent and can't share that preview cache.
All the issues sound more like a them problem than a Mastodon or Cloudflare one.
Something's wrong with their setup. Hard to tell, as HN has now brought them to their knees.
Well all Twitter links (annoyingly) are made to go through t.co, so either Twitter is generating and serving the link preview directly or it's its shortening service that's being hit, probably for a cached response, not the upstream.
But I agree this does seem strange, like it shouldn't be unmanageable load at all.
Not the best solution for them, but what about removing the metadata from the page's <head> that lets Mastodon and other social media platforms request the preview images? It might hurt discoverability and clicks, but that's better than the site having downtime.
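For reference, the metadata in question is the Open Graph (and similar) tags; the attribute values here are hypothetical:

```html
<!-- Removing these stops most platforms from building a rich preview card -->
<meta property="og:title" content="Example article title">
<meta property="og:description" content="Example summary text">
<meta property="og:image" content="https://example.com/preview.jpg">
<meta name="twitter:card" content="summary_large_image">
```

Note that some platforms fall back to scraping the `<title>` and the first image when OG tags are absent, so this reduces rather than eliminates the fetches.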
Wasn’t there a TLS extension in draft for verifiable archiving (like DKIM for HTTPS responses)? Can’t seem to find it now, but that could help with supporting an authenticated link preview that doesn’t amplify as it reaches other servers.
This DDoSing problem is due to how the fediverse works. Federation in social networks is, in general, extremely inefficient, with tons of flaws, and this issue is one of them.
jefftk | 1 year ago:
1. The "It's Foss" site is designed somewhat carelessly, where many things that could be static aren't, and so it goes under with even a bit of load.
2. Mastodon link preview is badly designed and not spec-compliant. Because the link previews are not triggered by a user request, they should respect robots.txt (https://github.com/mastodon/mastodon/issues/21738), or they should start being triggered by a user request (https://github.com/mastodon/mastodon/issues/23662).
ehutch79 | 1 year ago:
- They've set the generated page to be uncachable with max-age=0 (https://developers.cloudflare.com/cache/concepts/default-cac...)
- nginx is clearly not caching dynamic resources (currently 22s (not ms) to respond)
- lots of 3rd party assets loading. Why are you loading stripe before I'm giving you money?
- why is there a random 'grey.webp' loading from this michelmeyer.whatever domain?
This isn't Mastodon or Cloudflare, it's a skill issue.
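For reference, a hedged sketch of what server-side microcaching could look like in nginx — zone names, paths, and the upstream address here are hypothetical:

```nginx
proxy_cache_path /var/cache/nginx keys_zone=microcache:10m max_size=100m;

server {
    location / {
        proxy_cache microcache;
        proxy_cache_valid 200 301 5m;                  # cache good responses for 5 minutes
        proxy_cache_use_stale updating error timeout;  # serve stale copies while refreshing
        proxy_cache_lock on;                           # collapse concurrent misses into one upstream hit
        add_header Cache-Control "public, max-age=300";  # let Cloudflare cache it too
        proxy_pass http://127.0.0.1:8080;              # hypothetical CMS upstream
    }
}
```

Even a 5-minute cache with `proxy_cache_lock` reduces a stampede of thousands of identical GETs to roughly one upstream request per cache interval.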
dredmorbius | 1 year ago:
I'm not particularly clear on Fediverse and Mastodon internals, but my understanding is that an image preview request is only generated once on a per server basis, regardless of how many local members see that link. But, despite some technical work at this, I don't believe there's yet a widely-implemented way of caching and forwarding such previews (which raises its own issues for authenticity and possible hostile manipulation) amongst instances. (There's a long history of caching proxy systems, with Squid being among the best known and most venerable.) Otherwise, I understand that preview requests are now staggered and triggered on demand (when toots are viewed rather than when created) which should mitigate some, but not all, of the issue.
The phenomenon is known as a Mastodon Stampede, analogous to what was once called the Slashdot Effect.
There's at least one open github issue, #4486, dating to 2017:
<https://github.com/mastodon/mastodon/issues/4486>
Some discussion from 2022:
<https://www.netscout.com/blog/mastodon-stampede>
And jwz, whose love of HN knows no bounds, discusses it as well. Raw text link for the usual reasons, copy & paste to view without his usual love image (see: <https://news.ycombinator.com/item?id=13342590>).
amiga386 | 1 year ago:
Don't make every Mastodon instance have to fetch the linked page and all its assets to generate its own previews
EDIT: as linked in TFA, it has been nearly 7 years and they're still arguing about it:
* https://github.com/mastodon/mastodon/issues/4486
* https://github.com/mastodon/mastodon/issues/23662
jefftk | 1 year ago:
I got into this more with mockups here: https://www.jefftk.com/p/mastodons-dubious-crawler-exemption
amarant | 1 year ago:
They must've spelled it wrong...
h2odragon | 1 year ago:
I have a low-traffic news site; I wonder if I should share this link, or would they prefer not to be troubled by the traffic?
iainmerrick | 1 year ago:
If you don't allow caching, a CDN can't help you.
jakebasile | 1 year ago:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/HE...
soneca | 1 year ago:
There is this quote:
> "making for a traffic amplification of 36704:1"
Does it mean that posting a link on Mastodon generates 36704 requests to that URL?