lordofgibbons | 3 months ago

How did we get to a place where either Cloudflare or AWS having an outage means a large part of the web going down? This centralization is very worrying.

afavour|3 months ago

Because no one cares enough, including users.

Oddly this centralization allows a complete deferral of blame without you even doing anything: if you’re down, that’s bad. But if you’re down, Spotify is down, social media is down… then “the internet is broken” and you don’t look so bad.

It also reduces your incentive to change, if “the internet is down” people will put down their device and do something else. Even if your web site is up they’ll assume it isn’t.

I’m not saying this is a good thing but I’m simply being realistic about why we ended up where we are.

marticode|3 months ago

As a user I do care, because I waste so much time on Cloudflare's "prove you are human" blocking page (why do I have to prove it over and over again?), and frequently run into websites blocking me entirely based on some bad IP blacklist used along with Cloudflare.

ocdtrekkie|3 months ago

This is essentially the entire IT excuse for going to anything cloud. I see IT engineers all the time justifying that the downtime stops being their problem and they stop being to blame for it. There's zero personal responsibility in trying to preserve service, because it isn't "their problem" anymore. Anyone who thinks the cloud makes service more reliable is absolutely kidding themselves, because everyone who made the decision to go that way already knows it isn't true, it just won't be their problem to fix it.

If anyone in the industry actually cared about reliability and took personal stake in their system being up, everyone would be back on-prem.

tjoff|3 months ago

Users have no options because... everything has been centralized. So it doesn't matter if users care or not.

Users are never a consideration today anyway.

alentred|3 months ago

There is an upside too. Us humans, we also need our down time occasionally.

baxtr|3 months ago

Who cares if a couple of websites are down a day or even two?

As long as HN is up and running, everything is going to be O.K.!

thr0w|3 months ago

> But if you’re down, Spotify is down, social media is down… then “the internet is broken” and you don’t look so bad.

In my direct experience, this isn't true if you're running something even vaguely mission-critical for your customers. Your customer's workers just know that they can't do their job for the day, and your customer's management just knows that the solution they shepherded through their organization is failing.

BeFlatXIII|3 months ago

> if “the internet is down” people will put down their device and do something else

In this case, the internet should be down more often.

falcor84|3 months ago

100% this. While in my professional capacity I'm all in for reliability and redundancy, as an individual, I quite like these situations when it's obvious that I won't be getting any work done and it's out of my control, so I can go run some errands or read a book, or just finish early.

tjwebbnorfolk|3 months ago

> if “the internet is down” people will put down their device and do something else.

oh no

jclardy|3 months ago

Which "user" are you referring to? Cloudflare users or end product users?

End product users have no power, they can complain to support and maybe get a free month of service, but the 0.1% of customers that do that aren't going to turn the tide and have anything change.

Engineering teams using these services also get "covered" by them - they can finger point and say "everyone else was down too."

lxgr|3 months ago

Many people care, but none of them can (sufficiently) change the underlying incentive structure to effect the necessary changes.

pancsta|3 months ago

> if you’re down, Spotify is down, social media is down… then “the internet is broken” and you don’t look so bad.

Which changes nothing about you actually being down, you're only down more. CF proxies always sucked - not your domain, not your domain...

timeon|3 months ago

But Spotify was not down. One social media was down.

This:

> if you’re down, that’s bad. But if you’re down, Spotify is down, social media is down… then “the internet is broken” and you don’t look so bad.

is just marketing. If you are down with some other websites it is still bad.

LtWorf|3 months ago

> Because no one cares enough, including users.

When have users been asked about anything?

ozgrakkurt|3 months ago

On the other hand, it is cool to be up when the internet is down

delfinom|3 months ago

Eh? It's because they are offering a service too good to refuse.

The internet these days is fucking dangerous and murderous as hell. We need Cloudflare just to keep services up due to the deluge of AI data scrapers and other garbage.

mistrial9|3 months ago

> Because no one cares enough, including users.

this is like a bad motivational speaker talk.. heavy exhortations with a dramatic lack of actual reasoning.

Systems are difficult, people. It is "incentives" of parties and lockin by tech design and vendors, not lack of individual effort.

ge96|3 months ago

Also it's free (the basic domain protection offered by CF anyway)

PunchyHamster|3 months ago

More like "don't have choice". It's not like service provider gonna go to competition, because before you switch, it will be back.

Frankly it's a blessing, always being able to blame the cloud that management forced company to migrate to be "cheaper" (which half of the time turns out to be false anyway)

Hrun0|3 months ago

> It also reduces your incentive to change, if “the internet is down” people will put down their device and do something else. Even if your web site is up they’ll assume it isn’t.

I agree. When people talk about the enshittification of the internet, Cloudflare plays a significant role.

martinald|3 months ago

Many reasons but DDoS protection has massive network effects. The more customers you have (and therefore bandwidth provision) the easier it is to hold up against a DDoS, as DDoS are targeting just one (usually) customer.

So there are massive economies of scale. A small CDN with (say) 10,000 customers and 10 Mbit/s per customer can handle a 100 Gbit/s DDoS (way too simplistic, but hopefully you get the idea) - way too small.

If you have the same traffic provisioned on average per customer and have 1 million customers, you can handle a DDoS 100x the size.

Only way to compete with this is to massively overprovision bandwidth per customer (which is expensive, as those customers won't pay more just for you to have more redundancy because you are smaller).

In a way (like many things in infrastructure) CDNs are natural monopolies. The bigger you get -> the more bandwidth and PoP you can have -> more attractive to more customers (this repeats over and over).

It was probably very astute of Cloudflare to realise that offering such a generous free plan was a key step in this.
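The bandwidth arithmetic in this comment can be written out as a rough sketch. It uses the same deliberately simplistic model as the comment (absorbable attack size = customers × per-customer provisioned bandwidth); the function name is made up for illustration and the model ignores PoP placement, peering, and everything else that matters in practice.

```python
# Back-of-the-envelope model from the comment above: total capacity is just
# customers * per-customer provisioned bandwidth. Purely illustrative.

def aggregate_capacity_gbit(customers: int, mbit_per_customer: float) -> float:
    """Total provisioned bandwidth in Gbit/s (1 Gbit = 1000 Mbit)."""
    return customers * mbit_per_customer / 1000


small = aggregate_capacity_gbit(10_000, 10)      # the "small CDN" case
large = aggregate_capacity_gbit(1_000_000, 10)   # same per-customer figure, 1M customers

print(f"small CDN: {small:.0f} Gbit/s")
print(f"large CDN: {large:.0f} Gbit/s")
print(f"ratio: {large / small:.0f}x")
```

Under these assumptions the large provider absorbs an attack 100 times bigger without provisioning any more per customer, which is the economy-of-scale point being made.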

kordlessagain|3 months ago

Your argument is technically flawed.

In a CDN, customers consume bandwidth; they do not contribute it. If Cloudflare adds 1 million free customers, they do not magically acquire 1 million extra pipes to the internet backbone. They acquire 1 million new liabilities that require more infrastructure investment.

All you are doing is echoing their pitch book. Of course they want to skim their share of the pie.

codedokode|3 months ago

In my opinion, DDoS is possible only because there is no network protocol for a host to control traffic filtering on upstream providers (deny traffic from certain subnets or countries). In this case everybody would prefer to write their own systems rather than rely on a harmful monopoly.

karmelapple|3 months ago

And how many companies want to also be able to build out their own CDN?

Not every company can be an expert at everything.

But perhaps many of us could buy a different CDN than the major players if we want to reduce the likelihood of mass outages like this though.

ulrikrasmussen|3 months ago

Yeah, I went to HN after the third web page didn't work. I am not just worried about the single point of failure, I am much more worried about this centralization eventually shaping the future standards of the web and making it de facto impossible to self-host anything.

Well that and the fact that when 99% goes through a central party, then that central party will be very interesting for authoritarian governments to apply sweeping censorship rules to.

sankalpmukim|3 months ago

It is already nearly impossible/very expensive in my country to get a public IP address (even IPv6) which you could host on. The world is heavily moving towards central dependence on these big cloud providers.

popcorncowboy|3 months ago

> eventually shaping the future standards of the web and making it de facto impossible to self-host anything

Eventually?

GuB-42|3 months ago

Another one that worries me is Let's Encrypt.

It is not as bad as Cloudflare or AWS because certificates will not expire the instant there is an outage, but consider that:

- It serves about 2/3 of all websites

- TLS is becoming more and more critical over time. If certificates fail, the web may as well be down

- Certificate lifetimes are becoming shorter and shorter, now 90 days, but Let's Encrypt is now considering 6 days, with 47 days being planned as a maximum

- An outage is one thing, but should a compromise happen, that would be even more catastrophic

Let's Encrypt is a good guy now, but remember that Google used to be a good guy in the 2000s too!

phasmantistes|3 months ago

(Disclaimer: I am tech lead of Let's Encrypt software engineering)

I'm also concerned about LE being a single point of failure for the internet! I really wish there were other free and open CAs out there. Our goal is to encrypt the web, not to perpetuate ourselves.

That said, I'm not sure the line of reasoning here really holds up? There's a big difference between this three-hour outage and the multi-day outage that would be necessary to prevent certificate renewal, even with 6-day certs. And there's an even bigger difference between this sort of network disruption and the kind of compromise that would be necessary to take LE out permanently.

So while yes, I share your fear about the internet-wide impact of total Let's Encrypt collapse, I don't think that these situations are particularly analogous.
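A rough way to see the gap between a three-hour outage and an outage long enough to block renewal: the sketch below assumes a client first tries to renew once a fixed fraction of the certificate lifetime has elapsed. The 0.5 fraction and the function name are assumptions for illustration, not a statement of how any particular ACME client or Let's Encrypt schedules renewals.

```python
from datetime import timedelta


def outage_tolerance(lifetime: timedelta, renew_at_fraction: float) -> timedelta:
    """Time left on a certificate at the moment renewal is first attempted.

    This is how long the CA could be unreachable before the cert expires,
    assuming renewal attempts simply keep retrying until it comes back.
    """
    if not 0 < renew_at_fraction < 1:
        raise ValueError("renew_at_fraction must be between 0 and 1")
    return lifetime * (1 - renew_at_fraction)


outage = timedelta(hours=3)  # the outage length mentioned in the comment
for days in (90, 6):
    # renew_at_fraction=0.5 is a hypothetical policy chosen for the example
    slack = outage_tolerance(timedelta(days=days), renew_at_fraction=0.5)
    print(f"{days}-day cert: {slack} of slack, survives 3h outage: {slack > outage}")
```

Even with 6-day certificates, this hypothetical policy leaves days of slack, so only a multi-day outage would stop renewals in time.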

seniorThrowaway|3 months ago

Agree, I’ve thought about this one too. The history of SSL/TLS certs is pretty hacky anyway in my opinion. The main problem they are solving really should have been solved at the network layer with ubiquitous IPsec and key distribution via DNS since most users just blindly trust whatever root CAs ship with their browser or OS, and the ecosystem has been full of implementation and operational issues.

Let’s Encrypt is great at making the existing system less painful, and there are a few alternatives like ZeroSSL, but all of this automation is basically a pile of workarounds on top of a fundamentally inappropriate design.

b00ty4breakfast|3 months ago

Google was always a for-profit operation. Let's Encrypt/ISRG could still go rotten but there are less incentives for them to do so as a non-profit.

pixel_popping|3 months ago

Mostly since the AWS craze started a decade ago, developers have gone away from Dedicated servers (which are actually cheaper, go figure), which is causing all this mess.

It's genuinely insane that many companies design a great number of fallbacks... at the software level, but give almost no thought to the hardware/infrastructure level. Common sense dictates that you should never host everything on a single provider.

geerlingguy|3 months ago

I tried as hard as I could to stay self hosted (and my backend is, still), but getting constant DDoS attacks and not having the time to deal with fighting them 2-3x a month was what ultimately forced me to Cloudflare. It's still worse than before even with their layers of protection, and now I get to watch my site be down a while, with no ability to switch DNS to point back to my own proxy layer, since CF is down :/

imglorp|3 months ago

With the state of constant attack from AI scrapers and DDOS bots, you pretty much need to have a CDN from someone now, if you have a serious business service. The poor guys with single prem boxes with static HTML can /maybe/ weather some of this storm alone but not everything.

elondaits|3 months ago

I self hosted on one of the company’s servers back in the late 90s. Hard drive crashes (and a hack once, through an Apache bug) had our services (http, pop, smtp, nfs, smb, etc ) down for at least 2-3 days (full reinstall, reconfiguration, etc).

Then, with regular VPSs I also had systems down for 1-2 days. Just last week the company that hosts NextCloud for us was down the whole weekend (from Friday evening) and we couldn’t get their attention until Monday.

So far these huge outages that last 2-5 hours are still lower impact for me, and require me to take less action.

MattSayar|3 months ago

I like the idea of having my own rack in a data center somewhere (or sharing the rack, whatever) but even a tiny cost is still more than free. And even then, that data center will also have outages, with none of the benefits of a Cloudflare Pages, GitHub Pages, etc.

nzach|3 months ago

> developers have gone away from Dedicated servers (which are actually cheaper, go figure)

It depends on how you calculate your cost. If you only include the physical infrastructure, having a dedicated server is cheaper. But by having a dedicated server you lose a lot of flexibility. Need more resources? Just scale up your EC2 instance; with a dedicated server there is a lot more work involved.

Do you want a 'production-ready' database? With AWS you can just click a few buttons and have a RDS ready to use. To roll out your own PG installation you need someone with a lot of knowledge(how to configure replication? backups? updates? ...).

So if you include salaries in the calculation the result changes a lot. And even if you already have some experts on your payroll, by putting them to work deploying a PG instance you won't be able to use them to build other things that may generate more value for your business than the premium you pay to AWS.

slightwinder|3 months ago

Cloud-Hoster are that hardware-fallback. They started with offering better redundancy and scaling than your homemade breadbox. But it seems they lost something along the way and now we have this.

powerpixel|3 months ago

Maintenance cost is the main issue for on-prem infra. Nowadays add things like DDoS protection and/or scraping protection, which can require a dedicated team or for your company to rely on some library or open source project that is not guaranteed to be maintained forever (unless you give them support, which I believe in)... Yeah, I can understand why companies shift off of on-prem nowadays

PaulHoule|3 months ago

... dedis are cheaper if you are rightsized. If you are wrongsize they just plain crash and you may or may not be able to afford the upgrade.

I was at Softlayer before I was at AWS and what catalyzed the move was the time I needed to add another hard drive to a system and somehow they screwed it up. I couldn't put a trouble ticket in to get it fixed because my database record in their trouble ticket system was corrupted. The next day I moved my stuff to AWS and the day after that they had a top sales guy talk to me to try to get me to stay but it was too late.

lforster|3 months ago

They're using Cloudflare for multicloud, but still have Cloudflare as a single point of failure. Should make a Cloudflare for Cloudflare to solve this.

nexttk|3 months ago

Like the infamous "smiling through the pain" meme:

"I added a load-balancer to improve system reliability" (happy)

"Load balancer crashed" (smiling-through-the-pain)

amalcon|3 months ago

You jest, but this actually does exist. Multiple CDNs sell multi-CDN load balancing (divide traffic between 2+ CDNs per variously-complicated specifications, with failover) as a value add feature, and IIRC there is at least one company for which this is the marquee feature. It's also relatively doable in-house as these things go.
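A minimal sketch of what "divide traffic between 2+ CDNs, with failover" can look like, assuming a simple weighted split and an externally supplied set of healthy CDNs. The names `pick_cdn`, `cdn_a`, and `cdn_b` are hypothetical; real multi-CDN products typically do this at the DNS or edge layer with far more elaborate rules.

```python
import random
from typing import Optional


def pick_cdn(weights: dict[str, float], healthy: set[str],
             rng: Optional[random.Random] = None) -> str:
    """Pick a CDN by weight, skipping any that are currently marked unhealthy.

    Traffic that would have gone to a failed CDN is implicitly redistributed
    across the remaining ones in proportion to their weights.
    """
    rng = rng or random.Random()
    candidates = {name: w for name, w in weights.items()
                  if name in healthy and w > 0}
    if not candidates:
        raise RuntimeError("no healthy CDN available")
    names = list(candidates)
    return rng.choices(names, weights=[candidates[n] for n in names], k=1)[0]


weights = {"cdn_a": 0.7, "cdn_b": 0.3}  # hypothetical 70/30 split

# Both healthy: roughly 70/30. cdn_a marked down: everything goes to cdn_b.
print(pick_cdn(weights, healthy={"cdn_a", "cdn_b"}))
print(pick_cdn(weights, healthy={"cdn_b"}))
```

The catch raised elsewhere in the thread still applies: whatever runs this selection logic is itself a potential single point of failure.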

MichaelZuo|3 months ago

If there’s clearly a single point of failure shouldn’t it be called a single cloud pretending to be “multicloud”?

sotix|3 months ago

This might sound crazy as a software engineer, but I actually like the occasional "snow day" where everything goes down. It's healthy for us to all disconnect from the internet for a bit. The centralization unintentionally helps facilitate that. At least, that's my glass half full perspective.

gspencley|3 months ago

I can understand that sentiment. Just don't lose sight of the impact it can have on every day people. My wife and I own a small theatre and we sell tickets through Eventbrite. It's not my full time job but it is hers. Eventbrite sent out an email this morning letting us know that they are impacted by the outage. Our event page appears to be working but I do wonder if it's impacting ticket sales for this weekend's shows.

So while us in tech might like a "snow day", there are millions of small businesses and people trying to go about their day to day lives who get cut off because of someone else's fuck-ups when this happens.

cultofmetatron|3 months ago

> This might sound crazy as a software engineer, but I actually like the occasional "snow day" where everything goes down

As a software engineer, I get it. As a CTO, I spent this morning triaging with my devops ai(actual Indian) to find some workaround (we found one) while our CEO was doing damage control with customers (non technical field) who were angry that we were down and they were losing business by the minute.

sometimes I miss not having a direct stake in the success of the business.

hashim|3 months ago

I'm guessing you're employed and your salary is guaranteed regardless. Would you have the same outlook if you were the self-employed founder of an online business and every minute of outage was costing you money?

ljm|3 months ago

If the internet was just social media, SaaS productivity suites, and AI slop, sure...

But there are systems that depend on Cloudflare, directly or not, and when they go down it can have a serious impact on somebody's livelihood.

majani|3 months ago

Now that network effects and data lock-in have taken root, downtime is not as big of a concern as it was in the 2000s

amw-zero|3 months ago

What does this even mean? Because people have locked in their data, they’re ok with downtime? I can’t imagine a world where this is true.

swyx|3 months ago

except, yknow, where peoples lives and livelihoods depend on access to information/being able to do things on exact time. aws and cloudflare are disqualifying themselves from hospitals and military and whatnot.

mobiuscog|3 months ago

How did we get to a place where Cloudflare being down means we see an outage page, but on that page it tells us explicitly that the host we're trying to connect to is up, and it's just a Cloudflare problem.

If it can tell us that the host is up, surely it can just bypass itself to route traffic.

ralferoo|3 months ago

"... surely it can just ..."

Congratulations, you've successfully completed Management Training 101.

ec109685|3 months ago

Totally cooked if you have Cloudflare fronting us-east-1, with no redundancies.

lbreakjai|3 months ago

It could be worse. You could have a backup on Azure.

tacker2000|3 months ago

The mother of all bad infra decisions.

a012|3 months ago

They have multi cloud infra, between us-east-1 and Azure

giantrobot|3 months ago

People use CloudFlare because it's a "free" way for most sites to not get exploited (WAF) or DDoSed (CDN/proxy) regularly. A DDoS can cost quite a bit more than a day of downtime, even just a thundering herd of legitimate users can explode an egress bill.

It sucks there's not more competition in this space but CloudFlare isn't widely used for no reason.

AWS also solves real problems people have. Maintaining infrastructure is expensive as is hardware service and maintenance. Redundancy is even harder and more expensive. You can run a fairly inexpensive and performant system on AWS for years for the cost of a single co-located server.

seydor|3 months ago

Slowly and with full conscience of where we were heading to.

neop1x|3 months ago

It's not only centralization in the sense that your website will be down if they are down, but it is also a centralized MITM proxy. If you transfer sensitive data like chats over Cloudflare-"protected" endpoints, you also allow CF to transparently read and analyze it in plain text. It must be very easy for state agencies to spy on the internet nowadays, they would just ask CF to redirect traffic to them.

alkonaut|3 months ago

Because it's better to have a really convenient and cheap service that works 99% of the time than a resilient one that is more expensive or more cumbersome to use.

It's like github vs whatever else you can do with git that is truly decentralized. The centralization has such massive benefits that I'm very happy to pay the price of "when it's down I can't work".

kahrl|3 months ago

When there is an accident on the interstate we should blame the centralization of traffic and advocate for no more highways.

Very worrying indeed.

rglover|3 months ago

Most developers don't care to know how the underlying infrastructure works (or why) and so they take whatever the public consensus is re: infra as a statement of fact (for the better part of the last 15 years or so that was "just use the cloud"). A shocking amount of technical decisions are socially, not technically enforced.

bilekas|3 months ago

This topic is raised every time there is an outage with Cloudflare and the truth of the matter is, they offer an incredible service, and there is not big enough competition to deal with it. By definition their services are so good BECAUSE their adoption rate is so high.

It's very frustrating of course, and it's the nature of the beast.

blazinglyfast|3 months ago

False dichotomy. Both can be true.

bikamonki|3 months ago

Compliance. If you wanna sell your SAAS to big corpo, their compliance teams will feel you know what you're doing if they read AWS or Cloudflare on your architecture, even if you do not quite know what you're doing.

phendrenad2|3 months ago

Because DDoS is a fact of life (and even if you aren't targeted by DDoS, the bot traffic probing you to see if you can be made part of the botnet is enough to take down a cheap $5 VPS). So we have to ask - why? Personally, I don't accept the hand-wavy explanation that botnets are "just a bunch of hacked IoT devices". No, your smart lightbulb isn't taking down Reddit. I slightly believe the secondary explanation that it's a bunch of hacked home routers. We know that home routers are full of things like suspicious oopsie definitely-not-government backdoors.

drob518|3 months ago

IMO, centralization is inevitable because the fundamental forces drive things in that direction. Clouds are useful for a variety of reasons (technical, time to market, economic), so developers want to use them. But clouds are expensive to build and operate, so there are only a few organizations with the budget and competency to do it well. So, as the market matures you end up with 3 to 5 major cloud operators per region, with another handful of smaller specialists. And that’s just the way it works. Fighting against that is to completely swim upstream with every market force in opposition.

gist|3 months ago

There is this tendency to phrase questions (or statements) as "when did 'we' ".

These decisions are made individually, not centrally. There is no process in place (and most likely there never will be) that will be able to control and dictate if people decide one way of doing things is the best way to do it. Even assuming they understand everything or know of the pitfalls.

Even if you can control individually what you do for the site you operate (or are involved in) you won't have any control on parts of your site (or business) that you rely on where others use AWS or Cloudflare.

ljm|3 months ago

I would be less worried if Cloudflare and AWS weren't involved in many more things than simply running DNS.

AWS - someone touches DynamoDB and it kills the DNS.

Cloudflare - someone touches functionality completely unrelated to DNS hosting and proxying and, naturally, it kills the DNS.

There is this critical infrastructure that just becomes one small part of a wider product offering, worked on by many hands, and this critical infrastructure gets taken down by what is essentially a side-effect.

It's a strong argument to move to providers that just do one thing and do it well.

exasperaited|3 months ago

Re: Cloudflare it is because developers actively pushed "just use Cloudflare" again and again and again.

It has been dead to me since the SSL cache vulnerability thing and the arrogance with which senior people expected others to solve their problems.

But consider how many people still do stupid things like use the default CDN offered by some third party library, or use google fonts directly; people are lazy and don't care.

abtinf|3 months ago

Because they are great services, are generally pretty easy to get started with, and usually work as expected, which has led to broad adoption.

telepromptereye|3 months ago

We take the idea of the internet always being on for granted. Most people don’t understand the stack and assume that when sites go down it’s isolated, and although I agree with you, it’s just as much complacency and lack of oversight and enforcement delays in bureaucracy as it is centralization. But I guess that’s kind of the umbrella to those things… lol

an-allen|3 months ago

Well the centralisation without rapid recovery and practices that provide substantial resiliency… that would be worrying.

But I dare say the folks at these organisations take these matters incredibly seriously and the centralisation problem is largely one of risk efficiency.

I think there is no excuse, however, to not have multi region on state, and pilot light architectures just in case.

cj|3 months ago

Except businesses love it.

A lot (and I mean a lot) of people in IT like centralization specifically because it’s hard to blame people for doing something that everyone else is doing.

iso1631|3 months ago

And HN users love it too. I've had people on this site say how great it is that their system routes 30% of traffic on the internet.

I'd be horrified. That's not the internet or computing industries I grew up with, or started working in.

But as long as the SPY keeps hitting > 10% returns each year, everyone's happy.

chb|3 months ago

"No one gets fired for buying IBM!"

mvkel|3 months ago

This was always the case. There was always a "us-east" in some capacity, under Equinix, etc. Except it used to be the only "zone," which is why the internet is still so brittle despite having multiple zones. People need to build out support for different zones. Old habits die hard, I guess.

bsoles|3 months ago

> How did we get to a place where either Cloudflare or AWS having an outage means a large part of the web going down?

As always, in the name of "security". When are we going to learn that anything done, either by the government or by a corporation, in the name of security is always bad for the average person?

butlike|3 months ago

It's weird to think about so bear with me. I don't mean this sardonically or misanthropically. But, it's "just the internet." It's just the internet. It doesn't REALLY matter in a large enough macro view. It's JUST the internet.

rcarmo|3 months ago

What is worrying is that distributed systems don’t seem to be that distributed in practice.

expedition32|3 months ago

Designed to survive a first strike from the USSR. Taken down by Cloudflare.

hhthrowaway1230|3 months ago

Don't think there is anything wrong with a centralised service being down, you just make a conscious decision if you want that and can afford that?

People not being ready for cloudflare/[insert hyperscaler] to be possibly down is the only fault.

Lammy|3 months ago

It's because single points of traffic concentration are the most surveillable architecture, so FVEY et al economically reward with one hand those companies who would build the architecture they want to surveil with the other hand.

kilpikaarna|3 months ago

Currently at the public library and I can't use the customer inventory terminals to search for books. They're just a web browser interface to the public facing website, and it's hosted behind CF. Bananas.

gmiller123456|3 months ago

Don't forget the CrowdStrike outage: One company had a bug that brought down almost everything. Who would have thought there are so many single points of failure across the entire Internet.

poemxo|3 months ago

For most services it's safer to host from behind Cloudflare, and Cloudflare is considered more highly available than a single IaaS or PaaS, at least in my headcanon.

bawolff|3 months ago

The same reason we have centralization across the economy. Economies of scale are how you make a big business successful, and once you are on top it's hard to dislodge you.

chasing0entropy|3 months ago

Agreed. More worrying is that it appears the standard practice of separation between domain and nameserver administration has been lost to one-stop-shop marketing.

strict9|3 months ago

And all of these outages happening not long after most of them dismissed a large amount of experienced staff while moving jobs offshore to save in labor costs.

ridgeguy|3 months ago

Short-term economic forces, probably. Centralization is often cheaper in the near term. The cost of designing in single-point failure modes gets paid later.

kordlessagain|3 months ago

The technical term for it is a man in the middle. It’s better to call it what it is that way you aren’t fooled into thinking it’s not, because it is.

paulddraper|3 months ago

Because bots are a real thing.

And it’s hard to protect against DDoS without something like Cloudflare.

Look at the posts here.

Even the meager HN “hug of death” will take things down

peacebeard|3 months ago

A lot of products use AWS because "we could build redundancy and multi-region if we need it" and then never build it.

rtkwe|3 months ago

I think some of the issues in the last outage actually affected multiple regions. IIRC internally some critical infrastructure for AWS depends on us-east-1 or at least it failed in a way that didn't allow failover.

BurningFrog|3 months ago

How many more of these until governments step in and take over "critical infrastructure"?

nntwozz|3 months ago

Two ways. Gradually, then suddenly.

ronald_petty|3 months ago

Consider joining the Internet Society. An entire group of people who care!

burnt-resistor|3 months ago

A key risk of monopolies is that they lead to monoculture SPoFs.

glitchc|3 months ago

All decentralized systems tend to centralization over time.

ekianjo|3 months ago

because Cloudflare protection blah blah, until Cloudflare is down itself and then you are back to "who watches the watchmen"

k12sosse|3 months ago

That's easy, the watchmen watchmen watch the watchmen.

baq|3 months ago

because efficiency trumps redundancy in the short term, which is all that matters in a super competitive environment.

joeiq|3 months ago

Is avoiding single point of failure in anyone’s playbook? ¯\_(ツ)_/¯

whstl|3 months ago

We only care about it when it's time to complain about the work of individual people.

Companies can always do as they please and people will rationalize anything.

moralestapia|3 months ago

5 mins. of thought to figure out why these services exist?

Dialogue about mitigations/solutions? Alternative services? High availability strategies?

Nah! It's free to complain.

Me personally, I'd say those companies do a phenomenal job by being a de facto backbone of the modern web. Also Cloudflare, in particular, gives me a lot of things for free.

fithisux|3 months ago

Hacking software or hardware is so old school.

The target these days is the user.

The make-believe worm.

0xbadcafebee|3 months ago

It's not really. People are just very bad at putting the things around them into perspective.

Your power is provided by a power utility company. They usually serve an entire state, if not more than one (there are smaller ones too). That's "centralization" in that it's one company, and if they "go down", so do a lot of businesses. But actually it's not "centralized", in that 1) there are actually many different companies across the country/world, and 2) each company "decentralizes" most of its infrastructure to prevent massive outages.

And yes, power utilities have outages. But usually they are limited in scope and short-lived. They're so limited that most people don't notice when they happen, unless it's a giant weather system. Then if it's a (rare) large enough impact, people will say "we need to reform the power grid!". But later when they've calmed down, they realize that would be difficult to do without making things worse, and this event isn't common.

Large internet service providers like AWS, Cloudflare, etc, are basically internet utilities. Yes they are large, like power utilities. Yes they have outages, like power utilities. But the fact that a lot of the country uses them, isn't any worse than a lot of the country using a particular power company. And unlike the power companies, we're not really that dependent on internet service providers. You can't really change your power company; you can change an internet service provider.

Power didn't used to be as reliable as it is. Everything we have is incredibly new and modern. And as time has passed, we have learned how to deal with failures. Safety and reliability has increased throughout critical industries as we have learned to adapt to failures. But that doesn't mean there won't be failures, or that we can avoid them all.

We also have the freedom to architect our technology to work around outages. All the outages you have heard about recently could be worked around, if the people who built on them had tried:

- CDN goes down? Most people don't absolutely need a CDN. Point your DNS at your origins until the CDN comes back. (And obviously, your DNS provider shouldn't be the same as your CDN...)

- The control plane goes down on dynamic cloud APIs? Enable a "limp mode" that persists existing infrastructure to serve your core needs. You should be able to service most (if not all) of your business needs without constantly calling a control plane.

- An AZ or region goes down? Use your disaster recovery plan: deploy infrastructure-as-code into another region or AZ. Destroy it when the az/region comes back.

...and all of that just to avoid a few hours of downtime per year? It's likely cheaper to just take the downtime. But that doesn't stop people from piling on when things go wrong, questioning whether the existence of a utility is a good idea.
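The first workaround in the list above (point DNS at your origins until the CDN comes back) boils down to a small decision rule. The sketch below only covers that decision; the hostnames, the three-failure threshold, and the function name are all hypothetical, and actually applying the change would go through whatever API your DNS provider offers (which, as the comment notes, should not be the CDN itself).

```python
def choose_dns_target(cdn_checks: list[bool], cdn_target: str,
                      origin_target: str, failures_needed: int = 3) -> str:
    """Decide where DNS should point, given recent CDN health check results.

    cdn_checks holds results oldest-to-newest (True = check passed).
    Fall back to the origin only after `failures_needed` consecutive failures,
    so a single flaky check does not bypass the CDN.
    """
    recent = cdn_checks[-failures_needed:]
    cdn_down = len(recent) == failures_needed and not any(recent)
    return origin_target if cdn_down else cdn_target


# Hypothetical hostnames, for illustration only.
CDN = "site.cdn.example.net"
ORIGIN = "origin.example.net"

print(choose_dns_target([True, False, False, False], CDN, ORIGIN))  # three failures in a row
print(choose_dns_target([False, False, True], CDN, ORIGIN))         # CDN has recovered
```

Because the same rule switches back once a check passes, traffic returns to the CDN on its own after recovery; whether that is worth building for a few hours of downtime a year is exactly the trade-off the comment raises.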

cyanydeez|3 months ago

CAPITALISM

Are people really this confused?