csabakissi | 4 years ago

I can't actually imagine that something like this can happen: a single person making a simple config change can cause it.

alkonaut|4 years ago

That was the bug.

A trivial example would be a bug that replaces the configuration for all customers with the most recently uploaded one. Then when the next customer uploads a new (valid!) config, you have a problem.

Obviously it wasn’t that trivial but the point is: it wasn’t the customer’s configuration change that was the problem but some code that managed the config change.
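To make the trivial example concrete, here is a minimal Python sketch of that kind of bug. Everything in it (`BuggyConfigStore`, the customer names, the `ttl` setting) is hypothetical and only illustrates the idea; it is not how Fastly's code works. The store accidentally shares one dict between all customers, so a perfectly valid upload from one customer replaces everyone's config.

```python
class BuggyConfigStore:
    """Hypothetical config store, for illustration only."""

    def __init__(self, customers):
        # Bug: dict.fromkeys reuses ONE dict object as the value for every key,
        # so all customers end up pointing at the same config.
        self._configs = dict.fromkeys(customers, {})

    def upload(self, customer, config):
        # Looks correct in isolation: replace this customer's config.
        target = self._configs[customer]
        target.clear()
        target.update(config)

    def get(self, customer):
        # Return a copy so callers cannot mutate the stored config.
        return dict(self._configs[customer])


store = BuggyConfigStore(["alice", "bob"])
store.upload("alice", {"ttl": 60})
store.upload("bob", {"ttl": 5})   # a valid change by a different customer
print(store.get("alice"))         # {'ttl': 5}
```

Validating bob's upload would not have caught this, because the upload itself is fine. The fault is in the code that manages the change.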

nirvanis|4 years ago

It's more common than we imagine. That's usually how many of the historical network incidents started. The important part, as usual, is to make sure the remediations for such incidents focus on how to limit the blast radius of small changes, and how to accomplish that without introducing artificial gatekeeping and bureaucracy into the change process.
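One common way to limit blast radius is a staged rollout: apply the change to a small fraction of hosts, check health, and only then widen. The sketch below is a rough illustration under assumed names (`staged_rollout`, `apply_change`, `is_healthy`, the `pop-N` host names, and the 1% / 10% / 100% stages are all made up here), not a description of any particular CDN's process.

```python
def staged_rollout(hosts, apply_change, is_healthy, fractions=(0.01, 0.1, 1.0)):
    """Apply a change to growing fractions of hosts.

    Stops at the first stage where any touched host is unhealthy.
    Returns (ok, touched_hosts).
    """
    touched = 0
    for fraction in fractions:
        # Number of hosts that should have the change after this stage.
        wanted = min(len(hosts), max(1, round(len(hosts) * fraction)))
        target = max(touched, wanted)
        for host in hosts[touched:target]:
            apply_change(host)
        touched = target
        # Health gate: halt before widening if anything touched so far is bad.
        if not all(is_healthy(h) for h in hosts[:touched]):
            return False, hosts[:touched]
    return True, hosts[:touched]


hosts = [f"pop-{i}" for i in range(200)]
applied = set()

# Simulate a bad change: every host that receives it becomes unhealthy.
ok, affected = staged_rollout(hosts, applied.add, lambda h: h not in applied)
print(ok, len(affected))  # False 2
```

Here the bad change reaches 2 of 200 hosts instead of all of them, and the gate is automated, so it adds no approval step to the change process.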

KirillPanov|4 years ago

Welcome to the CentralizedWeb (tm). Unfortunately we had to sunset the Internet you used to use.

ianlevesque|4 years ago

A web filled with DDoS attacks and scraping is a web that needs Cloudflare and Fastly. I’m not sure how to avoid this sorry state of things.

hulitu|4 years ago

They forgot to test it.

alkonaut|4 years ago

Test “it”? The change in question wasn’t made by Fastly but by a customer of theirs making a config change. It’s possible that this customer did validate their change somehow.

Fastly obviously didn’t test their code (with the bug) enough, but testing of course can never prove the absence of bugs. Testing for a global deployment like a massive CDN happens to a large extent in prod because you don’t have another globe. You can test on a smaller scale but eventually you run into a problem that only shows itself at full scale.

2rsf|4 years ago

Testing is never complete, nor can it be complete even in theory.