A colleague of mine just came bursting through my office door in a panic, thinking he brought our site down since this happened just as he made some changes to our Cloudflare config. He was pretty relieved to see this post.
You joke, and I think it's funny, but as a junior engineer I would be quite proud if some small change I made was able to take down the mighty Cloudflare.
It was also the cause of the Azure Front Door global outage two weeks ago - https://aka.ms/air/YKYN-BWZ
"A specific sequence of customer configuration changes, performed across two different control plane build versions, resulted in incompatible customer configuration metadata being generated. These customer configuration changes themselves were valid and non-malicious – however they produced metadata that, when deployed to edge site servers, exposed a latent bug in the data plane. This incompatibility triggered a crash during asynchronous processing within the data plane service. This defect escaped detection due to a gap in our pre-production validation, since not all features are validated across different control plane build versions."
> May 12, we began a software deployment that introduced a bug that could be triggered by a specific customer configuration under specific circumstances.
I'd love to know more about what those specific circumstances were!
I'm pretty sure I crashed Gmail using something weird in its filters. It was a few years ago. Every time I did something specific (I don't remember what), it would freeze and then display a 502 error for a while.
What’s funny is that as I get older, this feeling of relief turns into more of a feeling of dread. The nice thing about problems that you cause is that you have considerable autonomy to fix them. When Cloudflare goes down, you’re sitting and waiting for a third party to fix something.
The problem is, I still get the wrong end of the stick when AWS or CF go down! Management doesn't care, understandably. They just want the money to keep coming in. It's hard to convince them that this is a pretty big problem. The only thing that will calm them down a bit is to tell them Twitter is also down. If that doesn't get them, I say ChatGPT is also down. Now NOBODY will get any work done! lol.
When I'm debugging something, I'm not usually looking for the solution to the problem; I'm looking for sufficient evidence that I didn't cause the problem. Once I have that, the velocity at which I work slows down.
Maybe this isn’t great, but I get a hint of that feeling when I’m on an airplane and hear a baby crying. For a number of years, if I heard a baby crying, it was probably my baby and I had to deal with it. But now my kids are past that phase, so when I hear the crying, after that initial jolt of panic I realize that it isn’t my problem, and that does give me the warm fuzzies. Even though I do feel bad for the baby and their parents.
I woke up to a barrage of messages from multiple clients about sites not working. I shat my pants because I had changed the config just the day before. When I saw the status message "Cloudflare down", I was so relieved.
Good that he worked it out so quickly. I recently spent a day debugging email problems on Railway PaaS, because they silently closed an SMTP port without telling anyone.
You missed a great opportunity to dead-pan him with something like "No, Bob, not just our site, you brought down the entire Internet, look at this post!"
arbuge|3 months ago
mlrtime|3 months ago
sakisv|3 months ago
https://www.fastly.com/blog/summary-of-june-8-outage
nevf1|3 months ago
itzjacki|3 months ago
srmarm|3 months ago
Bloomy22|3 months ago
CableNinja|3 months ago
Freak_NL|3 months ago
spamizbad|3 months ago
jspash|3 months ago
shortrounddev2|3 months ago
jpmonette|3 months ago
mcphage|3 months ago
bookofjoe|3 months ago
unknown|3 months ago
[deleted]
Rooster61|3 months ago
You gain relief, but you don't exactly derive pleasure, as it's someone you know who's getting the ass end of the deal.
unknown|3 months ago
[deleted]
stonecharioteer|3 months ago
unknown|3 months ago
[deleted]
StanAngeloff|3 months ago
cromka|3 months ago
hoistbypetard|3 months ago
nrhrjrjrjtntbt|3 months ago
sefke|3 months ago
disconnection|3 months ago
bamboozled|3 months ago
itzjacki|3 months ago
0xblinq|3 months ago
ants_everywhere|3 months ago
When aliens study humans from this period, their book of fairy tales will include several where a terrible evil was triggered by a config push.
0xblinq|3 months ago
dcjdfvk|3 months ago
carlos_rpn|3 months ago
belter|3 months ago
raxxorraxor|3 months ago
theoldgreybeard|3 months ago
unknown|3 months ago
[deleted]
NitpickLawyer|3 months ago