One thing to keep in mind when judging what's 'appropriate' is that Cloudflare was effectively responding to an ongoing security incident outside of their control (the React Server RCE vulnerability). Part of Cloudflare's value proposition is being quick to react to such threats. That changes the equation a bit: for every additional hour you wait to deploy, your customers are actively getting hacked through a known high-severity vulnerability. In this case it's not just a matter of 'hold back for another day to make sure it's done right', like when adding a new feature to a normal SaaS application. In Cloudflare's case, moving slower also comes with a real cost.
That isn't to say it didn't work out badly this time, just that the calculation is a bit different.
flaminHotSpeedo|2 months ago
However, this preliminary report doesn't really justify the decision to use the same deployment system responsible for the 11/18 outage. Deployment safety should have been the focus of this report, not the technical details. The question I want answered isn't "are there bugs in Cloudflare's systems", it's "has Cloudflare learned from its recent mistakes to respond appropriately to events?"
vlovich123|2 months ago
There’s no other deployment system available. There’s a single system for config deployment, and it’s all that was available, as they haven’t implemented progressive rollout yet.
dkyc|2 months ago
Particularly if we're asking them to be careful & deliberate about deployments, it's hard to also ask them to fast-track this.
Already__Taken|2 months ago
flaminHotSpeedo|2 months ago
ascorbic|2 months ago
Disclosure: I work at Cloudflare, but not on the WAF
cowsandmilk|2 months ago
udev4096|2 months ago
[deleted]
toomuchtodo|2 months ago