
Thanksgiving 2023 security incident

643 points | nomaxx117 | 2 years ago | blog.cloudflare.com | reply

322 comments

[+] BytesAndGears|2 years ago|reply
Writeups and actions like this from cloudflare are exactly why I trust them with my data and my business.

Yes, they aren’t perfect. They do some things that I disagree with.

But overall they prove themselves worthy of my trust, specifically because of the engineering mindset the company shares, and how seriously they take things like this.

Thank you for the blog post!

[+] nimbius|2 years ago|reply
Then, the advertisement worked.

- Insist that you have better integrity than your competitors

- Share a few operational investigations after your latest security event

What Cloudflare doesn't do is provide their SOC risk analysis as a PCI DSS payment card processor. Cloudflare doesn't explain why they ignored/failed to identify the elevated accounts or how those accounts became compromised to begin with. They just explain remediation without accountability.

They mention a third-party audit was conducted, but that's not because they care about you. It's because PCI DSS mandates that when an organization of any level experiences a data breach or cyber-attack that compromises payment card information, it needs to pass a yearly on-site audit to ensure PCI compliance. If they didn't, major credit houses would stop processing their payments.

[+] Xeyz0r|2 years ago|reply
Nobody is perfect, but Cloudflare does inspire confidence, especially thanks to cases where they don't hesitate to talk about an issue and how they resolved it. It's precisely these writeups that demonstrate their ability to handle such challenges.
[+] overstay8930|2 years ago|reply
We're one of their larger enterprise customers, and material like this makes it easy to get renewals approved; keeping engineers in the loop makes it such an easy sell.
[+] zelon88|2 years ago|reply
You actually believe that an intruder gained access to their KB/tickets and didn't manage to get valuable information? That's not what Jira is for. If you know what Jira is for, and you're willing to run it on-prem, then you know the reason you do all that work is that you have something valuable to store in there.

I don't believe they didn't lose anything. That's not how this works, and most Jira/Confluence I've seen is loaded with secrets.

[+] encom|2 years ago|reply
Better hope you stay on their good side, and don't say anything their CEO doesn't approve of.
[+] kccqzy|2 years ago|reply
> Analyzing the wiki pages they accessed, bug database issues, and source code repositories, it appears they were looking for information about the architecture, security, and management of our global network; no doubt with an eye on gaining a deeper foothold.

For a nation state actor, the easiest way to accomplish that is to send one of their loyal citizens to become an employee of the target company and then have the person send back "information about the architecture, security, and management" of the target company.

Fun (but possibly apocryphal) fact: more than a decade ago in a social gathering of SREs at Google, several admitted to being on the payroll of some national intelligence bureaus.

[+] neilv|2 years ago|reply
> Fun (but possibly apocryphal) fact: more than a decade ago in a social gathering of SREs at Google, several admitted to being on the payroll of some national intelligence bureaus.

They had government engagements with Google's consent, and all those various engagements could be disclosed to each other?

If not, what kind of drugs were flowing at this social gathering, to cause such an orgy of bad OPSEC?

[+] _kb|2 years ago|reply
Payroll? You guys are getting paid?

Australians get the 'opportunity' to be part of that sort of espionage as a base-level condition of citizenship [0].

As an upside, I guess it helps with encouraging good practices around zero trust processes and systems dev.

[0]: https://en.wikipedia.org/wiki/Mass_surveillance_in_Australia...

[+] owlstuffing|2 years ago|reply
> For a nation state actor, the easiest way to accomplish that is to send one of their loyal citizens to become an employee of the target company

Precisely. Particularly in the case of US businesses. Why bother picking a lock when you have both the key and permission?

[+] toyg|2 years ago|reply
Not if such citizens are sanctioned. Code Red. Hint hint.
[+] marcinzm|2 years ago|reply
> we were (for the second time) the victim of a compromise of Okta’s systems

I'm curious if they're rethinking being on Okta.

[+] twisteriffic|2 years ago|reply
This wasn't really an additional failure at Okta. These were credentials lost during the original Okta compromise that Cloudflare failed to rotate out.

Okta deserves criticism for their failure, but this feels like CloudFlare punching down to shift blame for a miss on their part.

[+] BytesAndGears|2 years ago|reply
My company will only give us new laptops that are preinstalled with Okta’s management system.

I am grandfathered in to an old MacBook that has absolutely no management software on it, from the “Early Days” when there was no IT and we just got brand new untouched laptops.

They offered me an upgrade to an M1/M2 pro, but I refused, saying that I wasn’t willing to use Okta’s login system if I have my own personal passwords or keys anywhere on my work computer.

Since that would hugely disrupt my work, I can’t upgrade. Maybe I can use incidents like this to justify my beliefs to the IT department…

[+] Icathian|2 years ago|reply
The challenge being, who else could possibly handle Cloudflare's requirements? I imagine the next step is to build their own, and that's obviously not an easy pill to swallow.
[+] OJFord|2 years ago|reply
> The one service token and three accounts were not rotated because mistakenly it was believed they were unused.

Eh? So why weren't they revoked entirely? I'm sure something's just unsaid there, or lost in communication or something, but as written that doesn't really make sense to me?

[+] crdrost|2 years ago|reply
I would assume that "believed" is not meant to be interpreted in an active personal sense but in a passive configuration sense.

That is, I'd expect there was a flag in a database somewhere saying that those service accounts were "abandoned" or "cleaned up" or some other non-active status, but that this assertion was incorrect. Then they probably rotated all the passwords for active accounts, but skipped the inactive ones.

Speaking purely about PKI and certificate revocation, because that's the only similar context that I really know about, there is generally a difference between allowing certificates to expire, vs allowing them to be marked as "no longer used", vs fully revoking them: a certificate authority needs to do absolutely nothing in the first case, can choose to either do nothing or revoke in the second case, and must actively maintain and broadcast that revocation list for the third case. When someone says "hey I accidentally clobbered that private key can I please have a new cert for this new key," you generally don't add the old cert to the revocation list because why would you.
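
To make that concrete, here's a minimal Python sketch using the `cryptography` library (the CA key, name, and serial number are invented purely so the example runs): expiry and "marked unused" cost the CA nothing, while revocation means building and continually republishing a CRL.

    import datetime

    from cryptography import x509
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa
    from cryptography.x509.oid import NameOID

    # Hypothetical CA key and name, generated here only so the example runs.
    ca_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    ca_name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "Example Internal CA")])
    now = datetime.datetime.utcnow()

    # Cases 1 and 2 (expired, or merely flagged "unused" in some inventory DB):
    # the CA does nothing; relying parties just stop trusting the cert, or never did.

    # Case 3 (revocation): the CA must list the serial on a CRL and keep publishing it.
    revoked_entry = (
        x509.RevokedCertificateBuilder()
        .serial_number(0xDEADBEEF)  # serial of the retired/compromised cert
        .revocation_date(now)
        .build()
    )
    crl = (
        x509.CertificateRevocationListBuilder()
        .issuer_name(ca_name)
        .last_update(now)
        .next_update(now + datetime.timedelta(days=7))  # CRLs must be re-issued regularly
        .add_revoked_certificate(revoked_entry)
        .sign(private_key=ca_key, algorithm=hashes.SHA256())
    )
    print(f"CRL now lists {len(crl)} revoked certificate(s)")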

[+] htrp|2 years ago|reply
blameless post mortem most likely

Great call out too

> Note that this was in no way an error on the part of AWS, Moveworks or Smartsheet. These were merely credentials which we failed to rotate.

[+] stepupmakeup|2 years ago|reply
Rotating could have been manual and the person in charge wanted to save time. Stress could be a factor too.
[+] phyzome|2 years ago|reply
Betting they have a new item in their compromise runbook. :-)
[+] sevg|2 years ago|reply
> Even though we believed, and later confirmed, the attacker had limited access, we undertook a comprehensive effort to rotate every production credential (more than 5,000 individual credentials), physically segment test and staging systems, performed forensic triages on 4,893 systems, reimaged and rebooted every machine in our global network including all the systems the threat actor accessed and all Atlassian products (Jira, Confluence, and Bitbucket).

> The threat actor also attempted to access a console server in our new, and not yet in production, data center in São Paulo. All attempts to gain access were unsuccessful. To ensure these systems are 100% secure, equipment in the Brazil data center was returned to the manufacturers. The manufacturers’ forensic teams examined all of our systems to ensure that no access or persistence was gained. Nothing was found, but we replaced the hardware anyway.

They didn't have to go this far. It would have been really easy not to. But they did and I think that's worthy of kudos.

[+] barkingcat|2 years ago|reply
I think they did have to go that far, though.

Getting in at the "ground floor" of a new datacentre build is pretty much the ultimate exploit. Imagine getting in at the centre of a new Meet-Me room (https://en.wikipedia.org/wiki/Meet-me_room) and having persistent access to key switches there.

Cloudflare datacentres tend to be at the hub of insane amounts of data traffic. The fact that the attacker knew how valuable a "pre-production" data centre is means that Cloudflare probably realized themselves that it would be 100% game over if someone managed to get a foothold there before the regular security systems are set up. It would be a company-ending event if someone managed to install themselves inside a data centre while it was being built/brought up.

Also remember: at the beginning of data centre builds, all switches/equipment have default or blank root passwords (admin/admin), and all switch/equipment firmware is old and full of exploits (you either go into each one and update the firmware one by one, or hook them up to automation for fleet-wide patching). Imagine this exploit taking place before the automation services had a chance to patch all the firmware ... that's a "return all devices so the manufacturer ships us something new" event.
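
As a rough illustration of the kind of bring-up check that matters here (a purely hypothetical inventory and credential list, and certainly not Cloudflare's actual tooling), something like this can flag devices still answering to factory defaults before they're allowed onto the network:

    import paramiko

    # Made-up inventory and default credential pairs, for illustration only.
    INVENTORY = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
    DEFAULT_CREDS = [("admin", "admin"), ("root", ""), ("admin", "password")]

    def accepts_default_login(host: str) -> bool:
        """Return True if the device still answers to a well-known default credential."""
        for user, password in DEFAULT_CREDS:
            client = paramiko.SSHClient()
            client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
            try:
                client.connect(host, username=user, password=password, timeout=5)
                return True  # login succeeded: device is still on factory defaults
            except (paramiko.SSHException, OSError):
                continue
            finally:
                client.close()
        return False

    for host in INVENTORY:
        if accepts_default_login(host):
            print(f"{host}: still on default credentials -- quarantine before bring-up")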

[+] readyplayernull|2 years ago|reply
> The manufacturers’ forensic teams examined all of our systems to ensure that no access or persistence was gained. Nothing was found, but we replaced the hardware anyway.

Aha, the old replace-your-trusted-hardware trick.

[+] schainks|2 years ago|reply
Having seen even the small number of DEF CON talks that I have, I would absolutely have gone that far.
[+] ldoughty|2 years ago|reply
The nuclear response to compromise should be the standard business practice. It should be exceptional to deviate from it.

If you assume that they only accessed what you can prove they accessed, you've left a hole for them to live in. It should require a quorum of people to say you DON'T need to do this.

Of course, this is the ideal world. I'm glad my group is afforded the time to implement features with no direct monetary or user benefit.

[+] sebmellen|2 years ago|reply
The most surprising part of this is that Cloudflare uses BitBucket.
[+] mmaunder|2 years ago|reply
The thing about a data breach is that once the data is out there - source code in this case - it's out there for good and you have absolutely no control over who gets it. You can do as much post-incident hardening as you want, and talk about it as much as you want, but the thing you're trying to protect against, and blogging about how good you're getting at preventing, has already happened. You can't unscramble those eggs.
[+] muzso|2 years ago|reply
> The threat actor searched the wiki for things like remote access, secret, client-secret, openconnect, cloudflared, and token. They accessed 36 Jira tickets (out of a total of 2,059,357 tickets) and 202 wiki pages (out of a total of 14,099 pages).

In Atlassian's Confluence, even the built-in Apache Lucene search engine can leak sensitive information, and this kind of access by the attacker can be very hard to track or identify: they don't have to open a Confluence page if the sensitive information is already shown on the search results page.
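
As a rough audit sketch (assuming the standard Confluence CQL search endpoint and a read-only API token; the URL, credentials, and exact response fields are placeholders and may vary by Confluence version), you can run the same secret-hunting queries an attacker would and see how much the result snippets alone give away:

    import requests

    BASE_URL = "https://confluence.example.internal"  # hypothetical instance
    AUTH = ("audit-bot", "api-token-goes-here")       # hypothetical credentials
    TERMS = ["remote access", "secret", "client-secret", "cloudflared", "token"]

    for term in TERMS:
        resp = requests.get(
            f"{BASE_URL}/rest/api/search",
            params={"cql": f'text ~ "{term}"', "limit": 25},
            auth=AUTH,
            timeout=30,
        )
        resp.raise_for_status()
        for result in resp.json().get("results", []):
            # The excerpt is what the search results page renders, so anything
            # sensitive in it is exposed even if the page itself is never opened.
            print(term, "|", result.get("title"), "|", result.get("excerpt", "")[:120])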

[+] fierro|2 years ago|reply
>The one service token and three accounts were not rotated because mistakenly it was believed they were unused.

This is odd to me - unused credentials should probably be deleted, not rotated.

[+] londons_explore|2 years ago|reply
So after the Okta incident they rotated the leaked credentials...

But I think they should have put honeypots on them, and then waited to see what the attackers did. Honeypots also discourage attackers from continuing, for fear of being discovered.
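
Something like this is what I mean by a honeypot on the leaked credentials (pure speculation on my part, not what Cloudflare did; the token value and the alerting hook are placeholders): keep the gateway "accepting" the token just long enough to trip an alarm on any use.

    from flask import Flask, abort, request

    app = Flask(__name__)

    # Tokens known to be in the attacker's hands, kept around purely as tripwires.
    HONEYTOKENS = {"service-token-leaked-in-okta-breach"}  # placeholder value

    def alert_security_team(token: str, remote_addr: str, path: str) -> None:
        # Placeholder: page on-call, open an incident, capture the request for forensics.
        print(f"HONEYTOKEN USED: {token!r} from {remote_addr} against {path}")

    @app.before_request
    def check_honeytoken():
        token = request.headers.get("Authorization", "").removeprefix("Bearer ").strip()
        if token in HONEYTOKENS:
            alert_security_team(token, request.remote_addr, request.path)
            abort(403)  # deny, but only after the tripwire has fired

    @app.route("/internal/<path:anything>")
    def internal(anything):
        return "ok"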

[+] wepple|2 years ago|reply
They mention Zero Trust, yet you can gain access to applications with just a single bearer token?

Am I missing something here?

There’s no machine cert used? AuthN tokens aren’t cryptographically bound?

This doesn't meet my definition of ZT; it seems more like "we don't have a VPN".
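
For reference, here's a sketch of what "cryptographically bound" could mean, in the RFC 8705 (mTLS-bound access token) sense; this is illustrative only, not how Cloudflare's service tokens actually work.

    import base64
    import hashlib

    import jwt  # PyJWT

    def cert_thumbprint(client_cert_der: bytes) -> str:
        """Base64url SHA-256 thumbprint of the client certificate (the x5t#S256 value)."""
        digest = hashlib.sha256(client_cert_der).digest()
        return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()

    def verify_bound_token(token: str, signing_key: str, client_cert_der: bytes) -> dict:
        """Reject a bearer token unless it is bound to the cert the client presented over mTLS."""
        claims = jwt.decode(token, signing_key, algorithms=["RS256"], audience="internal-api")
        if claims.get("cnf", {}).get("x5t#S256") != cert_thumbprint(client_cert_der):
            raise PermissionError("token is not bound to the presenting client certificate")
        return claims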

[+] this_steve_j|2 years ago|reply
This is an excellent report, and congratulations are due to the security teams at Cloudflare for a quick detection, response, and investigation.

It also highlights the need for the entire industry to move faster away from long-lived service account credentials (access tokens) and toward federated workload identity systems like OpenID Connect in the software supply chain.

These tokens too often provide elevated privileges in devops tools while bypassing MFA, and in many cases are rotated yearly. GitHub [1], GitLab, and AZDO now support OIDC, so update your service connections now!

Note: I'm not familiar with this incident and don't know whether that is precisely what happened here or whether OIDC would have prevented the attack.

DevSecOps and Zero Trust are often-abused buzzwords, but the principles are mature and can significantly reduce blast radius; a sketch of the token-validation side follows below.

[1] https://docs.github.com/en/actions/deployment/security-harde...
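
Here's a hedged sketch of the receiving side of that workload-identity flow: verify the short-lived CI token against the issuer's published keys and pin the expected repository before minting anything. The issuer shown is GitHub Actions' OIDC issuer; the audience, repo, and credential-exchange step are placeholders.

    import jwt  # PyJWT

    ISSUER = "https://token.actions.githubusercontent.com"
    JWKS_URL = f"{ISSUER}/.well-known/jwks"
    EXPECTED_AUDIENCE = "https://deploy.example.internal"   # placeholder
    EXPECTED_SUB_PREFIX = "repo:example-org/example-repo:"  # placeholder

    def verify_ci_token(token: str) -> dict:
        """Validate a GitHub Actions OIDC token before exchanging it for short-lived credentials."""
        signing_key = jwt.PyJWKClient(JWKS_URL).get_signing_key_from_jwt(token)
        claims = jwt.decode(
            token,
            signing_key.key,
            algorithms=["RS256"],
            audience=EXPECTED_AUDIENCE,
            issuer=ISSUER,
        )
        if not claims["sub"].startswith(EXPECTED_SUB_PREFIX):
            raise PermissionError(f"unexpected workload identity: {claims['sub']}")
        return claims  # safe to exchange for minutes-long deployment credentials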

[+] jrockway|2 years ago|reply
Which "nation state" do we think this was?
[+] meowface|2 years ago|reply
For these kinds of attacks it's nearly always China, Russia, US, or sometimes Iran. 95% chance it's either China or Russia, here.
[+] jedahan|2 years ago|reply
The writeup contains indicators, including IP addresses, and the location of those addresses. In this case, the IP address associated with the threat actor is currently located in Bucharest, Romania.
[+] lijok|2 years ago|reply
Which nation state has good enough employment protection laws that they can take weekends off while doing recon on a top value target?
[+] orenlindsey|2 years ago|reply
Cloudflare being compromised would be enormous. Something between 5 and 25% of all sites use CF in some fashion. An attacker could literally hold the internet hostage.
[+] zelon88|2 years ago|reply
What I don't understand is how they got access to Jira yet you still insist there was no compromise.

The very nature of Jira and Confluence (both terrible products, btw) is to collect documentation. I'm assuming it was an internal Jira/Confluence for engineering teams, but still. There have got to be addresses, passwords, service account info, all kinds of info. If it was a tech support server then it's impossible to assert that you didn't lose customer data.

So we have this double standard where you pay for this product that is designed to house your deepest secrets and most cherished organizational information, that's so important to you that you run on-premises servers to keep it safe, but it's not important enough to constitute a real "breach".

You're lying. Either the server contained junk of no value in which case it wouldn't have existed in the first place, or you actually did lose something of value that you won't identify to us. Nobody sets up on-prem Jira just to leave it empty and never put secrets in it.

[+] londons_explore|2 years ago|reply
Am I the only one who just sees a totally blank page?

Viewing the HTML shows it's got an empty body tag, and a single script in the <head> with a URL of https://static.cloudflareinsights.com/beacon.min.js/v84a3a40...

[+] chankstein38|2 years ago|reply
No, that's also what I see. I'm not sure why you're getting downvoted.

EDIT: re-opened the link a few minutes later and now I see the post

[+] j-rom|2 years ago|reply
> To ensure these systems are 100% secure, equipment in the Brazil data center was returned to the manufacturers. The manufacturers’ forensic teams examined all of our systems to ensure that no access or persistence was gained. Nothing was found, but we replaced the hardware anyway.

The thoroughness is pretty amazing

[+] wowmuchhack|2 years ago|reply
Such a beautiful report and beautiful ownage.

Whenever some shitty Australian telco gets owned, people are angry and call them incompetent idiots; it's nice to see Cloudflare get owned in style, with much more class and expertise.

Like the rest of the HN crowd, this incident has only increased my trust in Cloudflare.

[+] jshier|2 years ago|reply
Fascinating and thorough analysis! I guess if you think an account is unused, just delete it!
[+] phyzome|2 years ago|reply
Probably safer to rotate the credentials and then schedule them for deletion later. Then if you discover they weren't unused after all, you have an easier recovery... :-)
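
A small sketch of that "rotate first, delete later" idea (the credential store and model here are hypothetical):

    import datetime
    import secrets

    GRACE_PERIOD = datetime.timedelta(days=30)

    def retire_suspected_unused(credential: dict, store: dict) -> None:
        """Rotate a credential believed to be unused, then schedule it for deletion.

        If something was quietly depending on it, recovery is just re-sharing the
        new secret -- not resurrecting a deleted account mid-incident.
        """
        credential["secret"] = secrets.token_urlsafe(32)  # rotate immediately
        credential["delete_after"] = datetime.datetime.utcnow() + GRACE_PERIOD
        store[credential["name"]] = credential

    def purge_expired(store: dict) -> None:
        """Actually delete credentials whose grace period has passed."""
        now = datetime.datetime.utcnow()
        for name in [n for n, c in store.items() if c.get("delete_after", now) < now]:
            del store[name]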