Zoom: Remote Code Execution with XMPP Stanza Smuggling

[+] twoodfin|3 years ago|reply

The XML parsing/validation bugs are, I suppose, not shocking, but deeply disappointing.

The one thing XML & its tooling were supposed to get right was document well-formed-ness. Sure, it might be a mess of a standard in other ways, but at least we could agree what a parser should and shouldn’t accept! (Not the case for the HTML tag soup of then or now.)

That, 25 years on, a popular XML processor can’t even meet that low bar for tag names is maddening.

[+] Diggsey|3 years ago|reply

There are just so many issues here.

1) Don't rely on two parsers having identical behaviour for security. Yes, parsers for the same format should behave the same, but bugs happen, so don't design a system where small differences result in such a catastrophic bug. If you absolutely have to do this, at least use the same parser on both ends.

2) Don't allow layering violations. All content of XML documents is required to be valid in the configured character encoding. That means layer 1 of your decoder should be converting a byte stream into a character stream, and layers 2+ should not even have the opportunity to mess up decoding a character. Efficiency is not a justification, because you can use compile-time techniques to generate the exact same code as if you combined all layers into one. This has the added benefit that it removes edge-cases (if there is one place where bytes are decoded into characters, then you can't get a bug where that decoding is only broken in tag names, and so your test coverage is automatically better).

3) Don't transparently download and install stuff without user interaction, regardless of where it comes from!

4) Revoke certificates for old compromised versions of an installer so that downgrade attacks are not possible.
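
To make point 2 concrete, here is a minimal Python sketch of the layering, not anything Zoom or Gloox actually does. The function names (decode_layer, parse_layer, parse_stanza) are made up for illustration. The only place bytes turn into characters is the first function, and it rejects malformed UTF-8 before any XML code runs.

```python
import xml.etree.ElementTree as ET


def decode_layer(raw: bytes) -> str:
    # Layer 1: the single place where bytes become characters.
    # Strict mode raises UnicodeDecodeError on any malformed sequence.
    return raw.decode("utf-8", errors="strict")


def parse_layer(text: str) -> ET.Element:
    # Layer 2: only ever sees characters that layer 1 already validated.
    return ET.fromstring(text)


def parse_stanza(raw: bytes) -> ET.Element:
    # Hypothetical entry point that composes the two layers.
    return parse_layer(decode_layer(raw))
```

With this split, a truncated multi-byte sequence fails the same way whether it sits in a tag name, an attribute, or text content, because the XML layer never gets to look at it.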

[+] jerf|3 years ago|reply

Unfortunately, the problem here is programmers more so than formats. It literally doesn't matter what you specify, programmers will not implement it to a T. Most programmers simply don't know that every single detail matters. Many of those who may have some idea don't really care, since they can't imagine how something like this could happen.

It's not just XML. It's every ecosystem I've ever used. Push it around the edges and you will find things.

This is neat, not because it is special to JSON in particular, but because it examines a good chunk of a large ecosystem: https://seriot.ch/projects/parsing_json.html Expect the same to be true of any ecosystem that doesn't make avoiding these discrepancies a top priority.

[+] Flowdalic|3 years ago|reply

It appears that Gloox, a relatively low-level XMPP-client C library, rolled much of its Unicode and XML parsing itself, which made such vulnerabilities more likely. There may be good reasons not to re-use existing modules or rely on external libraries, especially if you target constrained low-end embedded devices, but you should always be aware of the drawbacks. And the Zoom client typically does not run on those.

[+] zamalek|3 years ago|reply

One of the harder things with XMPP is that it is a badly-formed document up until the connection is closed. You need a SAX-style/event-based parser to handle it. That makes rolling your own understandable in some cases (e.g. dotnet's System.Xml couldn't do this prior to XLinq).

That being said, as you indicated Gloox is C-based, and the reference implementation of SAX is in C. There is no excuse.
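
For anyone who hasn't seen event-based parsing of an open-ended stream, a rough Python sketch using the standard library's XMLPullParser follows. The stanzas helper and the un-namespaced <stream> root are simplifications I'm assuming for illustration; a real XMPP stream uses a namespaced root element and much more error handling.

```python
import xml.etree.ElementTree as ET


def stanzas(chunks):
    """Yield each top-level child of the stream root as soon as it closes."""
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0
    for chunk in chunks:
        parser.feed(chunk)  # chunks may split tags anywhere
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
            else:
                depth -= 1
                if depth == 1:  # a direct child of the root just closed
                    yield elem


# The root <stream> element is never closed, as on a live connection.
incoming = [b"<stream><message><bo", b"dy>hi</body></message><presence/>"]
tags = [stanza.tag for stanza in stanzas(incoming)]
```

The document as a whole is never well-formed while the connection is open, but each stanza can still be handed off as a complete element once its end tag arrives.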

[+] Aeolun|3 years ago|reply

I find that response a bit strange, since the whole reason the Zoom client has these particular vulnerabilities is because they didn’t roll their own, and instead rely on layers of broken libraries.

It’s quite possible they’d have more bugs without doing that, but re-using existing modules could just as easily have been an even worse idea.

[+] powerapple|3 years ago|reply

IMO we should use external libraries, and should invest engineering time in the library rather than just take it. Not using a good third-party library means you need to invest at least a few engineer-months to get the same result, and a lot more to do better than the third-party library. Instead, you can take the library and invest a few engineer-months in improving the open-source library.

[+] account42|3 years ago|reply

Why? If anything, the client does the more reasonable interpretation of the XML-in-malformed-UTF-8 - skipping to the next valid UTF-8 sequence start. It's the server that has really weird behavior for their UTF-8 handling where it somehow special cases multi-byte UTF-8 sequences but then does not handle invalid ones.
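
A toy Python illustration of how those two behaviours can disagree on the same bytes. This is a sketch of the general idea only: trusting_view is a deliberately buggy decoder I made up, not the actual server or Gloox code, and it emits U+FFFD as a stand-in for whatever a multi-byte sequence would decode to.

```python
def seq_len(lead: int) -> int:
    """Length a UTF-8 lead byte claims for its sequence (1 if not a lead byte)."""
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    return 1


def trusting_view(data: bytes) -> str:
    """Hypothetical buggy decoder: believes the lead byte and swallows the
    claimed number of bytes without checking the continuation bytes."""
    out, i = [], 0
    while i < len(data):
        n = seq_len(data[i])
        if n == 1 and data[i] < 0x80:
            out.append(chr(data[i]))
        else:
            out.append("\ufffd")  # stand-in for the decoded character
        i += n
    return "".join(out)


def resyncing_view(data: bytes) -> str:
    """Decoder that replaces the malformed bytes and resumes at the next
    valid sequence start (Python's built-in 'replace' error handler)."""
    return data.decode("utf-8", errors="replace")


# 0xE0 0xA4 starts a 3-byte sequence, but the third byte is an ASCII '<'.
payload = b"<a>\xe0\xa4<b/></a>"
hidden = trusting_view(payload)
visible = resyncing_view(payload)
```

The trusting decoder eats the '<' as part of the bogus character, so it never sees a <b/> element; the resyncing decoder does. Any check done with the first view can be bypassed when the second view is what actually gets interpreted.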

[+] xxpor|3 years ago|reply

This is a very common issue across all of software engineering I've found. But I really don't get why. If I was given the task of parsing Unicode or XML, I'd run and find a library as fast as possible, because that sounds terrible and tedious, and I'd rather do literally anything else!

Why aren't people more lazy, in other words?

[+] dgellow|3 years ago|reply

Some relevant info in case you don’t want to read the whole description but wonder if you’re concerned by the issue:

> Zoom fixed the server-side issues in February and client-side issues on April 24 in version 5.10.4.

> Zoom published a security bulletin about client-side fixes at https://explore.zoom.us/en/trust/security/security-bulletin

CVE-2022-25235 CVE-2022-25236 Fixed-2022-Apr-24 CVE-2022-22784 CVE-2022-22785 CVE-2022-22786 CVE-2022-22787

[+] kevincox|3 years ago|reply

This is another lesson that you should always parse+serialize rather than just validate. It is much harder to smuggle data this way to exploit different parsers.

Basically the set of all messages that will satisfy your validator is far larger than the set of all messages that will be produced by your serializer.
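
A small Python sketch of what I mean, using the standard library's ElementTree; reserialize is a hypothetical helper name. The point is that what gets forwarded is the serializer's output, never the sender's original bytes. It only helps, of course, if the parser itself decodes the input correctly.

```python
import xml.etree.ElementTree as ET


def reserialize(raw: str) -> str:
    # Parsing raises ET.ParseError if the input is not well-formed.
    root = ET.fromstring(raw)
    # Forward the serializer's canonical output, not the original text.
    return ET.tostring(root, encoding="unicode")


# Unusual but legal spacing and quoting in the input is normalized away.
forwarded = reserialize("<message  to='a' ><body>hi</body></message >")
```

Odd spacing, quoting styles and similar quirks that a downstream parser might read differently do not survive the round trip.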

[+] fsflover|3 years ago|reply

Or, it's another lesson that you should not completely trust any code but compartmentalize instead. Thanks to Qubes OS, I am still safe, since Zoom is running in a hardware-virtualized VM.

[+] lovasoa|3 years ago|reply

I am not sure this applies in this case. I don't know how Zoom's XMPP backend works, but it could very well parse and serialize and still be vulnerable. If the xml library accepts invalid 3-byte utf8 characters on parse, then its internal representation supports these characters, and I don't see why they would not be serialized just as well.

[+] ifratric1|3 years ago|reply

XMPP servers (including Zoom's) already parse + serialize ;)

[+] bobbylarrybobby|3 years ago|reply

Having multiple, potentially different parsers is incredibly dangerous. One person used the fact that different plist parsers in the macOS kernel choked in different ways when interpreting malformed xml, leading some to believe the plist was "safe" because it did not grant certain permissions, while others trusted this "safe" plist but believed it did grant these permissions.

https://blog.siguza.net/psychicpaper/

[+] dqv|3 years ago|reply

I didn’t even consider the existence of XMPP vulns until I listened to the Darknet Diaries episode about Kik[0]. It’s a really interesting class of vulnerabilities.

[0]: https://darknetdiaries.com/episode/93/

[+] robertlagrant|3 years ago|reply

This vuln writeup is extremely well written. Actually quite interesting to read!

[+] rektide|3 years ago|reply

How much of Zoom is powered by XMPP? Do we know much about these internals? This would be super cool to learn about.

[+] henearkr|3 years ago|reply

Good thing that I never used the standalone client and always the in-browser webapp instead.

[+] user23894295637|3 years ago|reply

How do you do that? On any OS I tried (Debian, Windows) it always *forces* me to download the standalone client, otherwise I can't join. There's no alternative link ("Join via web") like MS Teams has for example.

I really feel uncomfortable each time I have to install the client on a machine for my relatives :/

[+] 0daystock|3 years ago|reply

Unfortunately they don't allow you to both speak and present using the webapp - forcing desktop client use.

[+] thinkmassive|3 years ago|reply

Heh, it’s like an AIM punter, but better!

[+] pabs3|3 years ago|reply

Are these issues bugs in libxml, gloox, ejabberd? Or just in the Zoom client and server?

[+] jeffbee|3 years ago|reply

At some point we are going to need enforceable professional standards that effectively deal with commercial software publishers who choose to parse untrusted inputs in non-performance-sensitive contexts with C libraries.

[+] turminal|3 years ago|reply

This bug has nothing to do with language choice.

I agree that better professional standards and accountability should be introduced for software like zoom though.

90 comments