Given the long history of request parsing vulnerabilities in HTTP/1.1 servers and proxies, is HTTP/2 actually worse, or have most of the HTTP/1.1 bugs just been fixed already?
These vulnerabilities are all from badly-written HTTP/2 → HTTP/1.1 translations. Most of them come from simple carelessness, rookie errors that should never have been made: dumping untrusted bytes from an HTTP/2 field straight into the HTTP/1.1 byte stream. This is security 101, a straightforward injection attack with absolutely nothing HTTP-specific about it.
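A sketch of that bug class, assuming a hypothetical downgrade function that serializes decoded HTTP/2 header fields into an HTTP/1.1 request head (the function names here are illustrative, not from the post):

```python
def h2_to_h1_naive(method, path, headers):
    """Naive downgrade: trusts the decoded HTTP/2 bytes completely."""
    head = f"{method} {path} HTTP/1.1\r\n"
    for name, value in headers:
        head += f"{name}: {value}\r\n"  # untrusted bytes pasted into the H1 stream
    return head + "\r\n"

FORBIDDEN = set("\r\n\x00")

def h2_to_h1_checked(method, path, headers):
    """Same translation, but rejecting control characters that would let one
    field terminate a header line (or the whole request) and start another."""
    for name, value in headers:
        if FORBIDDEN & set(name) or FORBIDDEN & set(value):
            raise ValueError(f"invalid header field: {name!r}")
    return h2_to_h1_naive(method, path, headers)

# HTTP/2 framing can transport these bytes without any protocol violation;
# pasted into HTTP/1.1 they smuggle a second request:
evil = [("x-info", "a\r\n\r\nGET /admin HTTP/1.1\r\nHost: internal")]
assert "GET /admin HTTP/1.1" in h2_to_h1_naive("GET", "/", evil)
```

The checked variant is the entire fix: the injection only works because HTTP/1.1 reuses CRLF as its delimiter while HTTP/2 length-prefixes fields.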
Some of them are a little more complex, requiring actual HTTP/2 and HTTP/1.1 knowledge (largely meaning HTTP/2 framing and the content-length and transfer-encoding headers), but not most of them.
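One example of the content-length flavour: HTTP/2 carries the real body length in its DATA frames, so the spec requires that any content-length header agree with the frame payloads before the request is forwarded. A minimal sketch of that check (function name and shapes are illustrative):

```python
def check_content_length(headers, data_frames):
    """Return True if the request may be forwarded: either there is no
    content-length header, or every copy of it is the same valid number
    and matches the total DATA frame payload length."""
    declared = [v for (n, v) in headers if n == "content-length"]
    if not declared:
        return True
    if len(set(declared)) > 1 or not declared[0].isdigit():
        return False  # contradictory duplicates or a non-numeric value
    return int(declared[0]) == sum(len(f) for f in data_frames)
```

A proxy that forwards the header without this check lets the attacker desynchronize the back-end's HTTP/1.1 framing from the bytes actually sent.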
Is HTTP/2 actually worse? Not in the slightest; HTTP/1.1 is the problem here. This is growing pains from compatibility measures as part of removing the problems of an unstructured text protocol. If you have a pure-HTTP/2 system and don’t ever do the downgrade, you’re in a better position.
I'd agree that HTTP/1 deserves a significant portion of the blame.
On the other hand, one maxim I've learned from my time bug hunting is that nobody ever validates strings in binary protocols. As such, I'm utterly unsurprised there are so many implementations with these kinds of bugs, and I'd say they could have been predicted in advance.
In fact… let's see… yep, they were predicted. Some of them, at least. In the HTTP/2 RFC, under Security Considerations, Section 10.3 'Intermediary Encapsulation Attacks' [1] describes one of the attack classes from the blog post, the one involving stuffing newlines into header names.
Does that mean something could have been done about it? Perhaps not. The ideal solution would be to somehow design the HTTP/2 protocol itself to be resistant to misimplementation, but that seems pretty much impossible. The spec already bans colons and newlines in header names, but there's no way to be sure implementations won't allow them anyway, short of actually making them a delimiter like HTTP/1 did – in other words, reverting to a text-based protocol. But a text-based protocol would come with its own share of misimplementation risks, the same ones that HTTP/1 has.
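The ban mentioned above can at least be enforced at every implementation boundary. A sketch of such a validator, assuming the field-name rules from the spec (lowercase token characters only, with colons reserved for the defined pseudo-headers):

```python
import re

PSEUDO_HEADERS = {":method", ":scheme", ":authority", ":path", ":status"}
# Lowercase tchar set from the HTTP grammar; no CR, LF, NUL, colon, or uppercase.
TOKEN = re.compile(r"^[!#$%&'*+\-.^_`|~0-9a-z]+$")

def valid_field_name(name: str) -> bool:
    """Accept only well-formed HTTP/2 field names."""
    if name in PSEUDO_HEADERS:
        return True
    return TOKEN.fullmatch(name) is not None
```

The point of the comment stands, though: nothing forces an implementation to call this before serializing, which is exactly how the bugs arise.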
On the other hand, perhaps the bug classes could have been mitigated if someone designed test cases to trigger them, and either included them in conformance tests (apparently there was an official HTTP/2 test suite [2] though it doesn't seem to have been very popular), or set up some kind of bot to try them on the entire web. In principle you could blame the authors of HTTP/2 collectively for the fact that nobody did this. But I admit that's pretty handwavey.
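For illustration, a conformance suite along those lines might simply enumerate header fields that HTTP/2 can legally transport but that must never survive a downgrade intact. These vectors are entirely hypothetical, not from any real test suite:

```python
def smuggling_test_vectors():
    """Yield (name, value) pairs a conformant peer must reject or encode
    safely before emitting HTTP/1.1: each embeds a delimiter byte that is
    meaningful in the text protocol but opaque to HTTP/2 framing."""
    payloads = ["\r\n", "\r", "\n", "\x00", " ", ":"]
    for p in payloads:
        yield ("x-test" + p + "injected", "v")        # poisoned field name
        yield ("x-test", "v" + p + "x-injected: 1")   # poisoned field value

vectors = list(smuggling_test_vectors())
```

Running every vector through a proxy and diffing what the back-end receives would have caught most of the bug class mechanically.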
It's tough to say that something is a "rookie error" when basically every serious professional team makes the same mistake. This apparently broke every AWS ALB, for instance.
I would bet that a lot of these are not rookie errors; they are more akin to Spectre or Meltdown: inherently unsafe code whose risk was considered worth the performance gain.
In general, when writing a high performance middle box, you want to touch the data as little as possible: ideally, the CPU wouldn't even see most of the bytes in the message, they would just be DMA'd from the external NIC to the internal NIC. This is probably not doable for HTTP2->HTTP1, but the general principle applies. In high-performance code, you don't want to go matching strings any more than you think is strictly necessary (e.g. matching the host or path to know where to actually send the packet).
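As a toy illustration of that principle (nothing like how a real middle box is written), routing can be done by examining only the request line and leaving the rest of the buffer untouched:

```python
def route(buf: bytes) -> bytes:
    """Pick a back-end pool from the request path, scanning nothing past
    the first CRLF; the header block and body bytes are never inspected."""
    end = buf.index(b"\r\n")
    path = buf[:end].split(b" ")[1]
    return b"api-pool" if path.startswith(b"/api/") else b"static-pool"
```

Every byte you additionally validate is a byte you can no longer leave for the hardware to shovel, which is the trade-off the comment is describing.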
Which is not to say that it wasn't a mistake to assume you can get away with this trade-off. But it's not a rookie error.
Is a new bucket leaking in a dozen places worse than an old one with all its leaks fixed? I would say yes, until the holes in the new one are also fixed.
When I implemented an HTTP/2 server several years ago, it was all of the "fun" of HTTP/1.1 parsing and semantics plus the extra challenges of the HTTP/2 optimizations: HPACK, mapping the abbreviated headers to cached in-memory representations, stream management, and, if you supported them, push promises.
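To give a flavour of the HPACK part mentioned above: an "indexed header field" is a single byte with the high bit set, naming an entry in the RFC 7541 static table. A minimal sketch covering the first few entries:

```python
# First entries of the RFC 7541 static table (the full table has 61).
STATIC_TABLE = {
    1: (":authority", ""),
    2: (":method", "GET"),
    3: (":method", "POST"),
    4: (":path", "/"),
    5: (":path", "/index.html"),
    6: (":scheme", "http"),
    7: (":scheme", "https"),
    8: (":status", "200"),
}

def decode_indexed(byte: int):
    """Decode a one-byte indexed header field representation."""
    if not byte & 0x80:
        raise ValueError("not an indexed header field")
    index = byte & 0x7F  # 7-bit prefix integer; larger indices continue in later bytes
    return STATIC_TABLE[index]
```

So the wire byte `0x82` decodes straight to `:method: GET` — the "abbreviated headers" the comment refers to; a real decoder also maintains a dynamic table and Huffman-coded literals on top of this.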
chrismorgan|4 years ago
comex|4 years ago
[1] https://datatracker.ietf.org/doc/html/rfc7540#section-10.3
[2] https://github.com/http2/http2-test
tptacek|4 years ago
simiones|4 years ago
josefx|4 years ago
rubiquity|4 years ago