
HTTP Is Not Simple

11 points | bigblind | 6 months ago | daniel.haxx.se

5 comments

Interesting and valid points.

The complexity issues seem to happen with every networking protocol that I've seen "grow up", even those designed explicitly for simplicity, like TFTP. The fabled "xmodem" protocol is a great example: it started as a ridiculously naive call and response, sprouted some error correction (xmodem CRC), then got improvements and morphed into "ymodem" and "zmodem". Is this a pattern for software in general, or just for "feral" software, where the spec or source code escapes into the wild, then lots of people "port" it, "improve" it or otherwise tinker with it, and there's some kind of fitness function that determines survival?

colejohnson66|6 months ago

A big problem is that text-based protocols are hard. People think they're "simple" but they're not — it's a lie. Text is one of the hardest things to get right, but Eurocentrism (read: ASCII) leads many to write bad parsers.

For starters, there are typically no strict delineations. With HTTP, people see "ends in a new line" and forget to consume the CR, or even to send it, because CR is a "Windowsism" or whatever. Then people need to modify their software to accept buggy transmissions, and it snowballs.
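To make the CR point concrete, here's a rough Python sketch. The function names are made up for illustration and don't come from any real HTTP library. One splitter insists on CRLF; the other is the "tolerant" kind that buggy senders end up forcing on everyone.

```python
def split_lines_strict(raw: bytes) -> list[bytes]:
    """Split on CRLF only and reject any bare CR or LF."""
    lines = raw.split(b"\r\n")
    for line in lines:
        if b"\r" in line or b"\n" in line:
            raise ValueError(f"bare CR or LF in line: {line!r}")
    return lines


def split_lines_lenient(raw: bytes) -> list[bytes]:
    """Split on LF and drop one optional trailing CR per line."""
    return [line[:-1] if line.endswith(b"\r") else line
            for line in raw.split(b"\n")]


# Hypothetical inputs: one well-formed request, one from a sender that skips CR.
good = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
sloppy = b"GET / HTTP/1.1\nHost: example.com\n\n"
```

The strict version raises on `sloppy`, while the lenient one returns the same lines for both inputs. Once a receiver ships the lenient behavior, senders never find out they were wrong.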

Take HTML, for example. It's a mess of hacks to parse it[0] because programs 20 years ago took shortcuts. Or they preferred to show something to the user instead of failing (remember XHTML?), so they massaged the input to make it work. We even have an <image> tag that is an alias for <img>.[1] Those shortcuts make bad content "work" accidentally, so people start depending on them.

Or INI files. A nice key-value structure delimited by line endings. Except now we need sections, so we have `[x]` lines. And don't forget the LF/CR-LF problem when splitting on the line endings! And now people want arrays, so we bolt them on with TOML and the funky `[[x]]` syntax.
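A quick sketch of how the LF/CR-LF problem bites, assuming a toy INI dialect with only `[section]` and `key=value` lines (all names here are hypothetical, and this is not how `configparser` works):

```python
def _parse(lines: list[str]) -> dict[str, dict[str, str]]:
    """Toy INI parser: '[name]' opens a section, 'key=value' sets an entry."""
    sections: dict[str, dict[str, str]] = {}
    current = sections.setdefault("", {})  # entries before any section header
    for line in lines:
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line:
            key, value = line.split("=", 1)
            current[key] = value
    return sections


def parse_ini_naive(text: str) -> dict[str, dict[str, str]]:
    # Splits on LF only, so CRLF input leaves a stray CR on every line.
    return _parse(text.split("\n"))


def parse_ini(text: str) -> dict[str, dict[str, str]]:
    # str.splitlines() handles LF, CRLF and lone CR (plus a few rarer breaks).
    return _parse(text.splitlines())
```

Feed `"[x]\r\nname=a\r\n"` to the naive version and the `[x]` header no longer ends in `]`, so it is silently skipped and `name` lands in the top-level section with the value `"a\r"`. No error, just wrong data.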

Text-based formats are deceptively hard to parse, but programmers don't want to admit it. They're easy to read, sure, but parser-mismatch vulnerabilities[2] will come back to bite you eventually.

That's not to say "binary" formats are easy — just that they have a rigid structure that tends to blow up on failure instead of silently succeeding.
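As a small illustration of "blows up on failure", here's a sketch using Python's `struct` module with a made-up fixed-size record (the layout and field names are assumptions for the example, not any real format):

```python
import struct

# Hypothetical record layout: big-endian u16 id, u8 kind, u8 flags (4 bytes).
RECORD = struct.Struct(">HBB")


def decode_record(data: bytes) -> tuple[int, int, int]:
    """Decode exactly one record; struct.error is raised on any other length."""
    return RECORD.unpack(data)
```

A 4-byte buffer such as `b"\x00\x2a\x01\x00"` decodes to `(42, 1, 0)`, while a truncated 3-byte buffer raises `struct.error` instead of returning something half-right.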

[0]: https://html.spec.whatwg.org/multipage/parsing.html

[1]: https://html.spec.whatwg.org/multipage/parsing.html#parsing-...

[2]: https://www.joyk.com/dig/detail/1703448744811381

miggy|6 months ago

HTTP has never been truly simple, but HTTP/1.1 is probably the simplest it gets. HTTP/2 and HTTP/3 are significantly more complex.