CRLF is obsolete and should be abolished

422 points | km | 1 year ago | fossil-scm.org

264 comments

[+] michaelmior|1 year ago|reply
> various protocols (HTTP, SMTP, CSV) still "require" CRLF at the end of each line

What would be the benefit to updating legacy protocols to just use NL? You save a handful of bits at the expense of a lot of potential bugs. HTTP/1(.1) is mostly replaced by HTTP/2 and later by now anyway.

Sure, it makes sense not to require CRLF with any new protocols, but it doesn't seem worth updating legacy things.

> Even if an established protocol (HTTP, SMTP, CSV, FTP) technically requires CRLF as a line ending, do not comply.

I'm hoping this is satire. Why intentionally introduce potential bugs for the sake of making a point?

[+] FiloSottile|1 year ago|reply
Exactly. Please DO NOT mess with protocols, especially legacy critical protocols based on in-band signaling.

HTTP/1.1 was regrettably but irreversibly designed with security-critical parser alignment requirements. If two implementations disagree on whether `A:B\nC:D` contains a value for C, you can build a request smuggling gadget, leading to significant attacks. We live in a post-Postel world, only ever generate and accept CRLF in protocols that specify it, however legacy and nonsensical it might be.
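
To make the disagreement concrete, here is a minimal sketch of two toy header parsers fed the same bytes. Both functions (`parse_headers_strict`, `parse_headers_lenient`) are hypothetical illustrations, not taken from any real HTTP implementation; a production parser that is truly strict would reject the bare LF outright rather than keep it inside the value.

```python
def parse_headers_strict(raw: bytes) -> dict[bytes, bytes]:
    """Split header lines on CRLF only; a bare LF stays inside the value."""
    headers = {}
    for line in raw.split(b"\r\n"):
        if not line:
            continue
        name, _, value = line.partition(b":")
        headers[name] = value
    return headers


def parse_headers_lenient(raw: bytes) -> dict[bytes, bytes]:
    """Treat both CRLF and a bare LF as line terminators."""
    headers = {}
    for line in raw.replace(b"\r\n", b"\n").split(b"\n"):
        if not line:
            continue
        name, _, value = line.partition(b":")
        headers[name] = value
    return headers


raw = b"A:B\nC:D\r\n"
strict = parse_headers_strict(raw)    # only header A, with the LF inside its value
lenient = parse_headers_lenient(raw)  # headers A and C
print(strict)
print(lenient)
```

If a front-end proxy behaves like the first function and the back end like the second, the back end sees a header `C` that the proxy never inspected, which is the kind of gap a smuggling attack relies on.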

(I am a massive, massive SQLite fan, but this is giving me pause about using other software by the same author, at least when networks are involved.)

[+] amluto|1 year ago|reply
> I'm hoping this is satire. Why intentionally introduce potential bugs for the sake of making a point?

It’s worse than satire. Postel’s Law is definitively wrong, at least in the context of network protocols, and delimiters, especially, MUST be precise. See, for example:

https://www.postfix.org/smtp-smuggling.html

Send exactly what the spec requires, and parse exactly as the spec requires. Do not accept garbage. And LF, where CRLF is specified, is garbage.

[+] halter73|1 year ago|reply
> I'm hoping this is satire.

Me too. It's one thing to accept single LFs in protocols that expect CRLF, but sending single LFs is a bridge too far in my opinion. I'm really surprised most of the other replies to your comment currently seem to unironically support not complying with well-established protocol specifications under the misguided notion that it will somehow make things "simpler" or "easier" for developers.

I work on Kestrel which is an HTTP server for ASP.NET Core. Kestrel didn't support LF without a CR in HTTP/1.1 request headers until .NET 7 [1]. Thankfully, I'm unaware of any widely used HTTP client that even supports sending HTTP/1.1 requests without CRLF header endings, but we did eventually get reports of custom clients that used only LFs to terminate headers.

I admit that we should have recognized a single LF as a line terminator instead of just CRLF from the beginning like the spec suggests, but people using just LF instead of CRLF in their custom clients certainly did not make things any simpler or easier for me as an HTTP server developer. Initially, we wanted to be as strict as possible when parsing request headers to avoid possible HTTP request smuggling attacks. I don't think allowing LF termination really allows for smuggling, but it is something we had to consider.

I do not support even adding the option to terminate HTTP/1.1 request/response headers with single LFs in HttpClient/Kestrel. That's just asking for problems because it's so uncommon. There are clients and servers out there that will reject headers with single LFs while they all support CRLF. And if HTTP/1.1 is still being used in 2050 (which seems like a safe bet), I guarantee most clients and servers will still use CRLF header endings. Having multiple ways to represent the exact same thing does not make a protocol simpler or easier.

[1]: https://github.com/dotnet/aspnetcore/pull/43202

[+] inopinatus|1 year ago|reply
Not just potential bugs, there'll be definite security failures.

Changing the line endings can invalidate signatures over plaintext content. So an email MTA, for example, could never do so. Nor most proxy implementations. Then there's the high latent potential for request smuggling, command injection, and privilege escalation, via careful crafting of ambiguous header lines or protocol commands that target less robust implementations. With some protocols, it may cause declared content sizes to be incorrect, leading to bizarre hangs, which is to say, another attack surface.
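
A small sketch of both effects, assuming a plain SHA-256 digest as a stand-in for a real signature (actual signature schemes define their own canonicalization rules, which are not modeled here). The message body is made up for illustration.

```python
import hashlib

# The same two-line body, once with CRLF and once rewritten to bare LF.
body_crlf = b"Hello\r\nWorld\r\n"
body_lf = body_crlf.replace(b"\r\n", b"\n")

# A digest computed over the bytes changes when the line endings change.
digest_crlf = hashlib.sha256(body_crlf).hexdigest()
digest_lf = hashlib.sha256(body_lf).hexdigest()
print(digest_crlf == digest_lf)  # False

# A length declared for the CRLF form no longer matches the rewritten body.
print(len(body_crlf), len(body_lf))  # 14 12
```

A receiver that waits for the 14 declared bytes after an intermediary has shortened the body to 12 will block on data that never arrives.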

In practice, retiring CRLF can't be safely performed unilaterally or by fiat, we'll need to devise a whole new handshake to affirm that both ends are on the same page re. newline semantics.

[+] mechanicalpulse|1 year ago|reply
> Why intentionally introduce potential bugs for the sake of making a point?

It seems spiteful, but it strikes me as an interesting illustration of how the robustness principle could be hacked to force change. It’s a descriptivist versus prescriptivist view of standards, which is not how we typically view standards.

[+] jcul|1 year ago|reply
Not disagreeing with you, but implementation diverges from spec a lot anyway.

I've had to write decoders for things like HTTP, SMTP, SIP (VoIP), and there's so many edge cases and undocumented behavior from different implementations that you have to still support.

I find that it affects text-based protocols a lot more than binary protocols. TLS or RTP, to stick with the examples above, have much less divergence and are much less forgiving of broken (according to spec) implementations.

[+] chasil|1 year ago|reply
FYI, Sendmail accepts LF without CR, but Exchange doesn't.
[+] javajosh|1 year ago|reply
>What would be the benefit...

It is interesting that you ignore the benefits the OP describes and instead present a vague and fearful characterization of the costs. Your reaction lies at the heart of cargo-culting, the maintenance of previous decisions out of sheer dread. One can do a cost-benefit analysis and decide what to do, or you can let your emotions decide. I suggest that the world is better off with the former approach. To wit, the OP notes for benefits " The extra CR serves no useful purpose. It is just a needless complication, a vexation to programmers, and a waste of bandwidth." and a mitigation of the costs "You need to search really, really hard to find a device or application that actually interprets U+000a as a true linefeed." You ignore both the benefits assertion and cost mitigating assertion entirely, which is strong evidence for your emotionality.

[+] cassepipe|1 year ago|reply
It seems to me the author is not suggesting to update the protocols themselves but rather to stop sending them CR even if the spec requires it. And to patch the corresponding software so it accepts simple newlines.
[+] phkahler|1 year ago|reply
>> I'm hoping this is satire. Why intentionally introduce potential bugs for the sake of making a point?

It's not satire and it's not just trying to make a point. It's trying to make things simpler. As he says, a lot of software will accept input without the CR already, even if it's supposed to be there. But we should change the standard over time so people in 2050 can stop writing code that's more complicated (by needing to eat CR) or inserts extra characters. And never mind the 2050 part, just do it today.

[+] SQLite|1 year ago|reply
Author here:

My title was imprecise and unclear. I didn't mean that you should raise errors if CRLF is used as a line terminator in (for example) HTTP, only that a bare NL should be allowed as an acceptable line terminator. RFC2616 recommends as much (section 19.3 paragraph 3) but doesn't require it. The text of my proposal does say that CRLF should continue to be accepted, for backwards compatibility, just not required and not generated by default. I failed to make that point clear.

My initial experiments suggested that this idea would work fine and that few people would even notice. Initially, it appeared that when systems only generate NL instead of CRLF, everything would just keep working seamlessly and without problems. But, alas, there are more systems in circulation that are unable to deal with bare NLs than I knew. And I didn't sell my idea very well. So there was breakage and push-back.

I have revised the document accordingly and reverted the various systems that I control to generate CRLFs again. The revolution is over. Our grandchildren will have to continue dealing with CRLFs, it seems. Bummer.

Thanks to everyone who participated in my experiment. I'm sorry it didn't work out.

[+] perching_aix|1 year ago|reply
Well, at least the title is honest. Straight up asking people to break standards out of sheer conviction is a new one for me personally, but it's definitely one of the attitudes of all time, so maybe it's just me being green.

Can we ask for the typical *nix text editors to disobey the POSIX standard of a text file next, so that I don't need to use hex editing to get trailing newlines off the end of files?

[+] bityard|1 year ago|reply
Why would you want that?

All Unix text processing tools assume that every line in a text file ends in a newline. Otherwise, it's not a text file.

There's no such thing as a "trailing newline," there is only a line-terminating newline.

I've yet to hear a convincing argument why the last line should be an exception to that extremely long-standing and well understood convention.

[+] 201984|1 year ago|reply
What's wrong with trailing newlines?
[+] bigstrat2003|1 year ago|reply
Yeah, I have no idea what the author is smoking. Deliberately breaking standards is simply not an acceptable solution to the problem, even if it were a serious problem (it's not).
[+] moomin|1 year ago|reply
Counterpoint: Unix deciding on a non-standard line ending was always a mistake. It has produced decades of random incompatibility for no particular benefit. CRLF isn’t a convention: it’s two different pieces of the base terminal API. You have no idea how many programs rely on CR and LF working correctly.
[+] fanf2|1 year ago|reply
It is a standard line ending. ANSI X3.4-1968 says:

10 LF (Line Feed). A format effector that advances the active position to the same character position on the next line. (Also applicable to display devices.) Where appropriate, this character may have the meaning “New Line” (NL), a format effector that advances the active position to the first character position on the next line. Use of the NL convention requires agreement between sender and recipient of data.

ASCII 1968 - https://www.rfc-editor.org/info/rfc20

ASCII 1977 - https://nvlpubs.nist.gov/nistpubs/Legacy/FIPS/fipspub1-2-197...

[+] eqvinox|1 year ago|reply
Counter-counterpoint: using 2 bytes to signal one relevant operation creates ambiguity out of thin air. If all you care about is "where does the line end?", having CRLF as a line ending creates edge cases for "there is only CR" and "there is only LF". Are those line endings or not? How do you deal with them? And what's LFCR?

Personally speaking, I've always written my parsers to be permissive and accept either CR¹, LF, or CRLF as line endings. And it always meant keeping a little extra boolean for "previous byte was CR" to ignore the LF to not turn CRLF into 2 line endings.

¹ CR-only was used on some ancient (m68k era?) Macintosh computers I believe.

P.S.: LFCR is 2 line endings in my parsers :D
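
The approach described in this comment can be sketched roughly as follows. The function name `split_lines_permissive` is made up for illustration; the only state carried between bytes is the "previous byte was CR" flag mentioned above.

```python
def split_lines_permissive(data: bytes) -> list[bytes]:
    """Accept CR, LF, or CRLF as a line ending."""
    lines = []
    current = bytearray()
    prev_was_cr = False
    for byte in data:
        if byte == 0x0D:  # CR always ends the current line
            lines.append(bytes(current))
            current.clear()
            prev_was_cr = True
        elif byte == 0x0A:  # LF ends a line unless it completes a CRLF pair
            if not prev_was_cr:
                lines.append(bytes(current))
                current.clear()
            prev_was_cr = False
        else:
            current.append(byte)
            prev_was_cr = False
    if current:  # final line without a terminator
        lines.append(bytes(current))
    return lines


print(split_lines_permissive(b"a\r\nb\nc\rd"))  # [b'a', b'b', b'c', b'd']
print(split_lines_permissive(b"a\n\rb"))        # [b'a', b'', b'b']
```

The second call shows the LFCR case: the LF ends one line and the CR ends another (empty) one, so it counts as two line endings.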

[+] matheusmoreira|1 year ago|reply
Yeah. It's weird how Unix picked LF given its love of terminals. CRLF is the semantically correct line ending considering terminal semantics. It's present in the terminal subsystem to this day, people just don't notice because they have OPOST output post processing enabled which automatically converts LF into CRLF.
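
A toy model of the terminal semantics may help here. This is only a simulation under simplifying assumptions (a tiny fixed grid, no scrolling, no wrapping); `render` and `onlcr` are hypothetical helpers, with `onlcr` imitating the LF to CR LF output translation that the terminal layer performs when output post-processing is enabled.

```python
def render(data: bytes, width: int = 20, height: int = 4) -> list[str]:
    """Draw bytes on a small grid: CR returns to column 0, LF only moves down."""
    grid = [[" "] * width for _ in range(height)]
    row = col = 0
    for byte in data:
        if byte == 0x0D:    # carriage return: back to the first column
            col = 0
        elif byte == 0x0A:  # line feed: next row, same column
            row += 1
        else:
            if row < height and col < width:
                grid[row][col] = chr(byte)
            col += 1
    return ["".join(r).rstrip() for r in grid]


def onlcr(data: bytes) -> bytes:
    """Imitate output post-processing that maps every LF to CR LF."""
    return data.replace(b"\n", b"\r\n")


raw_screen = render(b"ab\ncd")            # staircase: "cd" starts at column 2
cooked_screen = render(onlcr(b"ab\ncd"))  # "cd" starts at column 0
print(raw_screen)
print(cooked_screen)
```

Without the translation the second line begins where the first one stopped, which is the "staircase" effect seen when a terminal is left in raw output mode.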
[+] bmitc|1 year ago|reply
I have always felt that somehow Linux and proponents of it default to every decision it made being right and everything else, namely Windows, being wrong. I honestly feel Linux is orders of magnitude more complex. It is much easier, in my experience to make software just work on Windows. (This is not to say Windows doesn't have bad decisions. It has many. All the major OSs are terrible.)
[+] globular-toast|1 year ago|reply
But, like the article says, LF is not useful. I could always interpret LF as NL and if you send CRs too it won't break anything. If you know I'm interpreting LF that way you can just stop sending the CRs. That's what happened in Unix.
[+] rgmerk|1 year ago|reply
Of all the stupid and obsolete things in standards we use to interoperate, CRLF is one of the least consequential.
[+] fweimer|1 year ago|reply
SMTP <https://datatracker.ietf.org/doc/html/rfc2821#section-4.1.1....> is pretty clear that the message termination sequence is CR LF . CR LF, not LF . LF, and disagreements in this spot are known to cause problems (including undesirable message injection). But then enough alternative implementations that recognize LF . LF as well are out there, so maybe the original SMTP rules do not matter anymore.
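
As a rough sketch of why that disagreement matters, consider two hypothetical end-of-data detectors run over the same byte stream. This is a deliberate simplification (it ignores dot-stuffing and everything else a real SMTP server does), and the stream contents and address are invented for the example.

```python
def end_of_data_strict(stream: bytes) -> int:
    """Index of the first CR LF . CR LF terminator, or -1 if absent."""
    return stream.find(b"\r\n.\r\n")


def end_of_data_lenient(stream: bytes) -> int:
    """Index of the first CR LF . CR LF or LF . LF terminator, or -1."""
    hits = [i for i in (stream.find(b"\r\n.\r\n"), stream.find(b"\n.\n")) if i != -1]
    return min(hits) if hits else -1


stream = b"Subject: hi\r\n\r\nfirst\n.\nMAIL FROM:<x@example.com>\r\n.\r\n"

strict_end = end_of_data_strict(stream)
lenient_end = end_of_data_lenient(stream)
print(strict_end, lenient_end)

# Bytes the strict side treats as message body but the lenient side
# treats as whatever follows the end of the message.
print(stream[lenient_end + len(b"\n.\n"):strict_end])
```

The strict detector sees one message; the lenient one ends the message early and interprets the remainder as a fresh command, which is the injection scenario described above.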
[+] mvdtnz|1 year ago|reply
[flagged]
[+] ripe|1 year ago|reply
Ha, ha, ha! I love it. I believe the author is serious, and I think he's on to something.

OP clearly says that most things in fact don't break if you just don't comply with the CRLF requirement in the standard and send only LF. (He calls LF "newline". OK, fine, his reasoning seems legit.) He is not advocating changing the language of the standard.

To all those people complaining that this is a minor matter and the wrong hill to die on, I say this: most programmers today are blindly depending on third-party libraries that are full of these kinds of workarounds for ancient, weird vestigial crud, so they might think this is an inconsequential thing. But if you're from the school of pure, simple code like the SQLite/Fossil/TCL developers, then you're writing the whole stack from scratch, and these things become very, very important.

Let me ask you instead: why do you care if somebody doesn't comply with the standard? The author's suggestion doesn't affect you in any way, since you'll just be using some third-party library and won't even know that anything is different.

Oh bUT thE sTandArDs.

[+] anonymousiam|1 year ago|reply
This article seems like it was written to troll people into a flame war. There is no such character as NL, and the article does not at all address the fact that the "ENTER" key on every keyboard sends a CR and not a LF. Things work fine the way they are.
[+] TacticalCoder|1 year ago|reply
> There is no such character as NL ...

More specifically the Unicode control character U+000a is, in the Unicode standard, named both LF and NL (and that comes from ASCII but in ASCII I think 0x0a was only called LF).

It literally has both names in Unicode: but LINEFEED is written in uppercase while newline is written in lowercase (not kidding you). You can all see for yourself that U+000a has both names (and eol too):

https://www.unicode.org/charts/PDF/U0000.pdf

> and the article does not at all address that fact that the "ENTER" key on every keyboard sends a CR and not a LF.

What a key on a keyboard sends doesn't matter though. What matters is what gets written to files / what is sent over the wire.

    ... $  cat > /tmp/anonymousiam<ENTER>
    <ENTER>
    <CTRL-C>

    ... $  hexdump /tmp/anonymousiam
    00000000  000a
When I hit ENTER at my Linux terminal above, it's LINEFEED that gets written to the file. Under Windows I take it the same still gets CRLF written to the file as in the Microsoft OSes of yore (?).
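
The same point can be checked without a terminal. The sketch below uses Python's `newline` argument to `open()` to choose which bytes a single logical newline becomes on disk; `written_bytes` is a helper invented for this example, and the two `newline` values stand in for the Unix and DOS/Windows conventions rather than probing the actual OS.

```python
import os
import tempfile


def written_bytes(text: str, newline: str) -> bytes:
    """Write text with the given newline translation and return the raw file bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", newline=newline) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


unix_style = written_bytes("\n", newline="\n")    # one byte: 0x0a
dos_style = written_bytes("\n", newline="\r\n")   # two bytes: 0x0d 0x0a
print(unix_style.hex(), dos_style.hex())  # 0a 0d0a
```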

> Things work fine the way they are.

I agree

[+] eviks|1 year ago|reply
> There is no such character as NL,

There is, copying from a helpful comment above:

> The Unicode standard does call it NL along with LF.

    000A  <control>
      = LINE FEED (LF)
      = new line (NL)
      = end of line (EOL)
Source: https://www.unicode.org/charts/PDF/U0000.pdf

And things don't work fine, there are many issues with this historical baggage

[+] o11c|1 year ago|reply
U+0085 is sometimes called NL (it is the standard in EBCDIC), but more often NEL in the ASCII world.
[+] deltaknight|1 year ago|reply
As an implementation detail, I assume many programs simply ignore the CR character already? Whilst of course many Windows programs (and protocols as mentioned) still require CRLF, surely the most efficient way to make something cross-platform is to simply act on the LF part of CRLF, that way it works for both CRLF and LF line ends.

The fact that both CRLF and LF use the same control character is, in my eyes, a huge bonus for this type of action to actually work. Simply make everything cross platform and start ignoring CR completely. I’m surprised this isn’t mentioned explicitly as a course of action in the article, instead it focuses on making people change their understanding of LF into NL, which is an unnecessary complication that will cause inevitable bikeshedding around this idea.
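
A minimal sketch of that idea, assuming the input is ordinary text data rather than a protocol where the exact delimiter is security-relevant. `split_on_lf` is a hypothetical helper: it splits only on LF and drops a single CR immediately before it.

```python
def split_on_lf(data: bytes) -> list[bytes]:
    """Split on LF and strip one trailing CR from each line, if present."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # no empty line after the final terminator
    return [line[:-1] if line.endswith(b"\r") else line for line in lines]


print(split_on_lf(b"a\r\nb\nc\r\n"))  # [b'a', b'b', b'c']
```

A CR that is not directly followed by LF is left inside the line, so CR-only line endings are not recognized by this approach.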

[+] phkahler|1 year ago|reply
>> instead it focuses on making people change their understanding of LF into NL, which is an unnecessary complication that will cause inevitable bikeshedding around this idea.

Not really. In order to ignore CR you need to treat LF as NL.

[+] zac23or|1 year ago|reply
> Even if an established protocol (HTTP, SMTP, CSV, FTP) technically requires CRLF as a line ending, do not comply. Send only NL.

Insane. At first I thought it was an April 1st joke, but it is not.

Let's break everything because YES.

[+] sunk1st|1 year ago|reply
> Nobody ever wants to be in the middle of a line, then move down to the next line and continue writing in the next column from where you left off. No real-world program ever wants to do that.

Is this true?

[+] anamax|1 year ago|reply
No, it's not true.

It was used for "graphics" on character-only terminals.

[+] numpad0|1 year ago|reply
isn't CR without LF how CLI progress bars work?
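
For reference, a common way to draw such a progress bar looks roughly like this. It is a sketch with made-up names (`progress_frames`), assuming a terminal that moves the cursor to column 0 on CR without advancing a line: each frame starts with CR and overwrites the previous one, and an LF is sent only once at the end.

```python
import sys
import time


def progress_frames(total: int, width: int = 10) -> list[str]:
    """Build one string per step; each begins with CR so it overwrites the last."""
    frames = []
    for done in range(total + 1):
        filled = width * done // total
        bar = "#" * filled + " " * (width - filled)
        frames.append(f"\r[{bar}] {done}/{total}")
    return frames


if __name__ == "__main__":
    for frame in progress_frames(5):
        sys.stdout.write(frame)
        sys.stdout.flush()
        time.sleep(0.1)
    sys.stdout.write("\n")  # move to the next line only when finished
```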
[+] ericyd|1 year ago|reply
I'm not trying to be obtuse but I am actually confused how a modern machine correctly interprets CRLF based on the description in this post.

If a modern machine interprets LF as a newline, and the cursor is moved to the left of the current row before the newline is issued, wouldn't that add a newline _before_ the current line, i.e. a newline before the left most character of the current line? Obviously this isn't how it works but I don't understand why not.

[+] shadowgovt|1 year ago|reply
Define "abolish."

We could certainly try to write no new software that uses them.

But last I checked, there are terabytes and terabytes of stored data in various formats (to say nothing of living protocols already deployed) and they aren't gonna stop using CRLF any time soon.

[+] eviks|1 year ago|reply
It is defined in the 4 points at the end.
[+] justin66|1 year ago|reply
Nice. I think that's the most energized I've seen Richard Hipp on a topic.
[+] zulu-inuoe|1 year ago|reply
Of all the hills to die on. What an unbelievably silly one. CRLF sucks, suck it up. As many others have noted, there are millions of devices this idea puts in jeopardy for absolutely no reason. We should be reducing the exceptions, not creating them.
[+] fortran77|1 year ago|reply
The article had some major gaffes. Teletypes never had a ball. The stationary platen models had type boxes and cylinders, but never balls.
[+] srg0|1 year ago|reply
I would also like to point out that English spelling is obsolete and should be abolished (/s). The text of the CRLF abolition proposal itself contains more digraphs, trigraphs, diphthongs, and silent letters than line-ending sequences. The last letter of the word "obsolete" is not necessary. "Should" can be written as only three letters in Shavian "𐑖𐑫𐑛".

According to ChatGPT, the original proposal had:

Number of sentences: 60

Number of diphthongs: 128 (pairs of vowels in the same syllable like "ai", "ea", etc.)

Number of digraphs: 225 (pairs of letters representing a single sound, like "th", "ch", etc.)

Number of trigraphs: 1 (three-letter combinations representing a single sound, like "sch")

Number of silent letters: 15 (common silent letter patterns like "kn", "mb", etc.)

For all intents and purposes, CRLF is just another digraph.

[+] ksp-atlas|1 year ago|reply
I'm a big fan of English spelling reform and know Shavian and sometimes write in it, but I feel shavian is limited due to how heavily it uses letter rotation. Dyslexics already have trouble with b, d, p and q, having most letters have a rotated form would be challenging