top | item 45225687

rhet0rica | 5 months ago

See quectophoton's comment. The requirement that every continuation byte starts with the bits 10 is useful if a parser jumps in at a random offset or, more commonly, if the text stream gets fragmented: the reader can tell it has landed in the middle of a character and skip ahead to the next boundary. This was a major concern when UTF-8 was devised in the early 90s, as transmission was much less reliable than it is today.
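To make the resynchronization point concrete, here is a rough Python sketch. The helper names `is_continuation` and `resync` are made up for illustration, and it assumes the input is otherwise valid UTF-8 that has merely been cut at an arbitrary byte:

```python
def is_continuation(byte: int) -> bool:
    # Continuation bytes have the bit pattern 10xxxxxx.
    return (byte & 0b1100_0000) == 0b1000_0000


def resync(data: bytes, offset: int) -> int:
    # Hypothetical helper: skip forward past any continuation bytes
    # so that the returned offset sits on a character boundary.
    while offset < len(data) and is_continuation(data[offset]):
        offset += 1
    return offset


data = "aé€b".encode("utf-8")  # bytes: 61 | c3 a9 | e2 82 ac | 62

# Offset 4 lands inside the three-byte encoding of '€'.
start = resync(data, 4)
print(start, data[start:].decode("utf-8"))
```

At most three bytes are ever skipped, since a UTF-8 character has at most three continuation bytes. The partial character at the cut is lost, but everything after it decodes cleanly.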

rhet0rica | 5 months ago

Addendum: This was posted to the front page today: https://doc.cat-v.org/bell_labs/utf-8_history

It also notes that UTF-8 never produces a NUL or '/' byte as part of a multi-byte character, so neither can sneak into a filename. A stray NUL would terminate a C string early, and a stray '/' would be read as a Unix path separator.
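A quick way to see this, sketched in Python. The function name `unsafe_bytes` is hypothetical, and UTF-16 is used only as a contrasting fixed-width-style encoding, not as the one the linked history discusses:

```python
def unsafe_bytes(text: str, encoding: str) -> set:
    # Hypothetical helper: which of the bytes NUL (0x00) and '/' (0x2F)
    # show up when `text` is encoded with `encoding`?
    return {b for b in text.encode(encoding) if b in (0x00, 0x2F)}


ch = "\u2f00"  # a non-ASCII character whose code point contains 0x2F and 0x00

print(ch.encode("utf-16-be").hex(), unsafe_bytes(ch, "utf-16-be"))
print(ch.encode("utf-8").hex(), unsafe_bytes(ch, "utf-8"))


def multibyte_is_ascii_free(code_point: int) -> bool:
    # Every byte of a multi-byte UTF-8 sequence has its high bit set,
    # so none of them can collide with any ASCII byte.
    return all(b >= 0x80 for b in chr(code_point).encode("utf-8"))
```

In UTF-8 the bytes 0x00 and 0x2F only ever appear when the text really contains NUL or '/', because lead and continuation bytes all fall in the range 0x80 and above.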