(no title)
nephrite | 4 years ago
Prefix code: The first byte indicates the number of bytes in the sequence. Reading from a stream can instantaneously decode each individual fully received sequence, without first having to wait for either the first byte of a next sequence or an end-of-stream indication. The length of multi-byte sequences is easily determined by humans as it is simply the number of high-order 1s in the leading byte. An incorrect character will not be decoded if a stream ends mid-sequence.
https://en.wikipedia.org/wiki/UTF-8#Comparison_with_other_en...
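To make the quoted claim concrete, here is a minimal Python sketch of reading the sequence length off the lead byte. The helper name `utf8_sequence_length` is hypothetical, and it only counts leading 1 bits; it does not reject overlong lead bytes such as 0xC0 or values above 0xF4. The second half uses the standard library's incremental decoder to show that a sequence cut off mid-stream yields no character.

```python
import codecs


def utf8_sequence_length(lead: int) -> int:
    """Return the sequence length implied by a UTF-8 lead byte.

    Hypothetical helper for illustration: it counts the high-order 1 bits
    and does not validate overlong or out-of-range lead bytes.
    """
    if not 0 <= lead <= 0xFF:
        raise ValueError("not a byte value")
    if lead < 0x80:
        return 1  # 0xxxxxxx: single-byte (ASCII) sequence
    ones = 0
    mask = 0x80
    while lead & mask:  # count leading 1 bits from the top
        ones += 1
        mask >>= 1
    if ones == 1 or ones > 4:
        # 10xxxxxx is a continuation byte; 5+ leading ones is not valid UTF-8
        raise ValueError("not a valid lead byte")
    return ones


# A truncated sequence produces no output until the last byte arrives.
decoder = codecs.getincrementaldecoder("utf-8")()
euro = "€".encode("utf-8")         # three bytes: E2 82 AC
partial = decoder.decode(euro[:2])  # nothing decoded yet
rest = decoder.decode(euro[2:])     # the full character appears here
```

For the byte asked about downthread, `utf8_sequence_length(0xEC)` gives 3, since 0xEC is 1110 1100 in binary.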
pierrebai | 4 years ago
Also, the human-readability claim sounds fishy. Humans are really bad at decoding high-order bits. For example, can you tell at a glance the length of a UTF-8 sequence that begins with 0xEC? With my scheme, either the high bit is not set (0x7F or less), which is easy to see because you only need to compare the first hex digit to 7, or the high bit is set and the high nibble is less than 0xC, meaning there is another byte, which is also easy to see because you only compare the first hex digit to C.
The quote also implicitly mischaracterizes my scheme: an interrupted sequence would not decode to an incorrect character there either, since it would lack the terminating flag (no byte of 0xC0 or above).
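As a rough sketch of the byte classes described in this comment, and nothing more: the full scheme is not spelled out in the thread, so the function names and the reading of "0xC0 or above" as the terminating flag are assumptions.

```python
def classify(byte: int) -> str:
    """Classify a byte using only the ranges mentioned in the comment.

    Assumed reading, not a specification of the actual scheme:
      0x00-0x7F  high bit clear, a one-byte character
      0x80-0xBF  high bit set, high nibble below 0xC: another byte follows
      0xC0-0xFF  terminating flag: this byte ends the sequence
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte value")
    if byte <= 0x7F:
        return "single"
    if byte < 0xC0:
        return "more"
    return "terminator"


def is_complete(seq: bytes) -> bool:
    """True if the last byte does not promise a following byte."""
    return len(seq) > 0 and classify(seq[-1]) != "more"
```

Under these assumptions, a stream cut off after a byte in the 0x80-0xBF range is visibly incomplete, which is the point being made about interrupted sequences.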
account42 | 4 years ago