top | item 9616748


SimonSapin | 10 years ago

That’s roughly how UTF-8 works, with some tweaks to make it self-synchronizing. (That is, you can jump to the middle of a stream and find the next code point by looking at no more than 4 bytes.)
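A minimal sketch of that self-synchronizing property, assuming the input is well-formed UTF-8. The helper name `next_code_point_start` is made up for illustration and is not from any library. Continuation bytes always look like `10xxxxxx`, so from an arbitrary offset you skip at most three of them before landing on the start of a code point:

```python
def next_code_point_start(data: bytes, offset: int) -> int:
    """Return the index of the first code point boundary at or after offset.

    Hypothetical helper; assumes data is well-formed UTF-8.
    """
    i = offset
    # Continuation bytes match 0b10xxxxxx; ASCII and lead bytes never do.
    while i < len(data) and (data[i] & 0xC0) == 0x80:
        i += 1
    return i


# 'a' (1 byte), U+20AC (3 bytes), U+1F600 (4 bytes), 'z' (1 byte)
text = "a\u20ac\U0001F600z".encode("utf-8")

# Offset 5 lands in the middle of the 4-byte sequence; resync to the next start.
resynced = next_code_point_start(text, 5)
```

Because the check only looks at the top two bits of each byte, no decoding state from earlier in the stream is needed.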

As to running out of code points, we’re limited by UTF-16 (up to U+10FFFF). UTF-32 could go up to 32 bits, and UTF-8 as originally specified (sequences of up to 6 bytes) could go up to 31 bits.
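To see where the U+10FFFF ceiling comes from, here is a rough sketch of the surrogate pair arithmetic. The function and constant names are my own, chosen for illustration. Each surrogate carries 10 payload bits, and the pair is offset by 0x10000:

```python
HIGH_MIN, HIGH_MAX = 0xD800, 0xDBFF  # high (lead) surrogate range
LOW_MIN, LOW_MAX = 0xDC00, 0xDFFF    # low (trail) surrogate range


def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single code point."""
    if not (HIGH_MIN <= high <= HIGH_MAX and LOW_MIN <= low <= LOW_MAX):
        raise ValueError("not a valid surrogate pair")
    # 10 bits from each surrogate, plus the 0x10000 offset.
    return 0x10000 + ((high - HIGH_MIN) << 10) + (low - LOW_MIN)


# The largest possible pair gives the largest code point UTF-16 can express.
utf16_max = decode_surrogate_pair(HIGH_MAX, LOW_MAX)
```

With both surrogates at their maximum, the result is 0x10000 + 0xFFFFF, which is exactly the U+10FFFF limit.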
