excessive | 6 years ago

> Your proposal only works well for US ASCII users.

No, and I explicitly mentioned UTF-8. My suggestion is that str holds arbitrary immutable binary data and that you have a method which can interrogate whether that binary data is valid UTF-8.
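
To make that concrete, here is a rough sketch of what I have in mind, not how Python's actual str works. The names `ByteStr` and `is_valid_utf8` are made up for illustration; the only real API used is `bytes.decode` with strict error handling.

```python
class ByteStr(bytes):
    """Hypothetical string type: arbitrary immutable binary data.

    Subclassing bytes gives immutability for free. This is an
    illustration of the proposal, not an existing Python type.
    """

    def is_valid_utf8(self) -> bool:
        """Report whether the underlying bytes are well-formed UTF-8."""
        try:
            # Strict decoding raises on any malformed byte sequence.
            self.decode("utf-8", errors="strict")
        except UnicodeDecodeError:
            return False
        return True


text = ByteStr("naïve café".encode("utf-8"))
blob = ByteStr(b"\xff\xfe\x00\x01")

print(text.is_valid_utf8())
print(blob.is_valid_utf8())
```

Holding the data never fails; you only find out whether it is valid UTF-8 when you ask.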

Yes, real-world text is messy, and there are lots of encodings, compression schemes, and exceptions (UTF-8 with byte order marks, overlong encodings, or surrogate pairs, as examples). If your main task is converting text between outdated or broken encodings, I don't have any problem saying you need a separate library and shouldn't burden the rest of the user base. Despite its flaws, the majority of the world has settled on Unicode encoded as UTF-8.

"Special cases aren't special enough to break the rules."
