(no title)
excessive | 6 years ago
No, and I explicitly mentioned UTF-8. My suggestion is that str holds arbitrary immutable binary data and that you have a method which can interrogate whether that binary data is valid UTF-8.
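As a rough sketch of that suggestion, assuming a wrapper type rather than a change to the built-in str (the names `ByteStr` and `is_valid_utf8` are hypothetical, not an existing Python API):

```python
class ByteStr:
    """Hypothetical immutable string type that stores arbitrary bytes."""

    __slots__ = ("_data",)

    def __init__(self, data: bytes) -> None:
        # Bypass our own __setattr__ once, during construction.
        object.__setattr__(self, "_data", bytes(data))

    def __setattr__(self, name, value):
        raise AttributeError("ByteStr is immutable")

    def __bytes__(self) -> bytes:
        return self._data

    def is_valid_utf8(self) -> bool:
        """Report whether the stored bytes decode as strict UTF-8."""
        try:
            self._data.decode("utf-8")
        except UnicodeDecodeError:
            return False
        return True


text = ByteStr("héllo".encode("utf-8"))
junk = ByteStr(b"\xff\xfe")
print(text.is_valid_utf8(), junk.is_valid_utf8())
```

Nothing is rejected at construction time; the caller asks about validity only when it matters. Python's strict UTF-8 decoder also rejects overlong encodings and encoded surrogates, so those fail the check, while a leading byte order mark passes.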
Yes, real-world text is messy and there are lots of encodings, compression schemes, and exceptions (UTF-8 with byte order marks, overlong encodings, or surrogate pairs, as examples). If your main task is converting text between outdated or broken encodings, I don't have any problem saying you need a separate library and shouldn't burden the rest of the user base. Despite its flaws, the majority of the world has settled on Unicode with a UTF-8 encoding.
"Special cases aren't special enough to break the rules."