(no title)
Taywee | 2 years ago
> But, rather than using them and needing to praying to the heaven’s the internal Multibyte C Encoding is UTF-8 (like with the aforementioned wcrtomb -> mbrtoc8/16/32 style of conversions), we’ll just provide a direction conversion routine and cut out the wchar_t encoding/multibyte encoding middle man.
Not sure why it wasn't mentioned up top. When trying to convert between UTF-8 and UTF-16 without doing it myself or pulling in external dependencies, this was the most annoying thing that slapped me in the face. This is the problem that makes reliable charset conversions between specific encodings actually impossible using just the stdlib functions.
cryptonector|2 years ago
Basically, non-Unicode needs to always be at the edge, while in the middle everything needs to be Unicode.
From an application perspective it's easy: document that it only works in UTF-8 locales. Really, that is my position for my software. Anything else is ETOOHARD.
Taywee|2 years ago
I found it ridiculous that there was no way to just convert UTF-16 to UTF-8 without either reinventing that wheel, pulling in an external dependency, or changing global state and having the right system locales installed (as well as knowing the name of at least one of those locales, and guessing a language along with it), despite having the latest C and C++ compilers at my disposal.