

Taywee | 2 years ago

It sort of did, but in a completely different place past the critique section:

> But, rather than using them and needing to pray to the heavens that the internal Multibyte C Encoding is UTF-8 (like with the aforementioned wcrtomb -> mbrtoc8/16/32 style of conversions), we’ll just provide a direct conversion routine and cut out the wchar_t encoding/multibyte encoding middle man.

Not sure why it wasn't mentioned up top. When I was trying to convert between UTF-8 and UTF-16 without writing the conversion myself or pulling in external dependencies, this was the most annoying thing that slapped me in the face. It's the problem that makes reliable conversion between two specific encodings impossible with only the standard library functions, because every path goes through whatever multibyte encoding the current locale happens to use.
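To make the "middle man" concrete, here is a minimal sketch of the standard-library-only route from UTF-16 to bytes. c16rtomb (from <uchar.h>, C11) emits the current locale's multibyte encoding, not UTF-8, so the same call yields UTF-8 only when LC_CTYPE names a UTF-8 locale. The function name utf16_to_locale_mb is hypothetical, for illustration only.

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <stddef.h>
#include <string.h>
#include <uchar.h>    /* char16_t, c16rtomb */
#include <wchar.h>    /* mbstate_t */

/* Hypothetical helper: converts a NUL-terminated UTF-16 string to the
   *current locale's* multibyte encoding using only standard C.
   The output is UTF-8 only if LC_CTYPE happens to be a UTF-8 locale.
   Returns bytes written (excluding the NUL), or (size_t)-1 if a unit
   cannot be represented in this locale or the buffer is too small. */
static size_t utf16_to_locale_mb(const char16_t *in, char *out, size_t out_cap)
{
    if (out_cap == 0)
        return (size_t)-1;            /* no room even for the terminator */

    mbstate_t st;
    memset(&st, 0, sizeof st);        /* initial conversion state */
    size_t used = 0;
    char buf[MB_LEN_MAX];             /* MB_LEN_MAX >= MB_CUR_MAX for any locale */

    for (; *in; ++in) {
        size_t n = c16rtomb(buf, *in, &st);
        if (n == (size_t)-1)
            return (size_t)-1;        /* not representable in this locale */
        if (used + n + 1 > out_cap)
            return (size_t)-1;        /* keep space for the NUL */
        memcpy(out + used, buf, n);   /* n can be 0 after a high surrogate */
        used += n;
    }
    out[used] = '\0';
    return used;
}
```

After setlocale(LC_CTYPE, "C"), ASCII passes straight through, but a character such as U+00E9 may fail with (size_t)-1 or come out in a non-UTF-8 encoding, depending on the platform's C locale.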


cryptonector|2 years ago

Standards-wise, the only answer to this is to deprecate all non-UTF-8 locales and leave non-UTF-8 codesets outside the scope of C.

Basically, non-Unicode needs to always be at the edge, while in the middle everything needs to be Unicode.

From an application perspective it's easy: document that it only works in UTF-8 locales. Really, that is my position for my software. Anything else is ETOOHARD.
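If an application takes that position, one way to enforce it at startup is sketched below, assuming a POSIX system: nl_langinfo(CODESET) from <langinfo.h> (POSIX, not ISO C) reports the codeset of the active LC_CTYPE locale. The helper name locale_is_utf8 is hypothetical, and comparing against the exact string "UTF-8" is an assumption that matches what glibc and macOS report; other platforms may spell it differently.

```c
#define _POSIX_C_SOURCE 200809L   /* expose nl_langinfo under strict ISO modes */
#include <langinfo.h>             /* nl_langinfo, CODESET (POSIX) */
#include <string.h>

/* Hypothetical helper: returns 1 if the current LC_CTYPE locale uses
   UTF-8, else 0. Call setlocale(LC_ALL, "") first; otherwise this only
   ever sees the default "C" locale. The exact-match "UTF-8" comparison
   is an assumption about how the platform names the codeset. */
static int locale_is_utf8(void)
{
    const char *cs = nl_langinfo(CODESET);
    return cs != NULL && strcmp(cs, "UTF-8") == 0;
}
```

A program would call setlocale(LC_ALL, "") in main and then refuse to run (or at least warn) when locale_is_utf8() returns 0, which turns "document that it only works in UTF-8 locales" into a check the user actually sees.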

Taywee|2 years ago

I just want reliable conversions. In my situation (duct taping a very old service to a newer one), I needed to read structured files with UTF-16 fields and process them into an eventual UTF-8 file written to a different location. The host this needed to run on did not have any Unicode locales installed (and incidentally, I hate changing locales for my software because it's a program-global switch to flip, and most of my program still wants to run in the user's locale).

I found it ridiculous that, despite having the latest C and C++ compilers at my disposal, there was no way to just convert UTF-16 to UTF-8 without reinventing that wheel, pulling in an external dependency, or changing global state. And changing global state only works if the right system locales are installed and I know the name of at least one of them, which means guessing a language along with it.
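For comparison, the "reinvent the wheel" option is not much code. The sketch below converts UTF-16 code units to UTF-8 without touching the locale and rejects unpaired surrogates. It assumes the code units are already in host byte order; fields stored as UTF-16LE bytes would first need each unit assembled as b[0] | (b[1] << 8). The function name utf16_to_utf8 is hypothetical, and this is not a hardened library implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: locale-independent UTF-16 -> UTF-8.
   `in` holds `n` UTF-16 code units in host byte order. Writes at most
   `cap` bytes to `out` (no NUL terminator). Returns bytes written, or
   (size_t)-1 on an unpaired surrogate or if `out` is too small. */
static size_t utf16_to_utf8(const uint16_t *in, size_t n,
                            unsigned char *out, size_t cap)
{
    size_t o = 0;
    for (size_t i = 0; i < n; ++i) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {          /* high surrogate */
            if (i + 1 >= n || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return (size_t)-1;                   /* not followed by a low surrogate */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[++i] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;                       /* stray low surrogate */
        }

        /* Number of UTF-8 bytes needed for this code point. */
        size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (cap - o < len)
            return (size_t)-1;                       /* out of space */

        switch (len) {
        case 1:
            out[o++] = (unsigned char)cp;
            break;
        case 2:
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        case 3:
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        default:
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        }
    }
    return o;
}
```

For example, the units {0x0041, 0x00E9, 0x20AC, 0xD83D, 0xDE00} ("Aé€😀") produce the ten bytes 41 C3 A9 E2 82 AC F0 9F 98 80, regardless of which locale the rest of the program runs in.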