top | item 37632650

ormax3 | 2 years ago

As explained in https://utf8everywhere.org/#windows , you can write simple wrapper functions `narrow`/`widen` to use whenever you are about to call a Windows API function.

    ::SetWindowTextW(hwnd, widen(someStdString).c_str());

The implementation is straightforward, relying on `WideCharToMultiByte`/`MultiByteToWideChar` to do the conversion:

https://github.com/neacsum/utf8/blob/master/src/utf8.cpp
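
For illustration, here is a minimal, portable sketch of what such a `widen` helper does under the hood. This is hand-rolled with only light validation; the linked implementation delegates to `MultiByteToWideChar` on Windows instead:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Minimal UTF-8 -> UTF-16 conversion, illustrating what a widen() helper does.
// A real Windows implementation would delegate to MultiByteToWideChar.
std::u16string widen(const std::string& s) {
    std::u16string out;
    size_t i = 0;
    while (i < s.size()) {
        unsigned char b = s[i];
        uint32_t cp;
        size_t extra;  // number of continuation bytes
        if (b < 0x80)             { cp = b;        extra = 0; }
        else if ((b >> 5) == 0x6) { cp = b & 0x1F; extra = 1; }
        else if ((b >> 4) == 0xE) { cp = b & 0x0F; extra = 2; }
        else if ((b >> 3) == 0x1E){ cp = b & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + extra >= s.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        for (size_t k = 1; k <= extra; ++k) {
            unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {  // outside the BMP: encode as a UTF-16 surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

`narrow` is the same thing in reverse (decode UTF-16, re-encode as UTF-8), delegating to `WideCharToMultiByte` on Windows.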

mkoubaa | 2 years ago

I have had good experiences with this approach. It isolates the weirdness of Windows and lets the rest of your code do things idiomatically.

eco | 2 years ago

Microsoft added a setting in Windows 10 to switch the code page over to UTF-8, and then in Windows 11 they made it on by default. Individual applications can turn it on for themselves, so they don't need to rely on the system setting being enabled.

I haven't tried it yet but with that you can just use the -A variants of the winapi with utf-8 strings. No conversion necessary.
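
For reference, the per-application opt-in is done through the application manifest; a fragment along these lines (the `activeCodePage` element documented by Microsoft, available since Windows 10 version 1903) requests the UTF-8 code page for the process:

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
  <application>
    <windowsSettings>
      <!-- Run this process with UTF-8 as its ANSI code page -->
      <activeCodePage xmlns="http://schemas.microsoft.com/SDK/WindowsSettings">UTF-8</activeCodePage>
    </windowsSettings>
  </application>
</assembly>
```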

adzm | 2 years ago

Do you have any references for it being enabled by default in Windows 11? I've seen conflicting reports, and it often seems to vary with the system locale whether it gets enabled or disabled by default.

huhtenberg | 2 years ago

You are missing OP's point - this still costs you 2 extra calls.

If this cost really matters (and practically speaking it never does), then, as the other commenter said, the correct solution is to just use the OS-native encoding for all file system paths and names used by the program, hidden behind an abstraction layer if need be: UTF-16 on Windows, UTF-8 elsewhere.

ormax3 | 2 years ago

The manifesto above argues for using UTF-8 *everywhere*, even on Windows, where UTF-8 is not the native internal representation.

The conversion overhead is really negligible: https://utf8everywhere.org/#faq.cvt.perf

(Note: the two API calls per conversion are due to how those specific functions work: the first call gets the size to allocate, the second does the actual conversion. But you can always use another library for the UTF-8 <-> UTF-16 conversion in the implementation, which might be more optimized than those Windows API functions.)
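
The two-call idiom described above can be sketched like this. `to_utf16` here is a hypothetical stand-in with the same calling convention as `MultiByteToWideChar` (a null output buffer means "tell me the required size"); for brevity it only maps ASCII 1:1:

```cpp
#include <string>

// Hypothetical stand-in mimicking MultiByteToWideChar's contract:
//   out == nullptr -> return the required output length in char16_t units
//   out != nullptr -> convert into out, return the number of units written
// (ASCII-only here, purely to illustrate the two-call idiom.)
int to_utf16(const char* in, int in_len, char16_t* out, int out_cap) {
    if (out == nullptr)
        return in_len;  // first call: size query only, nothing converted
    int n = in_len < out_cap ? in_len : out_cap;
    for (int i = 0; i < n; ++i)
        out[i] = static_cast<char16_t>(static_cast<unsigned char>(in[i]));
    return n;           // second call: actual conversion
}

std::u16string widen(const std::string& s) {
    // Call 1: ask how big the output buffer must be.
    int needed = to_utf16(s.data(), static_cast<int>(s.size()), nullptr, 0);
    std::u16string out(needed, u'\0');
    // Call 2: perform the conversion into the allocated buffer.
    to_utf16(s.data(), static_cast<int>(s.size()), &out[0], needed);
    return out;
}
```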

ynik | 2 years ago

"2 extra calls" is a weird metric here. Some calls are vastly more expensive than others. Syscalls come with a significant cost, encoding conversion of short strings (esp. filenames) does not. Hiding just the syscalls behind an abstraction layer is vastly simpler than doing that and additionally hiding the string representation, so "UTF-8 everywhere" is IMHO the right solution.

Someone1234 | 2 years ago

I thought the OP's point is there are too many considerations when doing this?

Someone is suggesting a way of making it less tedious, and your response is "performance?!" even though in both scenarios you're running the same code, and in a release build the compiler would likely remove the intermediary anyway.