top | item 41781042

(no title)

flareback | 1 year ago

He gave 4 examples of how it's done incorrectly, but zero actual examples of doing it correctly.

TheGeminon|1 year ago

> Okay, so those are the problems. What’s the solution?

> If you need to perform a case mapping on a string, you can use LCMapStringEx with LCMAP_LOWERCASE or LCMAP_UPPERCASE, possibly with other flags like LCMAP_LINGUISTIC_CASING. If you use the International Components for Unicode (ICU) library, you can use u_strToUpper and u_strToLower.

crote|1 year ago

The correct thing to do is to not do it at all. If text is 3rd-party supplied, treat it like an opaque byte sequence. Alternatively, pay a well-trained human to do it by hand.

All other options are going to result in edge cases where you're not handling it properly. It's like trying to programmatically split a full name into a first name and a last name: language doesn't work like that.

commandlinefan|1 year ago

    for (int i = 0; i < strlen(s); i++) {
        s[i] ^= 0x20;
    }

calibas|1 year ago

Thank you for this universal approach. I can now toggle capitalization on/off for any character, instead of just being limited to alphabetic ones!

Jokes aside, I was kinda hoping for a good answer that doesn't rely on a Windows API or an external library, but I'm not sure there is one. It's a rather complex problem when you account for more than just ASCII and the English language.
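
For what it's worth, the only library-free version I'd trust is one that openly limits itself to ASCII. A minimal sketch, assuming the input is ASCII or UTF-8 and that leaving every non-ASCII character unchanged is acceptable (the name ascii_upper_inplace is made up for illustration):

```c
#include <stddef.h>

/* Hypothetical helper: uppercases only the ASCII letters 'a'..'z' in place.
   Every other byte, including the bytes of multi-byte UTF-8 sequences,
   is left untouched. This is NOT a general Unicode case mapping. */
void ascii_upper_inplace(char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c >= 'a' && c <= 'z') {
            s[i] = (char)(c - ('a' - 'A'));
        }
    }
}
```

Calling it on "Hello, world 123!" gives "HELLO, WORLD 123!". Calling it on "straße" gives "STRAßE" rather than "STRASSE", which is exactly the kind of case that needs LCMapStringEx or ICU.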

vardump|1 year ago

Surely you meant:

    s[i] &= ~0x20;

We're talking about converting to upper case after all! As an added benefit, every space character (0x20) is now a NUL byte!
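
For anyone who wants to see the NUL effect for themselves, here is a small sketch that applies the same mask with no range check and then measures the string. The function name mask_then_strlen and its length parameter are made up for the demonstration:

```c
#include <string.h>

/* Hypothetical demo: clears bit 0x20 in each of the first n bytes,
   with no check that the byte is a letter, then returns strlen(s).
   A space (0x20) becomes 0x00, so the string is cut at the first space. */
size_t mask_then_strlen(char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        s[i] &= ~0x20;
    }
    return strlen(s);
}
```

With an 11-byte buffer holding "hello world", the buffer ends up as "HELLO", a NUL, then "WORLD", and the function returns 5.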